diff --git a/_includes/code.md b/_includes/code.md index 360e03b..e8e23b8 100644 --- a/_includes/code.md +++ b/_includes/code.md @@ -1,20 +1,21 @@ -{% assign figure-number = figure-number | default: 0 | plus: 1 %} -{% assign figure-index = figure-number | minus: 1 %} +{% assign code-number = code-number | default: 0 | plus: 1 %} +{% assign code-index = code-number | minus: 1 %} -{% assign -figure-ref = '[figure ' | append: page.chapter | append: '.' | append: figure-number +{% assign -code-ref = '[code ' | append: page.chapter | append: '.' | append: code-number | append: '](#' | append: include.name | append: ')' %} -{% if figure-reference %} - {% assign figure-reference = ((figure-reference | join: '$') | append: '$' - | append: -figure-ref) | split: '$' %} +{% if code-reference %} + {% assign code-reference = ((code-reference | join: '$') | append: '$' + | append: -code-ref) | split: '$' %} {% else %} - {% assign figure-reference = -figure-ref | split: '$' %} + {% assign code-reference = -code-ref | split: '$' %} {% endif %} ```cpp -{% include_relative code/{{include.name}}.cpp %}``` +{% include_relative code/{{include.name}}.cpp %} +```

-Code {{page.chapter}}.{{figure-number}}: {{include.caption}} +Code {{page.chapter}}.{{code-number}}: {{include.caption}}

{: #{{include.name}} } diff --git a/_includes/figure.md b/_includes/figure.md index 6422a29..6f2105a 100644 --- a/_includes/figure.md +++ b/_includes/figure.md @@ -15,7 +15,7 @@ ![figure {{page.chapter}}.{{figure-number}}: {{include.caption}}][{{include.name}}]\\ -figure {{page.chapter}}.{{figure-number}}: {{include.caption}} +Figure {{page.chapter}}.{{figure-number}}: {{include.caption}}

{: #{{include.name}} } [{{include.name}}]: figures/{{include.name}}.svg diff --git a/_layouts/book-page.html b/_layouts/book-page.html index 97e2f76..811425b 100644 --- a/_layouts/book-page.html +++ b/_layouts/book-page.html @@ -1,6 +1,8 @@ --- layout: default --- +{% include mathjax.html %} +
diff --git a/better-code/00-preface.md b/better-code/00-preface.md index e89a51b..0127e70 100644 --- a/better-code/00-preface.md +++ b/better-code/00-preface.md @@ -2,6 +2,7 @@ title: Preface layout: book-page tags: [ better-code ] +chapter: 0 --- To understand what _better code_ is, we first need to understand what _good code_ is. Students are taught that good code is code that does what the specification says it should. But such an answer begs the question of what is a good specification? Nearly every experienced developer I've met has a snippet of code filed away that has profound beauty - it likely has no corresponding specification and may not even contain a single comment. So what is good code? diff --git a/better-code/01-types.md b/better-code/01-types.md index f254a6e..66f97de 100644 --- a/better-code/01-types.md +++ b/better-code/01-types.md @@ -1,7 +1,6 @@ --- title: Types tagline: No Incomplete Types - layout: book-page tags: [ better-code ] chapter: 1 diff --git a/better-code/02-algorithms.md b/better-code/02-algorithms.md index fa32cf7..e2e0450 100644 --- a/better-code/02-algorithms.md +++ b/better-code/02-algorithms.md @@ -1,9 +1,9 @@ --- title: Algorithms tagline: No Raw Loops - -layout: page +layout: book-page tags: [ better-code ] +chapter: 2 --- Testing 1.2.3... diff --git a/better-code/03-data-structures.md b/better-code/03-data-structures.md index e1d269b..55b4893 100644 --- a/better-code/03-data-structures.md +++ b/better-code/03-data-structures.md @@ -1,7 +1,6 @@ --- title: Data Structures tagline: No Incidental Data Structures - layout: book-page tags: [ better-code ] chapter: 3 @@ -23,7 +22,7 @@ As we saw in [chapter 1](01-types.html) a type is a pattern for storing and modi Values are related to other values, for example, 3 is not equal to 4. -If two objects of the same type have the same representation then they represent the same value. Representational equality implies value equality. If the representation is unique then the converse is also true. A hash is a regular function on a representation or a value. Because it is regular, if two values are equal then the hash of the values are also equal. +If two objects of the same type have the same representation then they represent the same value. Representational equality implies value equality. If the representation is unique then the converse is also true. A hash is a regular function on a representation or a value. Because it is regular, if two values are equal then the hash of the values is also equal. Because objects exist in memory, they have a _physical_ relationship. The value at the first location in an array is located before the value in the second location. If we sort the values, we establish a correspondence between the physical and value relationships, i.e. an element before another element is less than or equal to that element. We can represent locations as values (pointers) and use those to represent additional relationships, such as "is a child of". @@ -37,15 +36,15 @@ The choice of encoding can make a dramatic difference on the performance of oper Although data structures tend to be thought of simply in terms of containers such as arrays, lists, or maps, anytime a relationship is established between objects a data structure is created. However, to avoid confusion we will reserve the term _data structure_ to refer to types with a set of invariants which insure a set of relationships are maintained. More transient data structures will be referred to as _structured data_. 
-As an example of utilizing structured data_, consider the problem of finding the `nth` to `mth` elements of an array as if the array was in sorted order. The trivial way to do this is to simply sort the entire array and then print the `nth` to `mth` elements. In this example `[sf, sl)` is a subrange of `[f, l)`. {::comment}appendix to describe half open notation?{:/comment}
+As an example of utilizing _structured data_, consider the problem of finding the `nth` to `mth` elements of an array as if the array were in sorted order. The trivial way to do this is to simply sort the entire array and then print the `nth` to `mth` elements. In this example `[sf, sl)` is a subrange of `[f, l)`. {::comment}appendix to describe half-open notation?{:/comment}
 
 {% include code.md name='sort-subrange-0' caption='inefficient sort subrange' %}
 
 {::comment} Should this section start with partial_sort then add nth_element instead of the other way around? {:/comment}
 
-This function, however, does more work than is necessary. There is a function in the standard library, `nth_element()` which given a position `nth` within a range `[f, l)` has the post condition that the element at `nth` is the same element that would be in that position if `[f, l)` were sorted. `nth_element()` is a special case of sort_subrange when the subrange is of length 1 (or 0 if `nth == l`).
+This function, however, does more work than is necessary. There is a function in the standard library, `nth_element()`, which given a position `nth` within a range `[f, l)` has the post-condition that the element at `nth` is the same element that would be in that position if `[f, l)` were sorted. `nth_element()` is a special case of `sort_subrange` when the subrange is of length 1 (or 0 if `nth == l`).
 
-This would not be of much use to build `sort_subrang()` except that `nth_element()` has an additional post condition. The range `[f, l)` is partitioned such that all elements prior to `nth` are less than or equal to the final element at `nth`. This post condition leaves us with _structured data_ and we can take advantage of that structure. If we find the `nth_element()` where `nth` is `sf` then we only need to sort the remaining elements to `sl` which can be done with `partial_sort()`.
+This would not be of much use to build `sort_subrange()` except that `nth_element()` has an additional post-condition. The range `[f, l)` is partitioned such that all elements prior to `nth` are less than or equal to the final element at `nth`. This post-condition leaves us with _structured data_ and we can take advantage of that structure. If we find the `nth_element()` where `nth` is `sf` then we only need to sort the remaining elements to `sl`, which can be done with `partial_sort()`.
 
 {% include code.md name='sort-subrange-1' caption='improved sort subrange' %}
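+
+As a usage sketch (the values here are hypothetical; `sort_subrange_1` is the function defined above):
+
+```cpp
+#include <algorithm>
+#include <vector>
+
+int main() {
+    std::vector<int> v{7, 1, 9, 3, 5, 8, 2, 6, 4, 0};
+    // View elements [3, 7) as if v were fully sorted.
+    sort_subrange_1(std::begin(v), std::end(v), std::begin(v) + 3, std::begin(v) + 7);
+    // v[3..6] now holds 3, 4, 5, 6; the elements outside the subrange are
+    // partitioned around it but otherwise unsorted.
+}
+```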
diff --git a/better-code/04-runtime-polymorphism.md b/better-code/04-runtime-polymorphism.md
index 24eccd4..1398a95 100644
--- a/better-code/04-runtime-polymorphism.md
+++ b/better-code/04-runtime-polymorphism.md
@@ -1,9 +1,9 @@
 ---
 title: Runtime Polymorphism
 tagline: No Public Inheritance
-
-layout: page
+layout: book-page
 tags: [ better-code ]
+chapter: 4
 ---
 
 Object-oriented programming has been one of the paradigms supported by C++ from its invention. The idea of type inheritance and virtual functions were borrowed from Simula[^cpp-history]. Inheritance can represent a subtype or protocol relationship.
 
 Although the two are closely related, in this chapter we're primarily concerned with subtype relationships through class inheritance. Protocols are discussed in the next chapter. {::comment}link{:/comment}
diff --git a/better-code/05-concurrency.md b/better-code/05-concurrency.md
index b3f3795..14d21dc 100644
--- a/better-code/05-concurrency.md
+++ b/better-code/05-concurrency.md
@@ -1,61 +1,295 @@
 ---
 title: Concurrency
 tagline: No Raw Synchronization Primitives
-
+mathjax: true
 layout: book-page
 tags: [ better-code ]
+chapter: 5
 ---
 
 ### Motivation
 
+A _task_ is a unit of work, often a function.
+
+_Concurrency_ is when multiple tasks start, run, and complete in overlapping time periods, and should not be confused with _parallelism_, which is when multiple tasks execute simultaneously. Parallelism requires some form of hardware support, whereas concurrency can be achieved strictly through software, such as a cooperative tasking system.
+
+There are two primary benefits of concurrent code. The first is performance, by enabling parallelism. The second is improved interactivity, by not blocking the user while a prior action is being processed.
+
+As clock rates on systems have stagnated, hardware developers have turned to parallelism to increase performance. Figure [xxx] shows the performance distribution on a typical desktop system. A single-threaded, non-vectorized application can only utilize about 0.25% of the performance capabilities of the machine.
+
+The goal of this chapter is to develop concurrent code without using raw synchronization primitives.
+
+### Definition of _raw synchronization primitives_
+
+A _raw synchronization primitive_ is a low-level construct used to synchronize access to data. Examples include locks and mutexes[^mutex], condition variables, semaphores, atomic operations, and memory fences.
+
+{::comment} Discuss the difference between data parallelism and task concurrency, so far this chapter is only dealing with tasking. However, it could be expanded upon. {:/comment}
+
+### Problems of _raw synchronization primitives_
+
+The first problem with raw synchronization primitives is that they are exceedingly error-prone to use because, by definition, they require reasoning about non-local effects.
+
+For example, the following is a snippet from a copy-on-write[^cow_definition] data type; it is a simplified version of code from a shipping system.
+
+{% include code.md name='05-bad_cow' caption='Incorrect copy-on-write' %}
+
+The code contains a subtle race condition. The `if` statement at line 16 checks the value of an atomic count to see if it is one, and the `else` statement handles the case where it is not. At line 19, within the `else` statement, the count is decremented. The problem is that if decrementing the count results in a value of zero, then the object stored in `_object` should be deleted. The code fails to check for this case, and so an object may be leaked.
+
+The initial test is not sufficient to establish that this instance is the sole owner. Between that check and the decrement, another thread may have released ownership and decremented the count, leaving this object instance as the sole owner.
+
+The correct way is to test and decrement atomically, in a single statement. The code is shown below:
+
+{% include code.md name='05-correct_cow' caption='Correct copy-on-write' %}
+
+The code of the complete, correct implementation is available online[^cow].
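+
+Distilled, the difference between the two versions is the following (a sketch, not the full listings; `_object` and its atomic `_count` are as above):
+
+```cpp
+// Racy: the count may change between this test and a later decrement.
+if (_object->_count == 1) { /* sole owner: safe to write in place */ }
+else {
+    --_object->_count; // another owner may have released in between; if the
+                       // count reaches zero here, the object is leaked
+}
+
+// Correct: test and decrement in a single atomic operation.
+if (0 == --_object->_count) delete _object;
+```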
+
+Another problem with raw synchronization primitives is that their use can have a large negative impact on system performance. These implications are described by Amdahl's Law.
+
+The intuition behind Amdahl's Law is that if a part of the system takes time x to complete on a single core or processor, then it will see a speedup of y when run on y cores, but only if no synchronization takes place between the different cores or processors.
+$$S(N) = \frac{1}{(1-P)+\frac{P}{N}}$$
+The speedup $$S$$ is defined by this equation, where $$P$$ is the fraction of the work that can be parallelized, in the range $$[0 .. 1]$$ (so $$1 - P$$ is the serialized, i.e. synchronized, fraction), and $$N$$ is the number of cores or processors.
+
+Drawing the abscissa on a logarithmic scale illustrates that there is a speedup of only 20 times, even when the system is running on 2048 or more cores, if just 5% of the work is serialized.
+
+{% include figure.md name='05-amdahl_log' caption="Amdahl's law logarithmic scale" %}
+
+Since most desktop or mobile processors nowadays have fewer than 64 cores, it is better to take a look at the graph on a linear scale. Each line below the diagonal represents 10% more serialization. So if the application has just 10% serialization and runs on 16 cores, the speedup is only a little better than a factor of six.
+
+{% include figure.md name='05-amdahl_lin' caption="Amdahl's law linear scale" %}
+
+So Amdahl's law has a huge impact. Serialization doesn't mean only locking a mutex. Serialization can mean sharing the same memory, or sharing the same address bus for the memory if it is not a NUMA architecture. Sharing the same cache line, or anything else that is shared within the processor, starts to bend that curve down. Even a write operation on an atomic value is synchronized between the cores and bends that curve down, and it bends it down rapidly.
+
 {::comment}
-For this section I need to first provide motivation for concurrency, and define concurrency and parallelism. which are not commonly understood. Do I need to provide a motivation section for each chapter?
+Also, in the Amdahl's law section I think we should have a passing reference to Gustafson's law which is related to Amdahl's law but is looking at latency of the system as the number of processors increases instead of time to complete a fixed body of work. Gustafson's law is applicable for building interactive systems and scalable server architectures as examples where scalability implies the system will be processing more requests.
 {:/comment}
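+
+To make the curves concrete, here is a small sketch that evaluates the formula (the function name is ours; the sample values match the figures):
+
+```cpp
+#include <cstdio>
+
+// Amdahl's law: P is the parallelizable fraction of the work,
+// N the number of cores or processors.
+double amdahl_speedup(double P, unsigned N) {
+    return 1.0 / ((1.0 - P) + P / N);
+}
+
+int main() {
+    std::printf("%.1f\n", amdahl_speedup(0.95, 2048)); // ~19.8: 5% serialization caps the speedup near 20x
+    std::printf("%.1f\n", amdahl_speedup(0.90, 16));   // ~6.4: 10% serialization on 16 cores
+}
+```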
-_Concurrency_ is when multiple tasks start, run, and complete in overlapping time periods and should not be confused with _parallelism_ which is when multiple tasks execute simultaneously. Parallelism requires some form of hardware support, where as concurrency can be achieved strictly through software, such as a cooperative tasking system.
+The following illustrates an often-used model for implementing exclusive access to an object by multiple threads:
 
-There are two primary benefits for concurrent code. The first is performance by enabling parallelism. The second is to improve interactivity by not blocking the user while a prior action is being processed.
+{% include figure.md name='05-traditional_locking-1' caption="Different threads need access to single object" %}
 
-As clock rates on systems have stagnated, hardware developers have turned to parallelism to increase performance. Figure [xxx] shows the performance distribution on a typical desktop system. A single threaded, non-vectorized, application can only utilize about 0.25% of the performance capabilities of the machine.
+As soon as the different threads not only want to read the single object but need write access as well, it is necessary to give just a single thread exclusive access. (Otherwise, undefined behavior is the result.) All other threads have to wait their turn to get read or write access.
 
-### Definition of _raw synchronization primitives_.
+{% include figure.md name='05-traditional_locking-2' caption="Exclusive access with locking" %}
 
-A _raw synchronization primitive_ is a low level construct used to synchronize access to data. Examples include locks and mutexes, condition variables, semaphores, atomic operations, and memory fences.
+When a thread no longer needs its exclusive access, it gives it up.
 
-{::comment} Discuss difference between data parallelism and task concurrency, so far this chapter is only dealing with tasking. However, it could be expanded upon. {:/comment}
+{% include figure.md name='05-traditional_locking-3' caption="Exclusive access by different threads" %}
 
-The goal of this chapter is to develop concurrent code without using raw synchronization primitives.
+And the next thread can get exclusive access.
+
+This is a horrible way to think about threading. The goal has to be to minimize waiting at all costs. David Butenhof[^butenhof], one of the POSIX threads implementors, quipped that the mutex would better have been named a "bottleneck", because of its property of slowing down an application.
 
-The first problem with raw synchronization primitives are that they are exceedingly error prone to use because, by definition, they require reasoning about non-local effects.
+In the following, let's take a look at a traditional piece of code:
 
-For example, [xxxx] is a snippet from a copy-on-write datatype, this is a simplified version of code from a shipping system.
+{% include code.md name='05-registry-0' caption='Registry example' %}
+
+It is a registry class with a shared `set` and `get` function. Access to the underlying unordered map is protected against concurrent access with a mutex. At first glance, it seems that only minimal work is done under the mutex. The unordered map is a fairly efficient data structure, a hash map. But the amount of time it takes to hash the key depends on the length of the string, so the work that is done under the lock here is fairly unbounded: it is probably typically small, but it can be large. On top of the hash calculation comes a potential allocation of a new bucket within the unordered map, which in most cases requires another lock within the memory manager. Depending on the operating system, this can be a global lock within the process.
+
+For a better understanding of the purpose of using locks, it is necessary to take a step back. The C++ standard states: _It can be shown that programs that correctly use mutexes and memory\_order\_seq\_cst operations to prevent all data races and use no other synchronization operations behave as if the operations executed by their constituent threads were simply interleaved, with each value computation of an object being taken from the last side effect on that object in that interleaving. This is normally referred to as ‘sequential consistency.’_, C++11 Standard 1.10.21.
+
+So why is this an important sentence?
+It means that one can always think about mutexes as if one had some set of interleaved operations:
+
+{% include figure.md name='05-sequential_operations' caption='Sequential operations' %}
+
+* A mutex serializes a set of operations, $$Op_n$$, where an operation is the code executed while the mutex is locked
+* Operations are interleaved and may be executed in any order and may be repeated
+* Each operation takes an argument, $$x$$, which is the set of all objects mutated under all operations
+  * $$x$$ may not be safely read or written without holding the lock if it may be modified by a task holding the lock
+* Each operation may yield a result, $$r_m$$, which can communicate information about the state of $$x$$ while its associated operation was executed
+* The same is true of all atomic operations
+
+So there is not a lot of difference between a mutex and a `std::atomic`. In fact, `std::atomic` has a member function, `is_lock_free()`, that reports whether the processor can perform the operation as a single atomic instruction. If there is no processor support, the compiler has to generate a mutex pair: lock, make the change on the atomic value, and unlock. So mutexes and locks are the way atomic operations are constructed in the absence of hardware support.
+
+That means that any piece of code that has a mutex can be transformed into a queued model. Applying this idea to the registry example from above leads to this:
+
+{% include code.md name='05-registry-1' caption='Registry with queue' %}
+
+Assume there is a serial queue `_q` with an `async()` function which executes the passed item, using the same calling conventions as `std::async`, with the difference that it guarantees that the next item in the queue doesn't start until the previous one has completed. (A minimal sketch of such a serial queue is given at the end of this section.) Then one can rewrite the `set` function to be executed with `_q.async`.
+The `get` operation can be rewritten as well. The difference here is that one needs the result back out, paired with that particular `get`. That is realized with a future. (Futures will be covered later in more detail.) So the result of the `get` function can be used, e.g. with a continuation, whenever the `key` has been hashed and the lookup in the hash map is completed.
+
+{% include code.md name='05-registry-2' caption='Enhanced registry with queue' %}
+
+Why is it important to understand this concept? Because at any place with a mutex in the code one can always make this transformation into a serialized queue model. And within the serialized queue model, anybody can come along and call `set`, and regardless of the amount of work that `set` does, the time it takes for `set` to return to the caller is constant. This means as well that one can add something like an arbitrary `set`, e.g. of a whole vector of key-value pairs, and to the caller this `set` will take just as much time as the previous one. It's a non-blocking operation with an upper bound of overhead.
+
+{::comment}
+Add here the two measurement graphs that compare the mutex guarded and serial queue guarded map without and with additional load
+{:/comment}
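+
+The discussion above assumed a `serial_queue` with such an `async()` function. A minimal sketch of one possible implementation, built on the `background_worker` class shown later in this chapter (an illustration under that assumption, not the implementation used for any measurements; `std::` and includes are omitted as elsewhere in this chapter):
+
+```cpp
+class serial_queue {
+    background_worker _worker; // a single worker thread: items run strictly one after another
+
+  public:
+    template <typename F, typename... Args>
+    auto async(F&& f, Args&&... args) {
+        using result_t = invoke_result_t<F, Args...>;
+        // Package the task so its result can be retrieved through a future.
+        auto task = make_shared<packaged_task<result_t()>>(
+            bind(forward<F>(f), forward<Args>(args)...));
+        auto result = task->get_future();
+        _worker.submit_task([task] { (*task)(); });
+        return result;
+    }
+};
+```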
+
+### Further problems of mutexes
+
+The usage of mutexes raises the probability of deadlocks within a complex system. Function calls under a locked mutex should be avoided.
+
+This means as well that user objects should not be destructed under a locked mutex.
+
+{% include code.md name='05-destruction-0' caption='Destruction of container node under mutex' %}
+
+Depending on the complexity of class `Field` (omitted in this example), the destruction of the object under the mutex increases the contention under the lock.
+
+{% include code.md name='05-destruction-1' caption='Destruction of list node outside the mutex' %}
+
+Only node pointers are transferred by the `list.splice` operation to the temporary element `obsolete_field`, so the node can be deleted outside the mutex.
+
+{% include code.md name='05-destruction-2' caption='Destruction of unordered_map node outside the mutex' %}
+
+`unordered_map.find` returns the iterator of the searched node, if it is available. `unordered_map.extract` moves the node out of the container into the temporary object `obsolete_node`. The content of the node is deleted, outside of the mutex, when the node is destructed on leaving the outer scope.
+
+{% include code.md name='05-destruction-3' caption='Destruction of vector objects outside the mutex' %}
+
+The situation with a `vector` is different because it is not node-based. When `vector::value_type` is beyond trivially destructible, one can move the obsolete objects into a temporary vector and destroy them outside of the mutex.
 
 {::comment}
-Insert bad cow example here. Can this example be simplified even more? Remove the template and make it a string?
+More problems???
 {:/comment}
 
-The highlighted lines {::comment} how? {:/comment} contain a subtle race condition. The `if` statement is checking the value of an atomic count to see if it is `1`. The `else` statement handles the case where it is not 1. Within the else statement the count is decremented. The problem is that if decrementing the count results in a value of `0` then the object stored in `object_m` should be deleted. The code fails to check for this case, and so an object may be leaked.
+### Problems of _raw threads_
+
+A _thread_ is an execution environment consisting of a stack and processor state, running in parallel to other threads.
+A _task_ is a unit of work, often a function, to be executed on a thread.
+
+A common scenario is that increased work within an application is outsourced to a spawned background thread, with the intent of better utilizing the available CPU cores.
+
+{% include figure.md name='05-background_thread' caption='Background thread executing tasks' %}
 
-The initial test to see if the count was `1` isn't sufficient, between that check and when the count is decremented another thread may have released ownership and decremented the count leaving this object instance as the sole owner.
+Since this is recognized as a successful idiom to solve performance problems of an application, it easily becomes the default way to solve such issues.
 
-The fix is to test atomically with the decrement, the correct code is shown in [xxxx].
+{% include code.md name='05-background_worker' caption='Simple background worker' %}
 
-Another problem with raw synchronization primitives is that their use can have a large negative impact on system performance. To understand why, we need to understand Amdahl's Law.
+Over time the application gets enhanced with more modules and plugins. If the same idea is applied for each of these, the complete application ends up using a huge number of threads.
+Over-subscription of threads is then easily the case: more threads are used than CPU cores are available.
+So the kernel of the operating system has to constantly switch the threads between the available cores to prevent starvation of individual threads.
+Within such a switch, called a context switch, the CPU registers, program counter, and stack pointer of the old thread are saved and the ones of the new thread are restored. This saving and restoring takes time that is lost for the computational tasks of an application. Besides this, the translation lookaside buffer (TLB) must be flushed and the page table of the next process loaded. The flushing of the TLB means that the memory accesses of the new thread are slower in the beginning, which causes an additional slowdown.
+So the goal has to be to keep the number of context switches as low as possible.
 
-The intuition behind Amdahl's Law is that if part of system takes time x to complete,
+One way to achieve this goal is to use a task system. A task system uses a set of threads, normally equal to the number of CPU cores, and distributes the submitted tasks over the available threads. If more tasks are submitted than free threads are available, they are put into a queue, and whenever one task is done the next one is taken from the queue and executed.
+
+{% include figure.md name='05-simple_tasking_system' caption='Simple task system' %}
+
+Since the number of threads is constant, ideally there is no need to perform any context switches. (For simplicity this ignores the fact that other system services have running threads as well, so some context switches happen in any case.) A task system within an application is an appropriate measure to reduce the number of context switches, as long as all modules within the application use the same instance of the task system.
+
+To illustrate the purpose and gain a better understanding of the implications within such a task system, its code is developed in the following.
+
+The [figure](#05-simple_tasking_system) above shows that the task system consists of a notification queue:
+
+{% include code.md name='05-notification_queue-1' caption='Notification queue class' %}
+
+This notification queue consists of a `deque` of tasks with a `mutex` and a `condition_variable`. It has a `pop()` operation which pulls one item off the queue, and a `push()` operation to push one item into the queue and notify anybody who might be waiting on it.
+
+{% include code.md name='05-task_system-1' caption='Task system class' %}
+
+The task system has a `_count` member which is set to the number of available cores. The system has a vector of threads and the notification queue. The `run()` function is the function executed by the threads. Inside that function is an empty `task` object; as soon as an item is available in the queue, the thread pops it from the queue, executes it, and tries to pick the next one.
+The constructor of the task system spins up as many threads as there are cores. Each thread executes a lambda that calls the `run()` function with the thread's index.
+When the task system gets destructed, it is necessary to join all threads. The function used from the outside is `async()`; it takes the `task` object and pushes it into the queue.
+This system is so far very primitive; e.g., it would hang on destruction.
+The latter is corrected by the following additions:
+
+{% include code.md name='05-notification_queue-2' caption='Notification queue with done switch' %}
+
+The new `done()` function sets the new member `_done` and notifies the queue about the change. If a thread is waiting in the `pop()` function, it is woken up from the condition variable; if `_done` is set and the queue is empty, `pop()` returns `false`.
+
+{% include code.md name='05-task_system-2' caption='Non-blocking task system on destruction' %}
+
+Within the destructor, the task system now signals `done()` on the queue, so waiting threads wake up, drain any remaining entries, and can be joined without delay. (With C++20 this could be simplified with `jthread`.)
+
+This task system performs very badly compared to, for instance, macOS's Grand Central Dispatch (GCD): it reaches a throughput of only about 6%. Why does this system perform so badly, even though this is the design recommended in several places?
+This design follows the principle from the [figure](#05-traditional_locking-2) above. It has a single queue and a bunch of threads, all banging on that queue, so the threads often wait on the mutex.
+
+Unfortunately, it is not possible to apply the transformation described above, because all that is here is already a queue. So a different approach is needed.
+
+{% include figure.md name='05-task_system_multiple_queues' caption='Task system with multiple notification queues' %}
+
+A way to reduce the contention on this single queue is to change the task system so that each thread has its own queue.
+
+{% include code.md name='05-task_system-3' caption='Task system with multiple queues' %}
+
+The task system now has as many queues as threads, and the `run()` function receives as a parameter the index of its corresponding queue so that it can pick the items belonging to it.
+On destruction, the task system now has to notify all queues to end their work.
+A continuously incremented atomic integer, taken modulo the number of queues, is used within the `async()` function to distribute the tasks in a round-robin manner over the queues. An atomic member is used so that this function can be called from multiple threads.
+
+This system performs about twice as fast as the previous approach. But it still has two problems: a long-running task will block the execution of all tasks behind it in its queue, even when the queues of the other cores have run dry; and on a fully loaded system, there is a fair amount of contention on the mutex of each queue.
+
+These problems can be minimized by using the mechanism of task stealing.
+
+{% include figure.md name='05-task_system_task_stealing' caption='Task System with Task Stealing' %}
+
+There are sophisticated, highly optimized approaches to implementing task stealing; here a very simple strategy is taken.
+
+{% include code.md name='05-notification_queue-3' caption='Notification queue with try_pop and try_push' %}
+
+The queue is enhanced by two new functions, `try_pop()` and `try_push()`.
+The `try_pop()` function returns `false` when the attempt to take the mutex, made with the additional `try_to_lock` argument, fails. This can be the case when another thread is currently pushing or popping an item into or from the queue. The other possibility for a `false` result is that the queue is empty.
+The same applies to `try_push()`.
+The important difference is that a thread using one of these two functions never blocks on a locked mutex (the isolated sketch below shows the `try_to_lock` behavior).
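+
+In isolation, the `try_to_lock` behavior looks like this (a minimal sketch):
+
+```cpp
+mutex m;
+
+void attempt() {
+    unique_lock<mutex> lock{m, try_to_lock}; // try to take the mutex, but never block
+    if (lock) {
+        // the mutex is owned here; do the work
+    } else {
+        // somebody else holds the mutex; e.g. move on to another queue instead of waiting
+    }
+}
+```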
+
+{% include code.md name='05-task_system-4' caption='Task system with task stealing' %}
+
+Within the `run()` function, the code tries to pop an item from its corresponding queue by calling `try_pop()`. This can fail either because that queue is currently busy or because it is empty. In both cases, the code tries to steal a task from a different thread, until it has checked all other queues. If there are still no tasks to execute, it falls back to a blocking `pop()` and is woken up whenever there is more work to do.
+The same approach is taken for pushing an item into a queue in the `async()` function, with the difference that the code spins for some time over every queue until it finds one it can push the task to without blocking. The spinning lowers the probability that the calling thread gets stuck on the final, blocking `push()` call.
+
+The task system now reaches about 85% of the performance of the reference implementation.
+
+So the first goal, reducing the number of arbitrary threads, is fulfilled; the number of context switches can be minimized by using a task system. But as soon as every single application on a machine uses its own instance of a task system, the problem of over-subscription returns, because each instance starts as many threads as there are cores.
+Such a task system with a fixed number of threads has another problem: the risk of deadlocks.
+
+{% include figure.md name='05-dead_lock' caption='Dead lock within queued tasks' %}
+
+As soon as a task `a` creates a new task `b`, the progress of `a` depends on the result of `b`, and `b` gets stuck in the queue behind `a`, the system is in a deadlock. The [figure](#05-dead_lock) illustrates the problem with just a single queue, but the same problem exists with multiple queues when dependent tasks get stuck behind other tasks that are blocked waiting to acquire a mutex or waiting for another result.
+
+So the only way to reduce both the problem of an unbounded number of threads and the probability of deadlocks between dependent tasks is for all applications within a system to use the same task system. Only a task system at the OS kernel level knows about threads that currently make no progress and can spawn new threads to prevent a deadlock situation.
+macOS and Windows, for example, provide such a task system out of the box through a low-level API. (Mac's task system, libdispatch, can be added to Linux via package management.)
+
+Regarding the previous implementations of the serial queue and the task system, it is important to keep in mind that lock-free queue implementations exist which can be utilized to improve performance. Lock-free does not mean that no synchronization takes place, but the overhead is reduced.
+When submitting tasks, the size of a task should be weighed against the overhead of the serial queue or task system used: if the tasks are too small compared to that overhead, it is more efficient to execute them serially.
+
+#### Futures as abstraction
+
+Conceptually, a future is a way to separate the result of a function from the execution of the function. The task (the function packaged so it returns void) can be executed in a different context (the execution context is controlled by executors in some of the proposals) and the result will become available via the future.
+
+A future also serves as a handle to the associated task, and may provide some operations to control the task.
+
+The primary advantage of a future over a callback is that a callback requires the subsequent operation to be provided in advance, whereas a future allows a continuation to be attached, via a `then()` function, at some later point (see the sketch below). This feature makes futures easier to compose, easier to integrate into an existing system, and more powerful, as they can be stored and the continuation can be attached later, as the result of another action. However, this flexibility comes with an inherent cost: it requires an atomic test when the continuation is attached, to determine whether the value is already available. Because of this cost, for many library operations it makes sense to provide both a form taking a callback and one returning a future. Although at first glance it may appear that a callback form is easily adapted to a future form, that is not the case, for reasons discussed below.
 
 {::comment}
-Math experiment for Fibonacci matrix.
+Shall callbacks be discussed here? Technically they don't introduce a problem. But from the point of view of maintainability it is one, because the control flow is hard to understand.
 {:/comment}
-
-$$
-\begin{align*}
-  \left[ \begin{array}{cc}
-  1 & 1 \\
-  1 & 0
-  \end{array} \right]^{n} =
-  \left[ \begin{array}{cc}
-  F_{n+1} & F_n \\
-  F_n & F_{n-1}
-  \end{array} \right]
-\end{align*}
-$$
+
+### Problems of callbacks
+* Con: Hard to reason about the flow of control in the code
+* Con: A callback must be set beforehand; futures can be attached later
+* Pro: Can be faster than futures because there is no overhead of a shared-state counter
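+
+`std::future` (as of C++17) has no `then()`; continuations as described here require a library such as stlab. A sketch assuming stlab's interface (the exact header names may differ):
+
+```cpp
+#include <stlab/concurrency/default_executor.hpp>
+#include <stlab/concurrency/future.hpp>
+
+auto f = stlab::async(stlab::default_executor, [] { return 42; });
+
+// The continuation can be attached later, e.g. as the result of another action.
+auto done = f.then([](int x) { return x * 2; });
+// Note: in stlab's design, destroying the last copy of a future cancels the
+// associated task, so `done` must be kept alive until the result is delivered.
+```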
+
+#### Channels (or actors) as abstraction
+
+
+
+### Motivation
+
+1st example: export from the UI with compression and the possibility to cancel
+2nd example: export a group of images with compression and the possibility to cancel
+
+### Develop Solution
+
+1st solution: use futures
+2nd solution: use channels
+
+### Conclusion
+
+[^mutex]:
+    mutual exclusion
+
+[^cow_definition]:
+    Copy-on-write [https://en.wikipedia.org/wiki/Copy-on-write](https://en.wikipedia.org/wiki/Copy-on-write)
+
+[^cow]:
+    Copy-on-write implementation in stlab. [https://github.com/stlab/libraries/blob/develop/stlab/copy_on_write.hpp](https://github.com/stlab/libraries/blob/develop/stlab/copy_on_write.hpp)
+
+[^butenhof]:
+    [Recursive mutexes by David Butenhof](http://zaval.org/resources/library/butenhof1.html)
diff --git a/better-code/06-relationships.md b/better-code/06-relationships.md
index ba9b147..3ec6949 100644
--- a/better-code/06-relationships.md
+++ b/better-code/06-relationships.md
@@ -1,9 +1,9 @@
 ---
 title: Relationships
 tagline: No Contradictions
-
 layout: book-page
 tags: [ better-code ]
+chapter: 6
 ---
 
 ### Motivation
@@ -18,13 +18,13 @@ All of the goals and lessons in this book have been, in one way or another, about
 
 ### Definition of _raw synchronization primitives_.
 
-A _raw synchronization primitive_ is a low level construct used to synchronize access to data. Examples include locks and mutexes, condition variables, semaphores, atomic operations, and memory fences.
+A _raw synchronization primitive_ is a low-level construct used to synchronize access to data. Examples include locks and mutexes, condition variables, semaphores, atomic operations, and memory fences.
 
-{::comment} Discuss difference between data parallelism and task concurrency, so far this chapter is only dealing with tasking. However, it could be expanded upon. {:/comment}
+{::comment} Discuss the difference between data parallelism and task concurrency, so far this chapter is only dealing with tasking. However, it could be expanded upon. {:/comment}
 
 The goal of this chapter is to develop concurrent code without using raw synchronization primitives.
 
-The first problem with raw synchronization primitives are that they are exceedingly error prone to use because, by defintion, they require reasoning about non-local effects.
+The first problem with raw synchronization primitives is that they are exceedingly error-prone to use because, by definition, they require reasoning about non-local effects.
 
 For example, [xxxx] is a snippet from a copy-on-write datatype, this is a simplified version of code from a shipping system.
diff --git a/better-code/07-epilogue.md b/better-code/07-epilogue.md
index 5eb14b3..3838cf9 100644
--- a/better-code/07-epilogue.md
+++ b/better-code/07-epilogue.md
@@ -1,8 +1,8 @@
 ---
 title: Epilogue
-
-layout: page
+layout: book-page
 tags: [ better-code ]
+chapter: 7
 ---
 
 Testing 1,2,3...
diff --git a/better-code/code/05-background_worker.cpp b/better-code/code/05-background_worker.cpp
new file mode 100644
index 0000000..636e93a
--- /dev/null
+++ b/better-code/code/05-background_worker.cpp
@@ -0,0 +1,46 @@
+using lock_t = unique_lock<mutex>;
+
+class background_worker {
+    thread _thread;
+    queue<function<void()>> _queue;
+    mutex _mutex;
+    condition_variable _ready;
+    bool _done{false};
+
+  public:
+    background_worker() {
+        auto worker = [this] {
+            while (true) {
+                function<void()> task;
+                {
+                    lock_t lock{_mutex};
+                    while (!_done && _queue.empty()) _ready.wait(lock);
+                    if (_done) return;
+                    if (!_queue.empty()) {
+                        task = move(_queue.front());
+                        _queue.pop();
+                    }
+                }
+                if (task) task();
+            }
+        };
+        _thread = thread(worker);
+    }
+
+    ~background_worker() {
+        {
+            lock_t lock{_mutex};
+            _done = true;
+        }
+        _ready.notify_one();
+        _thread.join();
+    }
+
+    void submit_task(function<void()> task) {
+        {
+            lock_t lock{_mutex};
+            _queue.emplace(move(task));
+        }
+        _ready.notify_one();
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-bad_cow.cpp b/better-code/code/05-bad_cow.cpp
new file mode 100644
index 0000000..347fe9b
--- /dev/null
+++ b/better-code/code/05-bad_cow.cpp
@@ -0,0 +1,24 @@
+template <typename T>
+class bad_cow {
+    struct object_t {
+        explicit object_t(const T& x) : _data(x) {}
+        atomic<int> _count{1};
+        T _data;
+    };
+    object_t* _object;
+
+  public:
+    explicit bad_cow(const T& x) : _object(new object_t(x)) {}
+    ~bad_cow() { if (0 == --_object->_count) delete _object; }
+    bad_cow(const bad_cow& x) : _object(x._object) { ++_object->_count; }
+
+    bad_cow& operator=(const T& x) {
+        if (_object->_count == 1) _object->_data = x;
+        else {
+            object_t* tmp = new object_t(x);
+            --_object->_count;
+            _object = tmp;
+        }
+        return *this;
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-correct_cow.cpp b/better-code/code/05-correct_cow.cpp
new file mode 100644
index 0000000..2a0bf58
--- /dev/null
+++ b/better-code/code/05-correct_cow.cpp
@@ -0,0 +1,27 @@
+template <typename T>
+class correct_cow {
+    struct object_t {
+        explicit object_t(const T& x) : _data(x) {}
+        atomic<int> _count{1};
+        T _data;
+    };
+    object_t* _object;
+
+  public:
+    explicit correct_cow(const T& x) : _object(new object_t(x)) {}
+    ~correct_cow() {
+        if (0 == --_object->_count) delete _object;
+    }
+    correct_cow(const correct_cow& x) : _object(x._object) { ++_object->_count; }
+
+    correct_cow& operator=(const T& x) {
+        if (_object->_count == 1)
+            _object->_data = x;
+        else {
+            object_t* tmp = new object_t(x);
+            if (0 == --_object->_count) delete _object;
+            _object = tmp;
+        }
+        return *this;
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-destruction-0.cpp b/better-code/code/05-destruction-0.cpp
new file mode 100644
index 0000000..fb19a4f
--- /dev/null
+++ b/better-code/code/05-destruction-0.cpp
@@ -0,0 +1,18 @@
+class Field {
+    int _property;
+public:
+    int property() const { return _property; }
+};
+
+list<Field> _fields;
+mutex _fields_mutex;
+
+{
+    unique_lock guard{_fields_mutex};
+    auto it = find_if(_fields.begin(), _fields.end(),
+        [item_to_remove](auto const& field) { return field.property() == item_to_remove; });
+
+    if (it != _fields.end()) {
+        _fields.erase(it); // the Field object is destroyed while the mutex is held
+    }
+}
diff --git a/better-code/code/05-destruction-1.cpp b/better-code/code/05-destruction-1.cpp
new file mode 100644
index 0000000..617c5f2
--- /dev/null
+++ b/better-code/code/05-destruction-1.cpp
@@ -0,0 +1,21 @@
+class Field {
+    int _property;
+public:
+    int property() const { return _property; }
+};
+
+list<Field> _fields;
+mutex _fields_mutex;
+
+list<Field> obsolete_field;
+{
+    unique_lock guard{_fields_mutex};
+    auto it = find_if(_fields.begin(), _fields.end(),
+        [item_to_remove](auto const& field) { return field.property() == item_to_remove; });
+
+    if (it != _fields.end()) {
+        // splice only relinks node pointers; no destruction under the mutex
+        obsolete_field.splice(obsolete_field.end(), _fields, it);
+    }
+}
+
+obsolete_field.clear();
\ No newline at end of file
diff --git a/better-code/code/05-destruction-2.cpp b/better-code/code/05-destruction-2.cpp
new file mode 100644
index 0000000..435fd28
--- /dev/null
+++ b/better-code/code/05-destruction-2.cpp
@@ -0,0 +1,17 @@
+class Field {
+    int _property;
+public:
+    int property() const { return _property; }
+};
+
+unordered_map<int, Field> _fields;
+mutex _fields_mutex;
+
+{
+    unordered_map<int, Field>::node_type obsolete_node;
+    {
+        unique_lock guard{_fields_mutex};
+        auto it = _fields.find(key_to_remove);
+        if (it != _fields.end()) obsolete_node = _fields.extract(it);
+    }
+} // obsolete_node is destroyed here, outside the mutex
\ No newline at end of file
diff --git a/better-code/code/05-destruction-3.cpp b/better-code/code/05-destruction-3.cpp
new file mode 100644
index 0000000..22f26c4
--- /dev/null
+++ b/better-code/code/05-destruction-3.cpp
@@ -0,0 +1,20 @@
+class Field {
+    int _property;
+public:
+    int property() const { return _property; }
+};
+
+vector<Field> _fields;
+mutex _fields_mutex;
+
+vector<Field> obsolete_fields;
+{
+    unique_lock guard{_fields_mutex};
+    auto it = remove_if(_fields.begin(), _fields.end(),
+        [item_to_remove](auto const& field) { return field.property() == item_to_remove; });
+
+    obsolete_fields.resize(distance(it, _fields.end()));
+    std::move(it, _fields.end(), obsolete_fields.begin());
+    _fields.erase(it, _fields.end()); // only cheap moved-from husks are destroyed under the lock
+}
+obsolete_fields.resize(0); // the real destruction happens here, outside the mutex
\ No newline at end of file
diff --git a/better-code/code/05-notification_queue-1.cpp b/better-code/code/05-notification_queue-1.cpp
new file mode 100644
index 0000000..d01b97e
--- /dev/null
+++ b/better-code/code/05-notification_queue-1.cpp
@@ -0,0 +1,26 @@
+using lock_t = unique_lock<mutex>;
+
+class notification_queue {
+    deque<function<void()>> _q;
+    mutex _mutex;
+    condition_variable _ready;
+
+  public:
+    void pop(function<void()>& x) {
+        lock_t lock{_mutex};
+
+        while (_q.empty()) _ready.wait(lock);
+
+        x = move(_q.front());
+        _q.pop_front();
+    }
+
+    template <typename F>
+    void push(F&& f) {
+        {
+            lock_t lock{_mutex};
+            _q.emplace_back(forward<F>(f));
+        }
+        _ready.notify_one();
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-notification_queue-2.cpp b/better-code/code/05-notification_queue-2.cpp
new file mode 100644
index 0000000..fb29089
--- /dev/null
+++ b/better-code/code/05-notification_queue-2.cpp
@@ -0,0 +1,29 @@
+class notification_queue {
+    deque<task> _q;
+    bool _done{false};
+    mutex _mutex;
+    condition_variable _ready;
+
+  public:
+    void done() {
+        {
+            lock_t lock{_mutex};
+            _done = true;
+        }
+        _ready.notify_all();
+    }
+
+    bool pop(task& x) {
+        lock_t lock{_mutex};
+
+        while (_q.empty() && !_done) _ready.wait(lock);
+
+        if (_q.empty()) return false;
+
+        x = move(_q.front());
+        _q.pop_front();
+        return true;
+    }
+
+    ...
+};
\ No newline at end of file
diff --git a/better-code/code/05-notification_queue-3.cpp b/better-code/code/05-notification_queue-3.cpp
new file mode 100644
index 0000000..5f9844c
--- /dev/null
+++ b/better-code/code/05-notification_queue-3.cpp
@@ -0,0 +1,26 @@
+class notification_queue {
+    deque<task> _q;
+    bool _done{false};
+    mutex _mutex;
+    condition_variable _ready;
+
+  public:
+    bool try_pop(task& x) {
+        lock_t lock{_mutex, try_to_lock};
+        if (!lock || _q.empty()) return false;
+        x = move(_q.front());
+        _q.pop_front();
+        return true;
+    }
+
+    template <typename F>
+    bool try_push(F&& f) {
+        {
+            lock_t lock{_mutex, try_to_lock};
+            if (!lock) return false;
+            _q.emplace_back(forward<F>(f));
+        }
+        _ready.notify_one();
+        return true;
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-registry-0.cpp b/better-code/code/05-registry-0.cpp
new file mode 100644
index 0000000..eeddffa
--- /dev/null
+++ b/better-code/code/05-registry-0.cpp
@@ -0,0 +1,15 @@
+class registry {
+    mutex _mutex;
+    unordered_map<string, string> _map;
+
+  public:
+    void set(string key, string value) {
+        unique_lock<mutex> lock{_mutex};
+        _map.emplace(move(key), move(value));
+    }
+
+    auto get(const string& key) -> string {
+        unique_lock<mutex> lock{_mutex};
+        return _map.at(key);
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-registry-1.cpp b/better-code/code/05-registry-1.cpp
new file mode 100644
index 0000000..9e7ddc9
--- /dev/null
+++ b/better-code/code/05-registry-1.cpp
@@ -0,0 +1,21 @@
+class registry {
+    serial_queue _q;
+
+    using map_t = unordered_map<string, string>;
+
+    shared_ptr<map_t> _map = make_shared<map_t>();
+
+  public:
+    void set(string key, string value) {
+        _q.async(
+            [_map = _map](string key, string value) {
+                _map->emplace(move(key), move(value));
+            },
+            move(key), move(value));
+    }
+
+    auto get(string key) -> future<string> {
+        return _q.async([_map = _map](string key) { return _map->at(key); },
+                        move(key));
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-registry-2.cpp b/better-code/code/05-registry-2.cpp
new file mode 100644
index 0000000..9137a31
--- /dev/null
+++ b/better-code/code/05-registry-2.cpp
@@ -0,0 +1,30 @@
+class registry {
+    serial_queue _q;
+
+    using map_t = unordered_map<string, string>;
+
+    shared_ptr<map_t> _map = make_shared<map_t>();
+
+  public:
+    void set(string key, string value) {
+        _q.async(
+            [_map = _map](string key, string value) {
+                _map->emplace(move(key), move(value));
+            },
+            move(key), move(value));
+    }
+
+    void set(vector<pair<string, string>> sequence) {
+        _q.async(
+            [_map = _map](vector<pair<string, string>> sequence) {
+                _map->insert(make_move_iterator(begin(sequence)),
+                             make_move_iterator(end(sequence)));
+            },
+            move(sequence));
+    }
+
+    auto get(string key) -> future<string> {
+        return _q.async([_map = _map](string key) { return _map->at(key); },
+                        move(key));
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-task_system-1.cpp b/better-code/code/05-task_system-1.cpp
new file mode 100644
index 0000000..db2d223
--- /dev/null
+++ b/better-code/code/05-task_system-1.cpp
@@ -0,0 +1,29 @@
+class task_system {
+    const unsigned _count{thread::hardware_concurrency()};
+    vector<thread> _threads;
+    notification_queue _q;
+
+    void run(unsigned i) {
+        while (true) {
+            task f;
+            _q.pop(f);
+            f();
+        }
+    }
+
+  public:
+    task_system() {
+        for (unsigned n = 0; n != _count; ++n) {
+            _threads.emplace_back([&, n] { run(n); });
+        }
+    }
+
+    ~task_system() {
+        for (auto& e : _threads) e.join();
+    }
+
+    template <typename F>
+    void async(F&& f) {
+        _q.push(forward<F>(f));
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-task_system-2.cpp b/better-code/code/05-task_system-2.cpp
new file mode 100644
index 0000000..2e6a8dd
--- /dev/null
+++ b/better-code/code/05-task_system-2.cpp
@@ -0,0 +1,7 @@
+class task_system {
+  public:
+    ~task_system() {
+        _q.done(); // signal the queue, then the threads can be joined
+        for (auto& e : _threads) e.join();
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-task_system-3.cpp b/better-code/code/05-task_system-3.cpp
new file mode 100644
index 0000000..21f722e
--- /dev/null
+++ b/better-code/code/05-task_system-3.cpp
@@ -0,0 +1,26 @@
+class task_system {
+    const unsigned _count{thread::hardware_concurrency()};
+    vector<thread> _threads;
+    vector<notification_queue> _q{_count};
+    atomic<unsigned> _index{0};
+
+    void run(unsigned i) {
+        while (true) {
+            task f;
+            if (!_q[i].pop(f)) break;
+            f();
+        }
+    }
+
+  public:
+    ~task_system() {
+        for (auto& e : _q) e.done();
+        for (auto& e : _threads) e.join();
+    }
+
+    template <typename F>
+    void async(F&& f) {
+        auto i = _index++;
+        _q[i % _count].push(forward<F>(f));
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/05-task_system-4.cpp b/better-code/code/05-task_system-4.cpp
new file mode 100644
index 0000000..bfac938
--- /dev/null
+++ b/better-code/code/05-task_system-4.cpp
@@ -0,0 +1,34 @@
+class task_system {
+    const unsigned _count{thread::hardware_concurrency()};
+    const unsigned _spin{_count < 64 ? 64 : _count};
+
+    vector<thread> _threads;
+    vector<notification_queue> _q{_count};
+    atomic<unsigned> _index{0};
+
+    void run(unsigned i) {
+        while (true) {
+            task f;
+
+            // TODO Take _spin / _count or something different?
+            for (unsigned n = 0; n != _spin / _count; ++n) {
+                if (_q[(i + n) % _count].try_pop(f)) break;
+            }
+            if (!f && !_q[i].pop(f)) break;
+
+            f();
+        }
+    }
+
+  public:
+    template <typename F>
+    void async(F&& f) {
+        auto i = _index++;
+
+        for (unsigned n = 0; n != _spin / _count; ++n) {
+            if (_q[(i + n) % _count].try_push(forward<F>(f))) return;
+        }
+
+        _q[i % _count].push(forward<F>(f));
+    }
+};
\ No newline at end of file
diff --git a/better-code/code/sort-subrange-0.cpp b/better-code/code/sort-subrange-0.cpp
index 49baf8a..60e88e3 100644
--- a/better-code/code/sort-subrange-0.cpp
+++ b/better-code/code/sort-subrange-0.cpp
@@ -1,4 +1,4 @@
-template <class I> // I models RandomAccessIterator
+template <typename I> // I models RandomAccessIterator
 void sort_subrange_0(I f, I l, I sf, I sl) {
-    std::sort(f, l);
-}
+    std::sort(f, l);
+}
\ No newline at end of file
diff --git a/better-code/code/sort-subrange-1.cpp b/better-code/code/sort-subrange-1.cpp
index 830c1c3..8ce7d94 100644
--- a/better-code/code/sort-subrange-1.cpp
+++ b/better-code/code/sort-subrange-1.cpp
@@ -1,8 +1,8 @@
-template <class I> // I models RandomAccessIterator
+template <typename I> // I models RandomAccessIterator
 void sort_subrange_1(I f, I l, I sf, I sl) {
-    std::nth_element(f, sf, l); // partitions [f, l) at sf
-    if (sf != l) {
-        ++sf;
-        std::partial_sort(sf, sl, l);
-    }
-}
+    std::nth_element(f, sf, l); // partitions [f, l) at sf
+    if (sf != l) {
+        ++sf;
+        std::partial_sort(sf, sl, l);
+    }
+}
\ No newline at end of file
diff --git a/better-code/figures/05-amdahl_lin.svg b/better-code/figures/05-amdahl_lin.svg
new file mode 100644
index 0000000..3f75b9b
--- /dev/null
+++ b/better-code/figures/05-amdahl_lin.svg
@@ -0,0 +1,169 @@
[SVG markup not recoverable from this extract. Figure content: Amdahl's law on a linear scale; x-axis "Number of Processors" (1-16), y-axis "Speedup" (1-16), legend "Percentage of synchronization" with curves for 0%, 10%, 20%, 30%, 40%.]
+ diff --git a/better-code/figures/05-amdahl_log.svg b/better-code/figures/05-amdahl_log.svg new file mode 100644 index 0000000..13e4ccb --- /dev/null +++ b/better-code/figures/05-amdahl_log.svg @@ -0,0 +1,142 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +2 +4 +6 +8 +10 +12 +14 +16 +18 +20 + + + + + + + + + + + + + + + + + + + + + + + + + + + + +1 +2 +4 +8 +16 +32 +64 +128 +256 +512 +1024 +2048 +4096 +8192 +16384 +32768 +65536 +Number of Processors +Speedup + +Percentage of +synchronization + + + + + + + + + + + + + + + + + + + + +5% +10% +25% +50% + diff --git a/better-code/figures/05-background_thread.svg b/better-code/figures/05-background_thread.svg new file mode 100644 index 0000000..5a64cab --- /dev/null +++ b/better-code/figures/05-background_thread.svg @@ -0,0 +1,337 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + Task + + + + . + . + . + + + + Task + + + MainThread + + + Task + + + + Task + + BackgroundThread + + diff --git a/better-code/figures/05-dead_lock.svg b/better-code/figures/05-dead_lock.svg new file mode 100644 index 0000000..c35a4ac --- /dev/null +++ b/better-code/figures/05-dead_lock.svg @@ -0,0 +1,448 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + Task + + + + Task + + + + Taska + + Taskb + + + + + Task + + + + Task + + + + Task + + + + Taskb + + + + + + State 1 + State 2 + + diff --git a/better-code/figures/05-sequential_operations.svg b/better-code/figures/05-sequential_operations.svg new file mode 100644 index 0000000..fc14b56 --- /dev/null +++ b/better-code/figures/05-sequential_operations.svg @@ -0,0 +1,599 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + Opm(x) + + rm + + + + + + Op3(x) + + r3 + + + + + + Op2(x) + + r2 + + + + + Op1(x) + + r1 + + + . + . + . + + + + diff --git a/better-code/figures/05-simple_tasking_system.svg b/better-code/figures/05-simple_tasking_system.svg new file mode 100644 index 0000000..9410583 --- /dev/null +++ b/better-code/figures/05-simple_tasking_system.svg @@ -0,0 +1,448 @@ + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + Thread + + + + Core + + + + + + Thread + + + + Core + + + + + + Thread + + + + Core + + + ... + + + + + + Task + + + + . + . + . + + + Task + + + + Task + + + + Task + + + + diff --git a/better-code/figures/05-task_system_multiple_queues.svg b/better-code/figures/05-task_system_multiple_queues.svg new file mode 100644 index 0000000..4204172 --- /dev/null +++ b/better-code/figures/05-task_system_multiple_queues.svg @@ -0,0 +1,769 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + Scheduler + + + + + + Thread + + + + Core + + + + + + Thread + + + + Core + + + + + + Thread + + + + Core + + + ... + + + Task + + + + + + + + + + . + . + . + + + Task + + + + Task + + + + Task + + + + + . + . + . + + + Task + + + + Task + + + + Task + + + + + . + . + . 
diff --git a/better-code/figures/05-task_system_task_stealing.svg b/better-code/figures/05-task_system_task_stealing.svg
new file mode 100644
index 0000000..2c4fb82
--- /dev/null
+++ b/better-code/figures/05-task_system_task_stealing.svg
@@ -0,0 +1,836 @@
+[SVG diagram: task system with task stealing — per-thread queues as in the multiple-queue system, with arrows labeled "Task Stealing" between queues. Markup omitted.]
diff --git a/better-code/figures/05-traditional_locking-1.svg b/better-code/figures/05-traditional_locking-1.svg
new file mode 100644
index 0000000..7b95a8d
--- /dev/null
+++ b/better-code/figures/05-traditional_locking-1.svg
@@ -0,0 +1,279 @@
+[SVG diagram: traditional locking (1) — four threads sharing a single object. Markup omitted.]
diff --git a/better-code/figures/05-traditional_locking-2.svg b/better-code/figures/05-traditional_locking-2.svg
new file mode 100644
index 0000000..1e961b0
--- /dev/null
+++ b/better-code/figures/05-traditional_locking-2.svg
@@ -0,0 +1,372 @@
+[SVG diagram: traditional locking (2) — four threads contend for the object; one is signaled GO, the other three STOP. Markup omitted.]
diff --git a/better-code/figures/05-traditional_locking-3.svg b/better-code/figures/05-traditional_locking-3.svg
new file mode 100644
index 0000000..176778e
--- /dev/null
+++ b/better-code/figures/05-traditional_locking-3.svg
@@ -0,0 +1,378 @@
+[SVG diagram: traditional locking (3) — the same four threads and object, with GO granted to a different thread. Markup omitted.]
diff --git a/better-code/figures/R/amdahl_lin.R b/better-code/figures/R/amdahl_lin.R
new file mode 100644
index 0000000..d488bca
--- /dev/null
+++ b/better-code/figures/R/amdahl_lin.R
@@ -0,0 +1,44 @@
+library(ggplot2)
+library(scales)
+
+amdahl <- function(x, p) {1.0 / ((1.0 - p) + p / x)}
+
+amdahlFun10 <- function(x) amdahl(x, 0.90)
+amdahlFun20 <- function(x) amdahl(x, 0.80)
+amdahlFun30 <- function(x) amdahl(x, 0.70)
+amdahlFun40 <- function(x) amdahl(x, 0.60)
+optimalFun <- function(x) x
+
+LegendTitle = "Percentage of\nsynchronization"
+
+p1 <- ggplot(data.frame(x = c(1, 16)), aes(x=x)) +
+    stat_function(fun = optimalFun, aes(colour = "amdahlFun00", linetype="amdahlFun00")) +
+    stat_function(fun = amdahlFun10, aes(colour = "amdahlFun10", linetype="amdahlFun10")) +
+    stat_function(fun = amdahlFun20, aes(colour = "amdahlFun20", linetype="amdahlFun20")) +
+    stat_function(fun = amdahlFun30, aes(colour = "amdahlFun30", linetype="amdahlFun30")) +
+    stat_function(fun = amdahlFun40, aes(colour = "amdahlFun40", linetype="amdahlFun40")) +
+    scale_x_continuous(name = "Number of Processors", breaks = seq(1, 16, 1), limits=c(1, 16)) +
+    scale_y_continuous(name = "Speedup", breaks = seq(1, 16, 1)) +
+    scale_colour_manual(LegendTitle,
+                        labels = c("0%", "10%", "20%", "30%", "40%"),
+                        values = c("amdahlFun00" = "green", "amdahlFun10" = "red", "amdahlFun20" = "orange", "amdahlFun30" = "deeppink", "amdahlFun40" = "blue")) +
+    scale_linetype_manual(LegendTitle,
+                          values = c("amdahlFun00" = "solid", "amdahlFun10" = "dashed", "amdahlFun20" = "longdash", "amdahlFun30" = "dotted", "amdahlFun40" = "dotdash"),
+                          labels = c("0%", "10%", "20%", "30%", "40%")) +
+    theme_bw() +
+    theme(axis.line = element_line(size=1, colour = "black"),
+          panel.grid.major = element_line(colour = "#d3d3d3"),
+          panel.grid.minor = element_blank(),
+          panel.border = element_blank(), panel.background = element_blank(),
+          text=element_text(family="Arial"),
+          legend.title=element_text(size = 10, family = "Arial"),
+          legend.text = element_text(size = 10),
+          legend.position = c(0.25, 0.8),
+          axis.text.x=element_text(colour="black", size = 8),
+          axis.text.y=element_text(colour="black", size = 8),
+          axis.title.x=element_text(colour="black", size = 12),
+          axis.title.y=element_text(colour="black", size = 12))
+
+ggsave(file="../05-amdahl_lin.svg", plot=p1, width=8, height=8)
+
diff --git a/better-code/figures/R/amdahl_log.R b/better-code/figures/R/amdahl_log.R
new file mode 100644
index 0000000..0ad6ab9
--- /dev/null
+++ b/better-code/figures/R/amdahl_log.R
@@ -0,0 +1,43 @@
+library(ggplot2)
+library(scales)
+
+amdahl <- function(x, p) {1.0 / ((1.0 - p) + p / x)}
+
+amdahlFun05 <- function(x) amdahl(x, 0.95)
+amdahlFun10 <- function(x) amdahl(x, 0.90)
+amdahlFun25 <- function(x) amdahl(x, 0.75)
+amdahlFun50 <- function(x) amdahl(x, 0.50)
+
+LegendTitle = "Percentage of\nsynchronization"
+
+p1 <- ggplot(data = data.frame(x = c(1, 65536)), mapping = aes(x=x)) +
+    stat_function(fun = amdahlFun05, mapping = aes(colour = "amdahlFun05", linetype = "amdahlFun05")) +
+    stat_function(fun = amdahlFun10, mapping = aes(colour = "amdahlFun10", linetype = "amdahlFun10")) +
+    stat_function(fun = amdahlFun25, mapping = aes(colour = "amdahlFun25", linetype = "amdahlFun25")) +
+    stat_function(fun = amdahlFun50, mapping = aes(colour = "amdahlFun50", linetype = "amdahlFun50")) +
+    scale_x_continuous(trans = 'log2', name = "Number of Processors", limits=c(1, 65536),
+                       breaks = trans_breaks("log2", n = 16, function(x) 2^x), labels = trans_format("log2", function(x) 2^x)) +
+    scale_y_continuous(name = "Speedup", breaks = seq(0, 21, 2), limits=c(1, 20)) +
+    scale_colour_manual(LegendTitle,
+                        values = c("amdahlFun05" = "red", "amdahlFun10" = "orange", "amdahlFun25" = "deeppink", "amdahlFun50" = "blue"),
+                        labels = c("5%", "10%", "25%", "50%")) +
+    scale_linetype_manual(LegendTitle,
+                          values = c("amdahlFun05" = "solid", "amdahlFun10" = "dashed", "amdahlFun25" = "longdash", "amdahlFun50" = "dotted"),
+                          labels = c("5%", "10%", "25%", "50%")) +
+    theme_bw() +
+    theme(axis.line = element_line(size=1, colour = "black"),
+          panel.grid.major = element_line(colour = "#d3d3d3"),
+          panel.grid.minor = element_blank(),
+          panel.border = element_blank(), panel.background = element_blank(),
+          text=element_text(family="Arial"),
+          legend.title=element_text(size = 10, family = "Arial"),
+          legend.text = element_text(size = 10),
+          legend.position = c(0.25, 0.8),
+          axis.text.x=element_text(colour="black", size = 8),
+          axis.text.y=element_text(colour="black", size = 8),
+          axis.title.x=element_text(colour="black", size = 12),
+          axis.title.y=element_text(colour="black", size = 12))
+
+ggsave(file="../05-amdahl_log.svg", plot=p1, width=8, height=6)
+
diff --git a/startDocker.bat b/startDocker.bat
new file mode 100644
index 0000000..03808a2
--- /dev/null
+++ b/startDocker.bat
@@ -0,0 +1,2 @@
+set VOLUME=sean-parent.github.io
+docker run --mount type=bind,source="%CD%",target=/mnt/host --tty --interactive --publish 3000-3001:3000-3001 %VOLUME% bash
\ No newline at end of file
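A note on usage: the script bind-mounts the current directory into the container and publishes the preview ports, but it assumes an image tagged `sean-parent.github.io` already exists locally. (The variable is named `VOLUME`, though it is used as the image name in `docker run`.) A plausible companion step — assumed here, not part of this commit — would be building that image from a Dockerfile in the repository root before running the script:

```
docker build --tag sean-parent.github.io .
startDocker.bat
```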