@@ -52,7 +52,7 @@ CONTENTS
5252
5353 - Varieties of memory barrier.
5454 - What may not be assumed about memory barriers?
55- - Data dependency barriers (historical).
55+ - Address-dependency barriers (historical).
5656 - Control dependencies.
5757 - SMP barrier pairing.
5858 - Examples of memory barrier sequences.
@@ -187,9 +187,9 @@ As a further example, consider this sequence of events:
187187 B = 4; Q = P;
188188 P = &B; D = *Q;
189189
190- There is an obvious data dependency here, as the value loaded into D depends on
191- the address retrieved from P by CPU 2. At the end of the sequence, any of the
192- following results are possible:
190+ There is an obvious address dependency here, as the value loaded into D depends
191+ on the address retrieved from P by CPU 2. At the end of the sequence, any of
192+ the following results are possible:
193193
194194 (Q == &A) and (D == 1)
195195 (Q == &B) and (D == 2)
@@ -391,58 +391,62 @@ Memory barriers come in four basic varieties:
391391 memory system as time progresses. All stores _before_ a write barrier
392392 will occur _before_ all the stores after the write barrier.
393393
394- [!] Note that write barriers should normally be paired with read or data
395- dependency barriers; see the "SMP barrier pairing" subsection.
394+ [!] Note that write barriers should normally be paired with read or
395+ address-dependency barriers; see the "SMP barrier pairing" subsection.
396396
397397
398- (2) Data dependency barriers.
398+ (2) Address-dependency barriers (historical).
399399
400- A data dependency barrier is a weaker form of read barrier. In the case
401- where two loads are performed such that the second depends on the result
402- of the first (eg: the first load retrieves the address to which the second
403- load will be directed), a data dependency barrier would be required to
404- make sure that the target of the second load is updated after the address
405- obtained by the first load is accessed.
400+ An address-dependency barrier is a weaker form of read barrier. In the
401+ case where two loads are performed such that the second depends on the
402+ result of the first (eg: the first load retrieves the address to which
403+ the second load will be directed), an address-dependency barrier would
404+ be required to make sure that the target of the second load is updated
405+ after the address obtained by the first load is accessed.
406406
407- A data dependency barrier is a partial ordering on interdependent loads
408- only; it is not required to have any effect on stores, independent loads
409- or overlapping loads.
407+ An address-dependency barrier is a partial ordering on interdependent
408+ loads only; it is not required to have any effect on stores, independent
409+ loads or overlapping loads.
410410
411411 As mentioned in (1), the other CPUs in the system can be viewed as
412412 committing sequences of stores to the memory system that the CPU being
413- considered can then perceive. A data dependency barrier issued by the CPU
414- under consideration guarantees that for any load preceding it, if that
415- load touches one of a sequence of stores from another CPU, then by the
416- time the barrier completes, the effects of all the stores prior to that
417- touched by the load will be perceptible to any loads issued after the data
418- dependency barrier.
413+ considered can then perceive. An address-dependency barrier issued by
414+ the CPU under consideration guarantees that for any load preceding it,
415+ if that load touches one of a sequence of stores from another CPU, then
416+ by the time the barrier completes, the effects of all the stores prior to
417+ that touched by the load will be perceptible to any loads issued after
418+ the address-dependency barrier.
419419
420420 See the "Examples of memory barrier sequences" subsection for diagrams
421421 showing the ordering constraints.
422422
423- [!] Note that the first load really has to have a _data_ dependency and
423+ [!] Note that the first load really has to have an _address_ dependency and
424424 not a control dependency. If the address for the second load is dependent
425425 on the first load, but the dependency is through a conditional rather than
426426 actually loading the address itself, then it's a _control_ dependency and
427427 a full read barrier or better is required. See the "Control dependencies"
428428 subsection for more information.
429429
430- [!] Note that data dependency barriers should normally be paired with
430+ [!] Note that address-dependency barriers should normally be paired with
431431 write barriers; see the "SMP barrier pairing" subsection.
432432
433+ [!] Kernel release v5.9 removed kernel APIs for explicit address-
434+ dependency barriers. Nowadays, APIs for marking loads from shared
435+ variables such as READ_ONCE() and rcu_dereference() provide implicit
436+ address-dependency barriers.
433437
434438 (3) Read (or load) memory barriers.
435439
436- A read barrier is a data dependency barrier plus a guarantee that all the
437- LOAD operations specified before the barrier will appear to happen before
438- all the LOAD operations specified after the barrier with respect to the
439- other components of the system.
440+ A read barrier is an address-dependency barrier plus a guarantee that all
441+ the LOAD operations specified before the barrier will appear to happen
442+ before all the LOAD operations specified after the barrier with respect to
443+ the other components of the system.
440444
441445 A read barrier is a partial ordering on loads only; it is not required to
442446 have any effect on stores.
443447
444- Read memory barriers imply data dependency barriers, and so can substitute
445- for them.
448+ Read memory barriers imply address-dependency barriers, and so can
449+ substitute for them.
446450
447451 [!] Note that read barriers should normally be paired with write barriers;
448452 see the "SMP barrier pairing" subsection.
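The pairing of a write-side barrier with a read-side barrier described above
can be sketched in portable C11. This is only an illustration under stated
assumptions, not kernel code: the names A, B, P, cpu1_publish(),
cpu2_observe() and observation_is_consistent() are hypothetical, the release
store stands in for <write barrier> followed by WRITE_ONCE(), and the acquire
load stands in for READ_ONCE() followed by an address-dependency barrier (an
acquire load is at least as strong as what the dependent load needs).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state, mirroring { A == 1, B == 2, P == &A }. */
static int A = 1;
static int B = 2;
static _Atomic(int *) P = &A;

struct observation {
	int *q;	/* pointer value loaded from P */
	int d;	/* value loaded through q */
};

/* "CPU 1": update B, then publish its address. */
void cpu1_publish(void)
{
	B = 4;
	/* Release store: assumed stand-in for <write barrier> + WRITE_ONCE(P, &B). */
	atomic_store_explicit(&P, &B, memory_order_release);
}

/* "CPU 2": load the pointer, then perform the dependent load. */
struct observation cpu2_observe(void)
{
	struct observation o;

	/* Acquire load: assumed stand-in for READ_ONCE(P) + address-dependency barrier. */
	o.q = atomic_load_explicit(&P, memory_order_acquire);
	o.d = *o.q;
	return o;
}

/* The two outcomes the pairing is meant to guarantee. */
bool observation_is_consistent(struct observation o)
{
	return (o.q == &A && o.d == 1) || (o.q == &B && o.d == 4);
}

/* Single-threaded walk-through of both permitted outcomes. */
void demo_sequential(void)
{
	struct observation o = cpu2_observe();	/* before publication */

	assert(observation_is_consistent(o));
	cpu1_publish();
	o = cpu2_observe();			/* after publication */
	assert(observation_is_consistent(o));
}
```

In a real test cpu1_publish() and cpu2_observe() would run on different
threads; the sequential walk-through only shows which outcomes are expected,
and cannot demonstrate the reordering that the barriers prevent.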
@@ -550,17 +554,21 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
550554 Documentation/core-api/dma-api.rst
551555
552556
553- DATA DEPENDENCY BARRIERS (HISTORICAL)
554- -------------------------------------
557+ ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
558+ ----------------------------------------
555559
556560 As of v4.15 of the Linux kernel, an smp_mb() was added to READ_ONCE() for
557561 DEC Alpha, which means that about the only people who need to pay attention
558562 to this section are those working on DEC Alpha architecture-specific code
559563 and those working on READ_ONCE() itself. For those who need it, and for
560564 those who are interested in the history, here is the story of
561- data-dependency barriers.
565+ address-dependency barriers.
566+
567+ [!] While address dependencies are observed in both load-to-load and
568+ load-to-store relations, address-dependency barriers are not necessary
569+ for load-to-store situations.
562570
563- The usage requirements of data dependency barriers are a little subtle, and
571+ The requirement of address-dependency barriers is a little subtle, and
564572 it's not always obvious that they're needed. To illustrate, consider the
565573 following sequence of events:
566574
@@ -570,11 +578,14 @@ following sequence of events:
570578 B = 4;
571579 <write barrier>
572580 WRITE_ONCE(P, &B);
573- Q = READ_ONCE(P);
581+ Q = READ_ONCE_OLD(P);
574582 D = *Q;
575583
576- There's a clear data dependency here, and it would seem that by the end of the
577- sequence, Q must be either &A or &B, and that:
584+ [!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernel, which
585+ doesn't imply an address-dependency barrier.
586+
587+ There's a clear address dependency here, and it would seem that by the end of
588+ the sequence, Q must be either &A or &B, and that:
578589
579590 (Q == &A) implies (D == 1)
580591 (Q == &B) implies (D == 4)
@@ -588,8 +599,8 @@ While this may seem like a failure of coherency or causality maintenance, it
588599 isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
589600 Alpha).
590601
591- To deal with this, a data dependency barrier or better must be inserted
592- between the address load and the data load:
602+ To deal with this, READ_ONCE() provides an implicit address-dependency barrier
603+ since kernel release v4.15:
593604
594605 CPU 1 CPU 2
595606 =============== ===============
@@ -598,7 +609,7 @@ between the address load and the data load:
598609 <write barrier>
599610 WRITE_ONCE(P, &B);
600611 Q = READ_ONCE(P);
601- <data dependency barrier>
612+ <implicit address-dependency barrier>
602613 D = *Q;
603614
604615 This enforces the occurrence of one of the two implications, and prevents the
@@ -615,26 +626,26 @@ odd-numbered bank is idle, one can see the new value of the pointer P (&B),
615626 but the old value of the variable B (2).
616627
617628
618- A data-dependency barrier is not required to order dependent writes
619- because the CPUs that the Linux kernel supports don't do writes
620- until they are certain (1) that the write will actually happen, (2)
621- of the location of the write, and (3) of the value to be written.
629+ An address-dependency barrier is not required to order dependent writes
630+ because the CPUs that the Linux kernel supports don't do writes until they
631+ are certain (1) that the write will actually happen, (2) of the location of
632+ the write, and (3) of the value to be written.
622633 But please carefully read the "CONTROL DEPENDENCIES" section and the
623- Documentation/RCU/rcu_dereference.rst file: The compiler can and does
624- break dependencies in a great many highly creative ways.
634+ Documentation/RCU/rcu_dereference.rst file: The compiler can and does break
635+ dependencies in a great many highly creative ways.
625636
626637 CPU 1 CPU 2
627638 =============== ===============
628639 { A == 1, B == 2, C = 3, P == &A, Q == &C }
629640 B = 4;
630641 <write barrier>
631642 WRITE_ONCE(P, &B);
632- Q = READ_ONCE(P);
643+ Q = READ_ONCE_OLD(P);
633644 WRITE_ONCE(*Q, 5);
634645
635- Therefore, no data-dependency barrier is required to order the read into
646+ Therefore, no address-dependency barrier is required to order the read into
636647 Q with the store into *Q. In other words, this outcome is prohibited,
637- even without a data-dependency barrier:
648+ even without an implicit address-dependency barrier of modern READ_ONCE():
638649
639650 (Q == &B) && (B == 4)
640651
@@ -645,12 +656,12 @@ can be used to record rare error conditions and the like, and the CPUs'
645656 naturally occurring ordering prevents such records from being lost.
646657
647658
648- Note well that the ordering provided by a data dependency is local to
659+ Note well that the ordering provided by an address dependency is local to
649660 the CPU containing it. See the section on "Multicopy atomicity" for
650661 more information.
651662
652663
653- The data dependency barrier is very important to the RCU system,
664+ The address-dependency barrier is very important to the RCU system,
654665 for example. See rcu_assign_pointer() and rcu_dereference() in
655666 include/linux/rcupdate.h. This permits the current target of an RCU'd
656667 pointer to be replaced with a new modified target, without the replacement
@@ -667,20 +678,21 @@ not understand them. The purpose of this section is to help you prevent
667678 the compiler's ignorance from breaking your code.
668679
669680 A load-load control dependency requires a full read memory barrier, not
670- simply a data dependency barrier to make it work correctly. Consider the
671- following bit of code:
681+ simply an (implicit) address-dependency barrier to make it work correctly.
682+ Consider the following bit of code:
672683
673684 q = READ_ONCE(a);
685+ <implicit address-dependency barrier>
674686 if (q) {
675- <data dependency barrier> /* BUG: No data dependency!!! */
687+ /* BUG: No address dependency!!! */
676688 p = READ_ONCE(b);
677689 }
678690
679- This will not have the desired effect because there is no actual data
691+ This will not have the desired effect because there is no actual address
680692 dependency, but rather a control dependency that the CPU may short-circuit
681693 by attempting to predict the outcome in advance, so that other CPUs see
682- the load from b as having happened before the load from a. In such a
683- case what's actually required is:
694+ the load from b as having happened before the load from a. In such a case
695+ what's actually required is:
684696
685697 q = READ_ONCE(a);
686698 if (q) {
@@ -927,9 +939,9 @@ General barriers pair with each other, though they also pair with most
927939 other types of barriers, albeit without multicopy atomicity. An acquire
928940 barrier pairs with a release barrier, but both may also pair with other
929941 barriers, including of course general barriers. A write barrier pairs
930- with a data dependency barrier, a control dependency, an acquire barrier,
942+ with an address-dependency barrier, a control dependency, an acquire barrier,
931943 a release barrier, a read barrier, or a general barrier. Similarly a
932- read barrier, control dependency, or a data dependency barrier pairs
944+ read barrier, control dependency, or an address-dependency barrier pairs
933945 with a write barrier, an acquire barrier, a release barrier, or a
934946 general barrier:
935947
948960 a = 1;
949961 <write barrier>
950962 WRITE_ONCE(b, &a); x = READ_ONCE(b);
951- <data dependency barrier>
963+ <implicit address-dependency barrier>
952964 y = *x;
953965
954966 Or even:
@@ -968,8 +980,8 @@ Basically, the read barrier always has to be there, even though it can be of
968980 the "weaker" type.
969981
970982 [!] Note that the stores before the write barrier would normally be expected to
971- match the loads after the read barrier or the data dependency barrier, and vice
972- versa:
983+ match the loads after the read barrier or the address-dependency barrier, and
984+ vice versa:
973985
974986 CPU 1 CPU 2
975987 =================== ===================
@@ -1021,8 +1033,8 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
10211033 V
10221034
10231035
1024- Secondly, data dependency barriers act as partial orderings on data-dependent
1025- loads. Consider the following sequence of events:
1036+ Secondly, address-dependency barriers act as partial orderings on address-
1037+ dependent loads. Consider the following sequence of events:
10261038
10271039 CPU 1 CPU 2
10281040 ======================= =======================
@@ -1067,8 +1079,8 @@ effectively random order, despite the write barrier issued by CPU 1:
10671079 In the above example, CPU 2 perceives that B is 7, despite the load of *C
10681080(which would be B) coming after the LOAD of C.
10691081
1070- If, however, a data dependency barrier were to be placed between the load of C
1071- and the load of *C (ie: B) on CPU 2:
1082+ If, however, an address-dependency barrier were to be placed between the load
1083+ of C and the load of *C (ie: B) on CPU 2:
10721084
10731085 CPU 1 CPU 2
10741086 ======================= =======================
@@ -1078,7 +1090,7 @@ and the load of *C (ie: B) on CPU 2:
10781090 <write barrier>
10791091 STORE C = &B LOAD X
10801092 STORE D = 4 LOAD C (gets &B)
1081- <data dependency barrier>
1093+ <address-dependency barrier>
10821094 LOAD *C (reads B)
10831095
10841096 then the following will occur:
@@ -1101,7 +1113,7 @@ then the following will occur:
11011113 | +-------+ | |
11021114 | | X->9 |------>| |
11031115 | +-------+ | |
1104- Makes sure all effects ---> \ ddddddddddddddddd | |
1116+ Makes sure all effects ---> \ aaaaaaaaaaaaaaaaa | |
11051117 prior to the store of C \ +-------+ | |
11061118 are perceptible to ----->| B->2 |------>| |
11071119 subsequent loads +-------+ | |
@@ -1292,7 +1304,7 @@ Which might appear as this:
12921304 LOAD with immediate effect : : +-------+
12931305
12941306
1295- Placing a read barrier or a data dependency barrier just before the second
1307+ Placing a read barrier or an address-dependency barrier just before the second
12961308 load:
12971309
12981310 CPU 1 CPU 2
@@ -1816,20 +1828,20 @@ which may then reorder things however it wishes.
18161828 CPU MEMORY BARRIERS
18171829 -------------------
18181830
1819- The Linux kernel has eight basic CPU memory barriers:
1831+ The Linux kernel has seven basic CPU memory barriers:
18201832
1821- TYPE MANDATORY SMP CONDITIONAL
1822- =============== ======================= ============ ===============
1823- GENERAL mb() smp_mb()
1824- WRITE wmb() smp_wmb()
1825- READ rmb() smp_rmb()
1826- DATA DEPENDENCY READ_ONCE()
1833+ TYPE                    MANDATORY       SMP CONDITIONAL
1834+ ======================= =============== ===============
1835+ GENERAL                 mb()            smp_mb()
1836+ WRITE                   wmb()           smp_wmb()
1837+ READ                    rmb()           smp_rmb()
1838+ ADDRESS DEPENDENCY                      READ_ONCE()
18271839
18281840
1829- All memory barriers except the data dependency barriers imply a compiler
1830- barrier. Data dependencies do not impose any additional compiler ordering.
1841+ All memory barriers except the address-dependency barriers imply a compiler
1842+ barrier. Address dependencies do not impose any additional compiler ordering.
18311843
1832- Aside: In the case of data dependencies, the compiler would be expected
1844+ Aside: In the case of address dependencies, the compiler would be expected
18331845 to issue the loads in the correct order (eg. `a[b]` would have to load
18341846 the value of b before loading a[b]), however there is no guarantee in
18351847 the C specification that the compiler may not speculate the value of b
@@ -2749,7 +2761,8 @@ is discarded from the CPU's cache and reloaded. To deal with this, the
27492761 appropriate part of the kernel must invalidate the overlapping bits of the
27502762 cache on each CPU.
27512763
2752- See Documentation/core-api/cachetlb.rst for more information on cache management.
2764+ See Documentation/core-api/cachetlb.rst for more information on cache
2765+ management.
27532766
27542767
27552768 CACHE COHERENCY VS MMIO
@@ -2889,8 +2902,8 @@ AND THEN THERE'S THE ALPHA
28892902 The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
28902903 some versions of the Alpha CPU have a split data cache, permitting them to have
28912904 two semantically-related cache lines updated at separate times. This is where
2892- the data dependency barrier really becomes necessary as this synchronises both
2893- caches with the memory coherence system, thus making it seem like pointer
2905+ the address-dependency barrier really becomes necessary as this synchronises
2906+ both caches with the memory coherence system, thus making it seem like pointer
28942907 changes vs new data occur in the right order.
28952908
28962909 The Alpha defines the Linux kernel's memory model, although as of v4.15