Commit 60a427d

jpemartins authored and akpm00 committed
mm/hugetlb_vmemmap: move comment block to Documentation/vm
In preparation for device-dax to use the hugetlbfs compound page tail
deduplication technique, move the comment block explanation into a common
place in Documentation/vm.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Joao Martins <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Suggested-by: Dan Williams <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 2beea70 commit 60a427d

File tree

3 files changed: +175 -167 lines changed


Documentation/vm/index.rst

Lines changed: 1 addition & 0 deletions

@@ -37,5 +37,6 @@ algorithms. If you are looking for advice on simply allocating memory, see the
    transhuge
    unevictable-lru
    vmalloced-kernel-stacks
+   vmemmap_dedup
    z3fold
    zsmalloc

Documentation/vm/vmemmap_dedup.rst

Lines changed: 173 additions & 0 deletions

@@ -0,0 +1,173 @@
.. SPDX-License-Identifier: GPL-2.0

==================================
Free some vmemmap pages of HugeTLB
==================================

The struct page structures (page structs) are used to describe a physical
page frame. By default, there is a one-to-one mapping from a page frame to
its corresponding page struct.

HugeTLB pages consist of multiple base page size pages and are supported by
many architectures. See Documentation/admin-guide/mm/hugetlbpage.rst for more
details. On the x86-64 architecture, HugeTLB pages of size 2MB and 1GB are
currently supported. Since the base page size on x86 is 4KB, a 2MB HugeTLB
page consists of 512 base pages and a 1GB HugeTLB page consists of 262144
base pages. For each base page, there is a corresponding page struct.

Within the HugeTLB subsystem, only the first 4 page structs are used to
contain unique information about a HugeTLB page. __NR_USED_SUBPAGE provides
this upper limit. The only 'useful' information in the remaining page structs
is the compound_head field, and this field is the same for all tail pages.

By removing redundant page structs for HugeTLB pages, memory can be returned
to the buddy allocator for other uses.

Different architectures support different HugeTLB page sizes. For example,
the following table lists the HugeTLB page sizes supported by the x86 and
arm64 architectures. Because arm64 supports 4KB, 16KB, and 64KB base pages
and supports contiguous entries, it supports many HugeTLB page sizes.

+--------------+-----------+-----------------------------------------------+
| Architecture | Page Size |                HugeTLB Page Size              |
+--------------+-----------+-----------+-----------+-----------+-----------+
|    x86-64    |    4KB    |    2MB    |    1GB    |           |           |
+--------------+-----------+-----------+-----------+-----------+-----------+
|              |    4KB    |   64KB    |    2MB    |    32MB   |    1GB    |
|              +-----------+-----------+-----------+-----------+-----------+
|    arm64     |   16KB    |    2MB    |    32MB   |     1GB   |           |
|              +-----------+-----------+-----------+-----------+-----------+
|              |   64KB    |    2MB    |   512MB   |    16GB   |           |
+--------------+-----------+-----------+-----------+-----------+-----------+

When the system boots up, every HugeTLB page has more than one struct page
struct; the number of those pages is (unit: pages)::

   struct_size = HugeTLB_Size / PAGE_SIZE * sizeof(struct page) / PAGE_SIZE

Where HugeTLB_Size is the size of the HugeTLB page. We know that the size
of a HugeTLB page is always n times PAGE_SIZE, so we get the following
relationship::

   HugeTLB_Size = n * PAGE_SIZE

Then::

   struct_size = n * PAGE_SIZE / PAGE_SIZE * sizeof(struct page) / PAGE_SIZE
               = n * sizeof(struct page) / PAGE_SIZE

We can use a huge mapping at the pud/pmd level for the HugeTLB page.

For a HugeTLB page mapped at the pmd level::

   struct_size = n * sizeof(struct page) / PAGE_SIZE
               = PAGE_SIZE / sizeof(pte_t) * sizeof(struct page) / PAGE_SIZE
               = sizeof(struct page) / sizeof(pte_t)
               = 64 / 8
               = 8 (pages)

Where n is the number of pte entries one page can contain, i.e.
n = PAGE_SIZE / sizeof(pte_t).

This optimization only supports 64-bit systems, so the value of
sizeof(pte_t) is 8. The optimization is also applicable only when the size
of struct page is a power of two. In most cases, the size of struct page is
64 bytes (e.g. x86-64 and arm64). So if we use a pmd level mapping for a
HugeTLB page, the size of its struct page structs is 8 page frames, whose
size depends on the size of the base page.

For a HugeTLB page mapped at the pud level::

   struct_size = PAGE_SIZE / sizeof(pmd_t) * struct_size(pmd)
               = PAGE_SIZE / 8 * 8 (pages)
               = PAGE_SIZE (pages)

Where struct_size(pmd) is the size of the struct page structs of a
HugeTLB page mapped at the pmd level.

E.g.: a 2MB HugeTLB page on x86_64 consists of 8 page frames of struct
pages, while a 1GB HugeTLB page consists of 4096.
Next, we take the pmd level mapping of a HugeTLB page as an example to
show the internal implementation of this optimization. There are 8 pages
of struct page structs associated with a pmd-mapped HugeTLB page.

Here is how things look before optimization::

    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | -------------> |     1     |
 |           |                     +-----------+                +-----------+
 |           |                     |     2     | -------------> |     2     |
 |           |                     +-----------+                +-----------+
 |           |                     |     3     | -------------> |     3     |
 |           |                     +-----------+                +-----------+
 |           |                     |     4     | -------------> |     4     |
 |    PMD    |                     +-----------+                +-----------+
 |   level   |                     |     5     | -------------> |     5     |
 |  mapping  |                     +-----------+                +-----------+
 |           |                     |     6     | -------------> |     6     |
 |           |                     +-----------+                +-----------+
 |           |                     |     7     | -------------> |     7     |
 |           |                     +-----------+                +-----------+
 |           |
 |           |
 |           |
 +-----------+

The value of page->compound_head is the same for all tail pages. The first
page of page structs (page 0) associated with the HugeTLB page contains the
4 page structs necessary to describe the HugeTLB page. The only use of the
remaining pages of page structs (page 1 to page 7) is to point to
page->compound_head. Therefore, we can remap pages 1 to 7 to page 0. Only 1
page of page structs will be used for each HugeTLB page, which allows us to
free the remaining 7 pages to the buddy allocator.
128+
129+
Here is how things look after remapping::
130+
131+
HugeTLB struct pages(8 pages) page frame(8 pages)
132+
+-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+
133+
| | | 0 | -------------> | 0 |
134+
| | +-----------+ +-----------+
135+
| | | 1 | ---------------^ ^ ^ ^ ^ ^ ^
136+
| | +-----------+ | | | | | |
137+
| | | 2 | -----------------+ | | | | |
138+
| | +-----------+ | | | | |
139+
| | | 3 | -------------------+ | | | |
140+
| | +-----------+ | | | |
141+
| | | 4 | ---------------------+ | | |
142+
| PMD | +-----------+ | | |
143+
| level | | 5 | -----------------------+ | |
144+
| mapping | +-----------+ | |
145+
| | | 6 | -------------------------+ |
146+
| | +-----------+ |
147+
| | | 7 | ---------------------------+
148+
| | +-----------+
149+
| |
150+
| |
151+
| |
152+
+-----------+
153+
154+
When a HugeTLB is freed to the buddy system, we should allocate 7 pages for
155+
vmemmap pages and restore the previous mapping relationship.
156+
157+
For the HugeTLB page of the pud level mapping. It is similar to the former.
158+
We also can use this approach to free (PAGE_SIZE - 1) vmemmap pages.
159+
160+
Apart from the HugeTLB page of the pmd/pud level mapping, some architectures
161+
(e.g. aarch64) provides a contiguous bit in the translation table entries
162+
that hints to the MMU to indicate that it is one of a contiguous set of
163+
entries that can be cached in a single TLB entry.
164+
165+
The contiguous bit is used to increase the mapping size at the pmd and pte
166+
(last) level. So this type of HugeTLB page can be optimized only when its
167+
size of the struct page structs is greater than 1 page.
168+
169+
Notice: The head vmemmap page is not freed to the buddy allocator and all
170+
tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
171+
more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
172+
associated with each HugeTLB page. The compound_head() can handle this
173+
correctly (more details refer to the comment above compound_head()).

mm/hugetlb_vmemmap.c

Lines changed: 1 addition & 167 deletions

@@ -6,173 +6,7 @@
  *
  * Author: Muchun Song <[email protected]>
  *
- * [167-line comment block deleted; moved verbatim to Documentation/vm/vmemmap_dedup.rst]
+ * See Documentation/vm/vmemmap_dedup.rst
  */
 #define pr_fmt(fmt) "HugeTLB: " fmt
