
Commit e3246d8

jpemartins authored and akpm00 committed
mm/sparse-vmemmap: add a pgmap argument to section activation
Patch series "sparse-vmemmap: memory savings for compound devmaps (device-dax)", v9. This series minimizes 'struct page' overhead by pursuing a similar approach as Muchun Song series "Free some vmemmap pages of hugetlb page" (now merged since v5.14), but applied to devmap with @vmemmap_shift (device-dax). The vmemmap dedpulication original idea (already used in HugeTLB) is to reuse/deduplicate tail page vmemmap areas, particular the area which only describes tail pages. So a vmemmap page describes 64 struct pages, and the first page for a given ZONE_DEVICE vmemmap would contain the head page and 63 tail pages. The second vmemmap page would contain only tail pages, and that's what gets reused across the rest of the subsection/section. The bigger the page size, the bigger the savings (2M hpage -> save 6 vmemmap pages; 1G hpage -> save 4094 vmemmap pages). This is done for PMEM /specifically only/ on device-dax configured namespaces, not fsdax. In other words, a devmap with a @vmemmap_shift. In terms of savings, per 1Tb of memory, the struct page cost would go down with compound devmap: * with 2M pages we lose 4G instead of 16G (0.39% instead of 1.5% of total memory) * with 1G pages we lose 40MB instead of 16G (0.0014% instead of 1.5% of total memory) The series is mostly summed up by patch 4, and to summarize what the series does: Patches 1 - 3: Minor cleanups in preparation for patch 4. Move the very nice docs of hugetlb_vmemmap.c into a Documentation/vm/ entry. Patch 4: Patch 4 is the one that takes care of the struct page savings (also referred to here as tail-page/vmemmap deduplication). Much like Muchun series, we reuse the second PTE tail page vmemmap areas across a given @vmemmap_shift On important difference though, is that contrary to the hugetlbfs series, there's no vmemmap for the area because we are late-populating it as opposed to remapping a system-ram range. IOW no freeing of pages of already initialized vmemmap like the case for hugetlbfs, which greatly simplifies the logic (besides not being arch-specific). altmap case unchanged and still goes via the vmemmap_populate(). Also adjust the newly added docs to the device-dax case. [Note that device-dax is still a little behind HugeTLB in terms of savings. I have an additional simple patch that reuses the head vmemmap page too, as a follow-up. That will double the savings and namespaces initialization.] Patch 5: Initialize fewer struct pages depending on the page size with DRAM backed struct pages -- because fewer pages are unique and most tail pages (with bigger vmemmap_shift). NVDIMM namespace bootstrap improves from ~268-358 ms to ~80-110/<1ms on 128G NVDIMMs with 2M and 1G respectivally. And struct page needed capacity will be 3.8x / 1071x smaller for 2M and 1G respectivelly. Tested on x86 with 1.5Tb of pmem (including pinning, and RDMA registration/deregistration scalability with 2M MRs) This patch (of 5): In support of using compound pages for devmap mappings, plumb the pgmap down to the vmemmap_populate implementation. Note that while altmap is retrievable from pgmap the memory hotplug code passes altmap without pgmap[*], so both need to be independently plumbed. 
So in addition to @altmap, pass @pgmap to the sparse section populate functions, namely:

	sparse_add_section
	  section_activate
	    populate_section_memmap
	      __populate_section_memmap

Passing @pgmap allows __populate_section_memmap() both to fetch the vmemmap_shift with which the memmap metadata is created, and to let sparse-vmemmap fetch the pgmap ranges to correlate with a given section, so it can pick whether to just reuse tail pages from previously onlined sections.

While at it, fix the kdoc for @altmap for sparse_add_section().

[*] https://lore.kernel.org/linux-mm/[email protected]/

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Joao Martins <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
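The per-hugepage figures quoted in the cover letter above ("save 6" for 2M, "save 4094" for 1G) are easy to check. Below is a standalone user-space sketch of the arithmetic, assuming 4K base pages and a 64-byte struct page (true for the x86 setup described here, but not universal), and keeping one head vmemmap page plus one reusable tail vmemmap page per hugepage:

	#include <stdio.h>

	#define PAGE_SIZE	4096UL
	#define STRUCT_PAGE_SZ	64UL	/* assumed sizeof(struct page) */

	static void report(const char *name, unsigned long hpage_size)
	{
		/* struct pages needed to describe one hugepage ... */
		unsigned long nr_pages = hpage_size / PAGE_SIZE;
		/* ... and the vmemmap pages that back those struct pages */
		unsigned long vmemmap_pages = nr_pages * STRUCT_PAGE_SZ / PAGE_SIZE;
		/* keep the head page and one tail page that all others reuse */
		unsigned long saved = vmemmap_pages - 2;

		printf("%s: %lu vmemmap pages, save %lu\n",
		       name, vmemmap_pages, saved);
	}

	int main(void)
	{
		report("2M", 2UL << 20);	/* prints: 8 vmemmap pages, save 6 */
		report("1G", 1UL << 30);	/* prints: 4096 vmemmap pages, save 4094 */
		return 0;
	}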
1 parent 47010c0 commit e3246d8

5 files changed: +26, -14 lines changed


include/linux/memory_hotplug.h

Lines changed: 4 additions & 1 deletion
@@ -15,6 +15,7 @@ struct memory_block;
 struct memory_group;
 struct resource;
 struct vmem_altmap;
+struct dev_pagemap;
 
 #ifdef CONFIG_HAVE_ARCH_NODEDATA_EXTENSION
 /*
@@ -122,6 +123,7 @@ typedef int __bitwise mhp_t;
 struct mhp_params {
 	struct vmem_altmap *altmap;
 	pgprot_t pgprot;
+	struct dev_pagemap *pgmap;
 };
 
 bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
@@ -333,7 +335,8 @@ extern void remove_pfn_range_from_zone(struct zone *zone,
 				       unsigned long nr_pages);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_section(int nid, unsigned long pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap);
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap);
 extern void sparse_remove_section(struct mem_section *ms,
 		unsigned long pfn, unsigned long nr_pages,
 		unsigned long map_offset, struct vmem_altmap *altmap);
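The new mhp_params::pgmap field is meant to be filled in by the devmap side (mm/memremap.c) when it hotplugs a range; this commit only adds the plumbing. As a hedged illustration of the caller shape -- not a hunk from this series -- a ZONE_DEVICE user would end up doing roughly:

	struct mhp_params params = {
		.altmap = pgmap_altmap(pgmap),	/* may be NULL */
		.pgprot = PAGE_KERNEL,
		.pgmap  = pgmap,	/* new: exposes vmemmap_shift to sparse code */
	};

	/* add_pages() eventually reaches sparse_add_section() */
	err = add_pages(nid, PHYS_PFN(range->start),
			PHYS_PFN(range_len(range)), &params);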

include/linux/mm.h

Lines changed: 2 additions & 1 deletion
@@ -3154,7 +3154,8 @@ int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 
 void *sparse_buffer_alloc(unsigned long size);
 struct page * __populate_section_memmap(unsigned long pfn,
-		unsigned long nr_pages, int nid, struct vmem_altmap *altmap);
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
 p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);

mm/memory_hotplug.c

Lines changed: 2 additions & 1 deletion
@@ -328,7 +328,8 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 		/* Select all remaining pages up to the next section boundary */
 		cur_nr_pages = min(end_pfn - pfn,
 				   SECTION_ALIGN_UP(pfn + 1) - pfn);
-		err = sparse_add_section(nid, pfn, cur_nr_pages, altmap);
+		err = sparse_add_section(nid, pfn, cur_nr_pages, altmap,
+					 params->pgmap);
 		if (err)
 			break;
 		cond_resched();
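The min() against SECTION_ALIGN_UP() in this loop guarantees that each sparse_add_section() call stays within a single section (pfn + 1 rather than pfn so a section-aligned start still advances to the next boundary). A standalone sketch of that clamping, assuming x86_64's 128M sections (PAGES_PER_SECTION == 1UL << 15):

	#include <stdio.h>

	#define PAGES_PER_SECTION	(1UL << 15)
	#define SECTION_ALIGN_UP(pfn) \
		(((pfn) + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1))

	int main(void)
	{
		unsigned long pfn = (1UL << 15) + 100;	/* 100 pages into section 1 */
		unsigned long end_pfn = pfn + PAGES_PER_SECTION;
		unsigned long span = end_pfn - pfn;
		unsigned long to_boundary = SECTION_ALIGN_UP(pfn + 1) - pfn;
		unsigned long cur_nr_pages = span < to_boundary ? span : to_boundary;

		/* first iteration covers only the rest of the current section */
		printf("cur_nr_pages = %lu\n", cur_nr_pages);	/* 32668, not 32768 */
		return 0;
	}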

mm/sparse-vmemmap.c

Lines changed: 2 additions & 1 deletion
@@ -641,7 +641,8 @@ int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
 }
 
 struct page * __meminit __populate_section_memmap(unsigned long pfn,
-		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
 {
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
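Why __populate_section_memmap() needs the pgmap only becomes visible in patch 4 of the series. Paraphrasing the cover letter rather than quoting the actual follow-up hunk -- compound_section_memmap() below is an illustrative name, not the series' real helper -- the function ends up branching roughly like this:

	struct page * __meminit __populate_section_memmap(unsigned long pfn,
			unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
			struct dev_pagemap *pgmap)
	{
		unsigned long start = (unsigned long) pfn_to_page(pfn);
		unsigned long end = start + nr_pages * sizeof(struct page);

		/*
		 * Compound devmap (non-zero vmemmap_shift) without an altmap:
		 * populate so that tail vmemmap pages are reused, per the
		 * deduplication scheme described in the cover letter.
		 */
		if (pgmap && pgmap->vmemmap_shift && !altmap)
			return compound_section_memmap(pfn, start, end, nid, pgmap);

		/* everything else, the altmap case included, is unchanged */
		if (vmemmap_populate(start, end, nid, altmap))
			return NULL;
		return pfn_to_page(pfn);
	}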

mm/sparse.c

Lines changed: 16 additions & 10 deletions
@@ -427,7 +427,8 @@ static unsigned long __init section_map_size(void)
 }
 
 struct page __init *__populate_section_memmap(unsigned long pfn,
-		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
 {
 	unsigned long size = section_map_size();
 	struct page *map = sparse_buffer_alloc(size);
@@ -524,7 +525,7 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
 			break;
 
 		map = __populate_section_memmap(pfn, PAGES_PER_SECTION,
-				nid, NULL);
+				nid, NULL, NULL);
 		if (!map) {
 			pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.",
 			       __func__, nid);
@@ -629,9 +630,10 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static struct page * __meminit populate_section_memmap(unsigned long pfn,
-		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
 {
-	return __populate_section_memmap(pfn, nr_pages, nid, altmap);
+	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
 }
 
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
@@ -700,7 +702,8 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
 }
 #else
 struct page * __meminit populate_section_memmap(unsigned long pfn,
-		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
 {
 	return kvmalloc_node(array_size(sizeof(struct page),
 					PAGES_PER_SECTION), GFP_KERNEL, nid);
@@ -823,7 +826,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 }
 
 static struct page * __meminit section_activate(int nid, unsigned long pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap)
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 	struct mem_section_usage *usage = NULL;
@@ -855,7 +859,7 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 	if (nr_pages < PAGES_PER_SECTION && early_section(ms))
 		return pfn_to_page(pfn);
 
-	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap);
+	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
 	if (!memmap) {
 		section_deactivate(pfn, nr_pages, altmap);
 		return ERR_PTR(-ENOMEM);
@@ -869,7 +873,8 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
  * @nid: The node to add section on
  * @start_pfn: start pfn of the memory range
  * @nr_pages: number of pfns to add in the section
- * @altmap: device page map
+ * @altmap: alternate pfns to allocate the memmap backing store
+ * @pgmap: alternate compound page geometry for devmap mappings
  *
  * This is only intended for hotplug.
 *
@@ -883,7 +888,8 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 * * -ENOMEM - Out of memory.
 */
int __meminit sparse_add_section(int nid, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap)
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
{
	unsigned long section_nr = pfn_to_section_nr(start_pfn);
	struct mem_section *ms;
@@ -894,7 +900,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
	if (ret < 0)
		return ret;

-	memmap = section_activate(nid, start_pfn, nr_pages, altmap);
+	memmap = section_activate(nid, start_pfn, nr_pages, altmap, pgmap);
	if (IS_ERR(memmap))
		return PTR_ERR(memmap);
