Commit 4b48221

Merge: dm: sync with upstream 6.6 fixes and improvements
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3199

JIRA: https://issues.redhat.com/browse/RHEL-12342
JIRA: https://issues.redhat.com/browse/RHEL-8220
JIRA: https://issues.redhat.com/browse/RHEL-12435
Tested: lvm2-testsuite and dmtest
Upstream Status: kernel/git/torvalds/linux.git

Pull in various dm changes from upstream.

Signed-off-by: Benjamin Marzinski <[email protected]>
Approved-by: Heinz Mauelshagen <[email protected]>
Approved-by: Mike Snitzer <[email protected]>
Approved-by: Mikuláš Patočka <[email protected]>
Approved-by: Nigel Croxon <[email protected]>
Signed-off-by: Scott Weaver <[email protected]>
2 parents 72ad724 + 3b99945 commit 4b48221

23 files changed: +541 -287 lines changed

Documentation/admin-guide/device-mapper/dm-flakey.rst

Lines changed: 10 additions & 0 deletions
@@ -67,6 +67,16 @@ Optional feature parameters:
 	Perform the replacement only if bio->bi_opf has all the
 	selected flags set.

+  random_read_corrupt <probability>
+	During <down interval>, replace random byte in a read bio
+	with a random value. probability is an integer between
+	0 and 1000000000 meaning 0% to 100% probability of corruption.
+
+  random_write_corrupt <probability>
+	During <down interval>, replace random byte in a write bio
+	with a random value. probability is an integer between
+	0 and 1000000000 meaning 0% to 100% probability of corruption.
+
 Examples:

 Replaces the 32nd byte of READ bios with the value 1::

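The probability scale documented for random_read_corrupt and random_write_corrupt (0 to 1000000000 for 0% to 100%) is easy to get wrong by a few orders of magnitude. The following is a minimal userspace sketch of the conversion, not code from the patch. The helper names and the FLAKEY_PROBABILITY_BASE macro are hypothetical, and the decision function is only an assumed model of how such a threshold could be compared against a random number.

```c
#include <assert.h>
#include <stdint.h>

/* Scale taken from the documentation text: 1000000000 means 100%. */
#define FLAKEY_PROBABILITY_BASE 1000000000u

/*
 * Hypothetical helper: convert a percentage (0.0 to 100.0) into the
 * integer expected by random_read_corrupt / random_write_corrupt.
 * Out-of-range input is clamped to the documented bounds.
 */
static uint32_t flakey_probability_from_percent(double percent)
{
	if (percent <= 0.0)
		return 0;
	if (percent >= 100.0)
		return FLAKEY_PROBABILITY_BASE;
	/* 1% corresponds to 10000000 on this scale; round to nearest. */
	return (uint32_t)(percent * 1e7 + 0.5);
}

/*
 * Assumed decision model (an illustration, not copied from the driver):
 * reduce a random number to [0, BASE) and corrupt when it falls below
 * the configured probability.
 */
static int flakey_should_corrupt(uint64_t rnd, uint32_t probability)
{
	assert(probability <= FLAKEY_PROBABILITY_BASE);
	return (rnd % FLAKEY_PROBABILITY_BASE) < probability;
}
```

Under this model a value of 500000000 corrupts roughly half of the bios seen during the down interval, 0 never corrupts, and 1000000000 always does.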
Documentation/admin-guide/device-mapper/dm-integrity.rst

Lines changed: 27 additions & 16 deletions
@@ -25,7 +25,7 @@ mode it calculates and verifies the integrity tag internally. In this
 mode, the dm-integrity target can be used to detect silent data
 corruption on the disk or in the I/O path.

-There's an alternate mode of operation where dm-integrity uses bitmap
+There's an alternate mode of operation where dm-integrity uses a bitmap
 instead of a journal. If a bit in the bitmap is 1, the corresponding
 region's data and integrity tags are not synchronized - if the machine
 crashes, the unsynchronized regions will be recalculated. The bitmap mode
@@ -38,6 +38,15 @@ the device. But it will only format the device if the superblock contains
 zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
 target can't be loaded.

+Accesses to the on-disk metadata area containing checksums (aka tags) are
+buffered using dm-bufio. When an access to any given metadata area
+occurs, each unique metadata area gets its own buffer(s). The buffer size
+is capped at the size of the metadata area, but may be smaller, thereby
+requiring multiple buffers to represent the full metadata area. A smaller
+buffer size will produce a smaller resulting read/write operation to the
+metadata area for small reads/writes. The metadata is still read even in
+a full write to the data covered by a single buffer.
+
 To use the target for the first time:

 1. overwrite the superblock with zeroes
@@ -93,7 +102,7 @@ journal_sectors:number
 	device. If the device is already formatted, the value from the
 	superblock is used.

-interleave_sectors:number
+interleave_sectors:number (default 32768)
 	The number of interleaved sectors. This values is rounded down to
 	a power of two. If the device is already formatted, the value from
 	the superblock is used.
@@ -102,20 +111,16 @@ meta_device:device
 	Don't interleave the data and metadata on the device. Use a
 	separate device for metadata.

-buffer_sectors:number
-	The number of sectors in one buffer. The value is rounded down to
-	a power of two.
-
-	The tag area is accessed using buffers, the buffer size is
-	configurable. The large buffer size means that the I/O size will
-	be larger, but there could be less I/Os issued.
+buffer_sectors:number (default 128)
+	The number of sectors in one metadata buffer. The value is rounded
+	down to a power of two.

-journal_watermark:number
+journal_watermark:number (default 50)
 	The journal watermark in percents. When the size of the journal
 	exceeds this watermark, the thread that flushes the journal will
 	be started.

-commit_time:number
+commit_time:number (default 10000)
 	Commit time in milliseconds. When this time passes, the journal is
 	written. The journal is also written immediately if the FLUSH
 	request is received.
@@ -163,11 +168,10 @@ journal_mac:algorithm(:key) (the key is optional)
 	the journal. Thus, modified sector number would be detected at
 	this stage.

-block_size:number
-	The size of a data block in bytes. The larger the block size the
+block_size:number (default 512)
+	The size of a data block in bytes. The larger the block size the
 	less overhead there is for per-block integrity metadata.
-	Supported values are 512, 1024, 2048 and 4096 bytes. If not
-	specified the default block size is 512 bytes.
+	Supported values are 512, 1024, 2048 and 4096 bytes.

 sectors_per_bit:number
 	In the bitmap mode, this parameter specifies the number of
@@ -209,6 +213,12 @@ table and swap the tables with suspend and resume). The other arguments
 should not be changed when reloading the target because the layout of disk
 data depend on them and the reloaded target would be non-functional.

+For example, on a device using the default interleave_sectors of 32768, a
+block_size of 512, and an internal_hash of crc32c with a tag size of 4
+bytes, it will take 128 KiB of tags to track a full data area, requiring
+256 sectors of metadata per data area. With the default buffer_sectors of
+128, that means there will be 2 buffers per metadata area, or 2 buffers
+per 16 MiB of data.

 Status line:

@@ -286,7 +296,8 @@ The layout of the formatted block device:
   Each run contains:

 	* tag area - it contains integrity tags. There is one tag for each
-	  sector in the data area
+	  sector in the data area. The size of this area is always 4KiB or
+	  greater.
 	* data area - it contains data sectors. The number of data sectors
 	  in one run must be a power of two. log2 of this value is stored
 	  in the superblock.

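The worked example added to dm-integrity.rst (interleave_sectors 32768, block_size 512, 4-byte crc32c tags, buffer_sectors 128) can be checked with a few lines of arithmetic. This is a hedged userspace sketch: the struct and function names are hypothetical, and it deliberately ignores any extra rounding the driver may apply to the tag area (for instance the documented 4KiB minimum).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512u

/* Hypothetical result type for the arithmetic in the documentation. */
struct tag_layout {
	uint64_t data_bytes;       /* data covered by one data area */
	uint64_t tag_bytes;        /* bytes of tags for that data area */
	uint64_t metadata_sectors; /* tag_bytes rounded up to sectors */
	uint64_t buffers_per_area; /* metadata buffers needed per area */
};

static struct tag_layout tag_layout_for(uint32_t interleave_sectors,
					uint32_t block_size,
					uint32_t tag_size,
					uint32_t buffer_sectors)
{
	struct tag_layout l;
	uint64_t blocks;

	assert(block_size >= SKETCH_SECTOR_SIZE && buffer_sectors > 0);

	l.data_bytes = (uint64_t)interleave_sectors * SKETCH_SECTOR_SIZE;
	blocks = l.data_bytes / block_size;   /* one tag per data block */
	l.tag_bytes = blocks * tag_size;
	l.metadata_sectors =
		(l.tag_bytes + SKETCH_SECTOR_SIZE - 1) / SKETCH_SECTOR_SIZE;
	l.buffers_per_area =
		(l.metadata_sectors + buffer_sectors - 1) / buffer_sectors;
	return l;
}
```

With the defaults from the documentation, `tag_layout_for(32768, 512, 4, 128)` gives 16 MiB of data, 128 KiB of tags, 256 metadata sectors and 2 buffers per metadata area, which matches the figures in the added paragraph.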
MAINTAINERS

Lines changed: 2 additions & 2 deletions
@@ -5534,8 +5534,8 @@ F: include/linux/devm-helpers.h
 DEVICE-MAPPER (LVM)
 M:	Alasdair Kergon <[email protected]>
 M:	Mike Snitzer <[email protected]>
-M:	dm-devel@redhat.com
-L:	dm-devel@redhat.com
+M:	dm-devel@lists.linux.dev
+L:	dm-devel@lists.linux.dev
 S:	Maintained
 W:	http://sources.redhat.com/dm
 Q:	http://patchwork.kernel.org/project/dm-devel/list/

drivers/md/dm-bufio.c

Lines changed: 7 additions & 17 deletions
@@ -1157,23 +1157,6 @@ static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,

 	*data_mode = DATA_MODE_VMALLOC;

-	/*
-	 * __vmalloc allocates the data pages and auxiliary structures with
-	 * gfp_flags that were specified, but pagetables are always allocated
-	 * with GFP_KERNEL, no matter what was specified as gfp_mask.
-	 *
-	 * Consequently, we must set per-process flag PF_MEMALLOC_NOIO so that
-	 * all allocations done by this process (including pagetables) are done
-	 * as if GFP_NOIO was specified.
-	 */
-	if (gfp_mask & __GFP_NORETRY) {
-		unsigned int noio_flag = memalloc_noio_save();
-		void *ptr = __vmalloc(c->block_size, gfp_mask);
-
-		memalloc_noio_restore(noio_flag);
-		return ptr;
-	}
-
 	return __vmalloc(c->block_size, gfp_mask);
 }

@@ -2592,6 +2575,13 @@ void dm_bufio_client_destroy(struct dm_bufio_client *c)
 }
 EXPORT_SYMBOL_GPL(dm_bufio_client_destroy);

+void dm_bufio_client_reset(struct dm_bufio_client *c)
+{
+	drop_buffers(c);
+	flush_work(&c->shrink_work);
+}
+EXPORT_SYMBOL_GPL(dm_bufio_client_reset);
+
 void dm_bufio_set_sector_offset(struct dm_bufio_client *c, sector_t start)
 {
 	c->start = start;

drivers/md/dm-core.h

Lines changed: 3 additions & 1 deletion
@@ -214,6 +214,7 @@ struct dm_table {

 	/* a list of devices used by this table */
 	struct list_head devices;
+	struct rw_semaphore devices_lock;

 	/* events get handed up using this callback */
 	void (*event_fn)(void *data);
@@ -306,7 +307,8 @@ struct dm_io {
  */
 enum {
 	DM_IO_ACCOUNTED,
-	DM_IO_WAS_SPLIT
+	DM_IO_WAS_SPLIT,
+	DM_IO_BLK_STAT
 };

 static inline bool dm_io_flagged(struct dm_io *io, unsigned int bit)

drivers/md/dm-crypt.c

Lines changed: 45 additions & 15 deletions
@@ -1661,15 +1661,18 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone);
  * In order to not degrade performance with excessive locking, we try
  * non-blocking allocations without a mutex first but on failure we fallback
  * to blocking allocations with a mutex.
+ *
+ * In order to reduce allocation overhead, we try to allocate compound pages in
+ * the first pass. If they are not available, we fall back to the mempool.
  */
 static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned int size)
 {
 	struct crypt_config *cc = io->cc;
 	struct bio *clone;
 	unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	gfp_t gfp_mask = GFP_NOWAIT | __GFP_HIGHMEM;
-	unsigned int i, len, remaining_size;
-	struct page *page;
+	unsigned int remaining_size;
+	unsigned int order = MAX_ORDER - 1;

 retry:
 	if (unlikely(gfp_mask & __GFP_DIRECT_RECLAIM))
@@ -1682,19 +1685,40 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned int size)

 	remaining_size = size;

-	for (i = 0; i < nr_iovecs; i++) {
-		page = mempool_alloc(&cc->page_pool, gfp_mask);
-		if (!page) {
+	while (remaining_size) {
+		struct page *pages;
+		unsigned size_to_add;
+		unsigned remaining_order = __fls((remaining_size + PAGE_SIZE - 1) >> PAGE_SHIFT);
+		order = min(order, remaining_order);
+
+		while (order > 0) {
+			if (unlikely(percpu_counter_read_positive(&cc->n_allocated_pages) +
+					(1 << order) > dm_crypt_pages_per_client))
+				goto decrease_order;
+			pages = alloc_pages(gfp_mask
+				| __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | __GFP_COMP,
+				order);
+			if (likely(pages != NULL)) {
+				percpu_counter_add(&cc->n_allocated_pages, 1 << order);
+				goto have_pages;
+			}
+decrease_order:
+			order--;
+		}
+
+		pages = mempool_alloc(&cc->page_pool, gfp_mask);
+		if (!pages) {
 			crypt_free_buffer_pages(cc, clone);
 			bio_put(clone);
 			gfp_mask |= __GFP_DIRECT_RECLAIM;
+			order = 0;
 			goto retry;
 		}

-		len = (remaining_size > PAGE_SIZE) ? PAGE_SIZE : remaining_size;
-
-		__bio_add_page(clone, page, len, 0);
-		remaining_size -= len;
+have_pages:
+		size_to_add = min((unsigned)PAGE_SIZE << order, remaining_size);
+		__bio_add_page(clone, pages, size_to_add, 0);
+		remaining_size -= size_to_add;
 	}

 	/* Allocate space for integrity tags */
@@ -1712,12 +1736,18 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned int size)

 static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone)
 {
-	struct bio_vec *bv;
-	struct bvec_iter_all iter_all;
+	struct folio_iter fi;

-	bio_for_each_segment_all(bv, clone, iter_all) {
-		BUG_ON(!bv->bv_page);
-		mempool_free(bv->bv_page, &cc->page_pool);
+	if (clone->bi_vcnt > 0) { /* bio_for_each_folio_all crashes with an empty bio */
+		bio_for_each_folio_all(fi, clone) {
+			if (folio_test_large(fi.folio)) {
+				percpu_counter_sub(&cc->n_allocated_pages,
+						1 << folio_order(fi.folio));
+				folio_put(fi.folio);
+			} else {
+				mempool_free(&fi.folio->page, &cc->page_pool);
+			}
+		}
 	}
 }

@@ -2888,7 +2918,7 @@ static int crypt_ctr_cipher_new(struct dm_target *ti, char *cipher_in, char *key
 		ret = crypt_ctr_auth_cipher(cc, cipher_api);
 		if (ret < 0) {
 			ti->error = "Invalid AEAD cipher spec";
-			return -ENOMEM;
+			return ret;
 		}
 	}

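The order selection in the new crypt_alloc_buffer() loop can be hard to follow from the diff alone. Below is a simplified userspace model of just that arithmetic, written as a sketch under the assumption that every allocation succeeds at the order requested. It leaves out the per-client page limit, the retry path, and the mempool fallback. The names plan_chunks, floor_log2, SKETCH_PAGE_SHIFT and SKETCH_PAGE_SIZE are hypothetical, with floor_log2 standing in for the kernel's __fls().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Index of the highest set bit; x must be non-zero (stand-in for __fls). */
static unsigned floor_log2(unsigned x)
{
	unsigned r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/*
 * Split `size` bytes into chunks the way the patched loop picks them,
 * assuming each allocation succeeds at the chosen order. Writes the
 * chunk sizes into `chunks` and returns how many were produced.
 */
static size_t plan_chunks(unsigned size, unsigned max_order,
			  unsigned *chunks, size_t cap)
{
	unsigned remaining = size;
	unsigned order = max_order;
	size_t n = 0;

	while (remaining) {
		unsigned pages = (remaining + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
		unsigned remaining_order = floor_log2(pages);
		unsigned add;

		/* Never request more pages than the remaining data needs. */
		if (remaining_order < order)
			order = remaining_order;

		add = SKETCH_PAGE_SIZE << order;
		if (add > remaining)
			add = remaining;

		assert(n < cap);
		chunks[n++] = add;
		remaining -= add;
	}
	return n;
}
```

In this model a 7-page request is split into chunks of 4, 2 and 1 pages, and the order only ever decreases across iterations, which mirrors the `order = min(order, remaining_order)` step in the patch. In the real function an order of 0 is served from the mempool rather than from alloc_pages().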