@@ -8,61 +8,13 @@ both at leaf nodes as well as at intermediate nodes in a storage hierarchy.
Plan is to use the same cgroup based management interface for blkio controller
and based on user options switch IO policies in the background.

- Currently two IO control policies are implemented. First one is proportional
- weight time based division of disk policy. It is implemented in CFQ. Hence
- this policy takes effect only on leaf nodes when CFQ is being used. The second
- one is throttling policy which can be used to specify upper IO rate limits
- on devices. This policy is implemented in generic block layer and can be
- used on leaf nodes as well as higher level logical devices like device mapper.
+ Currently one IO control policy is implemented: the throttling policy. It can
+ be used to specify upper IO rate limits on devices. This policy is implemented
+ in the generic block layer and can be used on leaf nodes as well as on higher
+ level logical devices like device mapper.
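For example, an upper rate limit is set by writing "<major>:<minor> <rate>" to
one of the blkio.throttle.* files. A rough sketch (the 8:16 device numbers and
the 1 MB/s value are placeholders, not part of the patch):

        # Limit reads from the device with major:minor 8:16 to 1 MB/s
        # for the root blkio group.
        echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device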

HOWTO
=====
- Proportional Weight division of bandwidth
- -----------------------------------------
- You can do a very simple testing of running two dd threads in two different
- cgroups. Here is what you can do.
-
- - Enable Block IO controller
-       CONFIG_BLK_CGROUP=y
-
- - Enable group scheduling in CFQ
-       CONFIG_CFQ_GROUP_IOSCHED=y
-
- - Compile and boot into kernel and mount IO controller (blkio); see
-   cgroups.txt, Why are cgroups needed?.
-
-       mount -t tmpfs cgroup_root /sys/fs/cgroup
-       mkdir /sys/fs/cgroup/blkio
-       mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
-
- - Create two cgroups
-       mkdir -p /sys/fs/cgroup/blkio/test1/ /sys/fs/cgroup/blkio/test2
-
- - Set weights of group test1 and test2
-       echo 1000 > /sys/fs/cgroup/blkio/test1/blkio.weight
-       echo 500 > /sys/fs/cgroup/blkio/test2/blkio.weight
-
- - Create two same size files (say 512MB each) on same disk (file1, file2) and
-   launch two dd threads in different cgroup to read those files.
-
-       sync
-       echo 3 > /proc/sys/vm/drop_caches
-
-       dd if=/mnt/sdb/zerofile1 of=/dev/null &
-       echo $! > /sys/fs/cgroup/blkio/test1/tasks
-       cat /sys/fs/cgroup/blkio/test1/tasks
-
-       dd if=/mnt/sdb/zerofile2 of=/dev/null &
-       echo $! > /sys/fs/cgroup/blkio/test2/tasks
-       cat /sys/fs/cgroup/blkio/test2/tasks
-
- - At macro level, first dd should finish first. To get more precise data, keep
-   on looking at (with the help of script), at blkio.disk_time and
-   blkio.disk_sectors files of both test1 and test2 groups. This will tell how
-   much disk time (in milliseconds), each group got and how many sectors each
-   group dispatched to the disk. We provide fairness in terms of disk time, so
-   ideally io.disk_time of cgroups should be in proportion to the weight.
-
Throttling/Upper Limit policy
-----------------------------
- Enable Block IO controller
@@ -94,7 +46,7 @@ Throttling/Upper Limit policy
Hierarchical Cgroups
====================

- Both CFQ and throttling implement hierarchy support; however,
+ Throttling implements hierarchy support; however,
throttling's hierarchy support is enabled iff "sane_behavior" is
enabled from cgroup side, which currently is a development option and
not publicly available.
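For illustration only, the development option was selected at mount time; the
exact flag name (__DEVEL__sane_behavior) and its constraints are an assumption
here and may differ between kernel versions:

        # Mount the blkio hierarchy with the development-only sane_behavior flag.
        mount -t cgroup -o blkio,__DEVEL__sane_behavior none /sys/fs/cgroup/blkio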
@@ -107,9 +59,8 @@ If somebody created a hierarchy like as follows.
                        |
                     test3

- CFQ by default and throttling with "sane_behavior" will handle the
- hierarchy correctly. For details on CFQ hierarchy support, refer to
- Documentation/block/cfq-iosched.txt. For throttling, all limits apply
+ Throttling with "sane_behavior" will handle the
+ hierarchy correctly. For throttling, all limits apply
to the whole subtree while all statistics are local to the IOs
directly generated by tasks in that cgroup.

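A rough sketch of what that means, assuming the test1/test3 groups from the
diagram above and a throttled device with major:minor 8:16 (both are just
examples):

        # A write limit set on test1 also throttles tasks in test1/test3 ...
        mkdir -p /sys/fs/cgroup/blkio/test1/test3
        echo "8:16 2097152" > /sys/fs/cgroup/blkio/test1/blkio.throttle.write_bps_device
        # ... while test3's statistics count only IO issued by its own tasks.
        cat /sys/fs/cgroup/blkio/test1/test3/blkio.throttle.io_service_bytes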
@@ -130,10 +81,6 @@ CONFIG_DEBUG_BLK_CGROUP
        - Debug help. Right now some additional stats file show up in cgroup
          if this option is enabled.

- CONFIG_CFQ_GROUP_IOSCHED
-       - Enables group scheduling in CFQ. Currently only 1 level of group
-         creation is allowed.
-
CONFIG_BLK_DEV_THROTTLING
        - Enable block device throttling support in block layer.

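A minimal configuration fragment for using only the throttling policy might
therefore look as follows (illustrative; the rest of the kernel config is
unchanged):

        CONFIG_BLK_CGROUP=y
        CONFIG_BLK_DEV_THROTTLING=y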
@@ -344,32 +291,3 @@ Common files among various policies
- blkio.reset_stats
        - Writing an int to this file will result in resetting all the stats
          for that cgroup.
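For example, with the test1 group used earlier (any integer value works):

        echo 1 > /sys/fs/cgroup/blkio/test1/blkio.reset_stats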
-
- CFQ sysfs tunable
- =================
- /sys/block/<disk>/queue/iosched/slice_idle
- ------------------------------------------
- On a faster hardware CFQ can be slow, especially with sequential workload.
- This happens because CFQ idles on a single queue and single queue might not
- drive deeper request queue depths to keep the storage busy. In such scenarios
- one can try setting slice_idle=0 and that would switch CFQ to IOPS
- (IO operations per second) mode on NCQ supporting hardware.
-
- That means CFQ will not idle between cfq queues of a cfq group and hence be
- able to driver higher queue depth and achieve better throughput. That also
- means that cfq provides fairness among groups in terms of IOPS and not in
- terms of disk time.
-
- /sys/block/<disk>/queue/iosched/group_idle
- ------------------------------------------
- If one disables idling on individual cfq queues and cfq service trees by
- setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
- on the group in an attempt to provide fairness among groups.
-
- By default group_idle is same as slice_idle and does not do anything if
- slice_idle is enabled.
-
- One can experience an overall throughput drop if you have created multiple
- groups and put applications in that group which are not driving enough
- IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
- on individual groups and throughput should improve.