Skip to content

Conversation

@pfalcon
Copy link
Contributor

@pfalcon pfalcon commented Nov 20, 2018

This reverts commit a0df4f6.

This commit has the following description:

"""
Some of the sanitycheck tests were having too small limit for
network buffers when compiling for sam_e70_xplained board.
Increase the buffer limits when testing this for this board.
"""

But the actual code changes do not correspond to this description:
instead of making changes just for the affected sam_e70_xplained,
actually the defaults for all Ethernet boards are changed.

This has negative impact on BOARD=frdm_k64f. Specifically, without
a0df4f6, using samples/net/sockets/dumb_http_server and
ApacheBench's "ab -n1000 http://192.0.2.1:8080/" (i.e. load-test
the Zephyr IP stack with serving 1000 consecutive connections),
everything works as expected (excerpts from the ab report):

Time taken for tests: 8.171 seconds
Complete requests: 1000
Percentage of the requests served within a certain time (ms)
50% 8
66% 8
75% 8
80% 8
90% 8
95% 8
98% 8
99% 9
100% 21 (longest request)

However, with a0df4f6 in effect, running the same command serves
10-100 requests OK, but then the visible slowdown starts. After
breaking ab after 30s (otherwise, it could take hour(s) to finish),
the result is:

Time taken for tests: 35.449 seconds
Complete requests: 51
Percentage of the requests served within a certain time (ms)
50% 8
66% 1216
75% 1312
80% 1312
90% 1472
95% 1984
98% 2560
99% 3776
100% 3776 (longest request)

Thus, revert a0df4f6, as generally an Ethernet board works OK with
the settings as were before. Suggestions regarding sam_e70_xplained:

  1. Try to anylize why its behavior is different from e.g. frdm_k64f.
    (Perhaps, the matter is not just one board vs another board, but
    one sample vs another. Different samples should be tested (including
    samples/net/sockets/), and only affected should have config changed.)
  2. If truly needed, sam_e70_xplained-specific settings should go in
    its specific config(s).

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 20, 2018

This is resubmit for abruptly closed #7831 . This latter ticket contains wealth of discussion regarding this patch and why it is needed (just like the commit message does).

@codecov-io
Copy link

codecov-io commented Nov 20, 2018

Codecov Report

Merging #11530 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #11530   +/-   ##
=======================================
  Coverage   48.42%   48.42%           
=======================================
  Files         270      270           
  Lines       42131    42131           
  Branches    10141    10141           
=======================================
  Hits        20402    20402           
  Misses      17647    17647           
  Partials     4082     4082

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a618ea2...0192ea0. Read the comment docs.

@tbursztyka
Copy link
Contributor

If you revert, aren't re-enabling the issue that the commit was fixing?
Could you propose a second patch, fixing the original issue properly then?

This reverts commit a0df4f6.

This commit has the following description:

"""
Some of the sanitycheck tests were having too small limit for
network buffers when compiling for sam_e70_xplained board.
Increase the buffer limits when testing this for this board.
"""

But the actual code changes do not correspond to this description:
instead of making changes just for the affected sam_e70_xplained,
actually the defaults for all Ethernet boards are changed.

This has negative impact on BOARD=frdm_k64f. Specifically, without
a0df4f6, using samples/net/sockets/dumb_http_server and
ApacheBench's "ab -n1000 http://192.0.2.1:8080/" (i.e. load-test
the Zephyr IP stack with serving 1000 consecutive connections),
everything works as expected (excerpts from the ab report):

Time taken for tests:   8.171 seconds
Complete requests:      1000
Percentage of the requests served within a certain time (ms)
  50%      8
  66%      8
  75%      8
  80%      8
  90%      8
  95%      8
  98%      8
  99%      9
 100%     21 (longest request)

However, with a0df4f6 in effect, running the same command serves
10-100 requests OK, but then the visible slowdown starts. After
breaking ab after 30s (otherwise, it could take hour(s) to finish),
the result is:

Time taken for tests:   35.449 seconds
Complete requests:      51
Percentage of the requests served within a certain time (ms)
  50%      8
  66%   1216
  75%   1312
  80%   1312
  90%   1472
  95%   1984
  98%   2560
  99%   3776
 100%   3776 (longest request)

Thus, revert a0df4f6, as generally an Ethernet board works OK with
the settings as were before. Suggestions regarding sam_e70_xplained:
1. Try to anylize why its behavior is different from e.g. frdm_k64f.
(Perhaps, the matter is not just one board vs another board, but
one sample vs another. Different samples should be tested (including
samples/net/sockets/), and only affected should have config changed.)
2. If truly needed, sam_e70_xplained-specific settings should go in
its specific config(s).
@pfalcon pfalcon force-pushed the net-revert-bufferbloat branch from 1d27090 to 0192ea0 Compare November 21, 2018 17:59
@zephyrbot
Copy link

Found the following issues, please fix and resubmit:

commit message syntax issues

1: UC2 Body does not contain a 'Signed-Off-By' line

identity/email issues

0192ea0: author email (Paul Sokolovsky [email protected]) needs to match one of the signed-off-by entries.

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 23, 2018

If you revert, aren't re-enabling the issue that the commit was fixing?

Yes, the whole situation is explained in the commit message - a0df4f6, by enabled sanitycheck on sam_e70_xplained, broke networking on other boards. Thus, it was improper fix, and this patch reverts it.

Could you propose a second patch, fixing the original issue properly then?

Generally I can't, because I don't have sam_e70_xplained, so wouldn't be able to test the fix. The situation is discussed in #9015, where I ask @mnkp, the maintainer of sam_e70_xplained, to do something about it (in a tested way). At least 5 solutions for a choose is proposed to him. So far, little progress was achieved.

This is a prototype of one of the solutions, a backup one - to blacklist sam_e70_xplained from sanitycheck. It's a sad solution, so I don't haste with the actual blacklisting (not part of this PR), but this PR is intended to remind all parties involved - @mnkp, network maintainers, project maintainers, that we have such an unpleasant situation, which needs to be resolved.

(Actually, I'd say, we need guidelines how to resolve such kind of situations in general, a lot of discussion in #9015 approaches it from that PoV.)

@pfalcon
Copy link
Contributor Author

pfalcon commented Jan 18, 2019

In the wake of #12583, I'm closing this, as this just worked around the issue now fixed on eth_mcux side.

Note that the matter this patch raises is still relevant: we'll never have a well-working netstack if drivers will dictate system parameters. It should be the opposite: system sets params, and drivers accommodate them.

@pfalcon pfalcon closed this Jan 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants