Batching crashtest #310

hhsecond · 2020-03-25T08:12:08Z

Test case for testing the batching crash

lantiga · 2020-03-25T10:04:50Z

lantiga · 2020-03-25T10:05:15Z

I'm wondering why the non-gpu build is not crashing. Can you confirm this is the case?

hhsecond · 2020-03-25T10:37:44Z

Confirming that it's crashing on CPU. The tests were probably misbehaving because of timing, so I have added a thread join to synchronize the execution flow.
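For readers unfamiliar with the pattern, here is a minimal sketch of joining a background thread before asserting on its result. It is not the code from this PR: `run_in_thread` and `fake_modelrun` are hypothetical names, and the blocking client call is replaced by a plain function.

```python
import threading


def run_in_thread(fn, *args):
    """Start fn(*args) in a background thread; return the thread and a result box."""
    box = {}

    def target():
        try:
            box["value"] = fn(*args)
        except Exception as exc:
            # Keep the failure visible instead of losing it inside the thread.
            box["error"] = exc

    thread = threading.Thread(target=target)
    thread.start()
    return thread, box


def fake_modelrun(x):
    # Hypothetical stand-in for a blocking client call.
    return x * 2


thread, box = run_in_thread(fake_modelrun, 21)
thread.join(timeout=5)  # wait for the background call before asserting
assert not thread.is_alive()
assert box == {"value": 42}
```

Without the `join`, the assertion could run before the background call has finished, which is the kind of timing-dependent behaviour described above.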

codecov · 2020-03-25T10:41:54Z

Codecov Report

Merging #310 into batching will increase coverage by 0.26%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##           batching     #310      +/-   ##
============================================
+ Coverage     55.52%   55.79%   +0.26%     
============================================
  Files            25       25              
  Lines          5021     5022       +1     
============================================
+ Hits           2788     2802      +14     
+ Misses         2233     2220      -13

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/redisai.c | `76.57% <100.00%> (+1.15%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9ab3cec...b9f7c88.

hhsecond · 2020-03-25T10:48:34Z

Ok, that's just weird. All the tests are passing now

hhsecond · 2020-03-25T11:36:42Z

OK, so if you take a look at the logs, the server is still crashing. The test conditions are met anyway because the test fetches the result from a key that was filled by a previous operation. I have changed the test, so it should fail now. What still puzzles me is how the crashed server recovered on its own for the rest of the test cases in the pipeline. Do you have any idea, @lantiga?
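The false pass described here can be illustrated with a small sketch. It uses a plain dict in place of the Redis keyspace, and `run_model`, the key names, and the output values are all hypothetical; the actual test in this PR is different.

```python
def run_model(store, out_key, server_up=True):
    """Hypothetical stand-in for a model run that writes its output to out_key."""
    if not server_up:
        raise ConnectionError("server crashed before writing the output")
    store[out_key] = [4.0, 9.0]


def check_reusing_key(store):
    """Reads a key that an earlier operation already filled."""
    try:
        run_model(store, "out", server_up=False)
    except ConnectionError:
        pass  # the failure goes unnoticed, e.g. inside a background thread
    return store.get("out") == [4.0, 9.0]


def check_with_fresh_key(store):
    """Reads a key that only this run could have written."""
    store.pop("out_crashtest", None)  # make sure no stale value exists
    try:
        run_model(store, "out_crashtest", server_up=False)
    except ConnectionError:
        pass
    return store.get("out_crashtest") == [4.0, 9.0]


store = {}
run_model(store, "out")             # an earlier test fills "out"
print(check_reusing_key(store))     # True: stale value hides the crash
print(check_with_fresh_key(store))  # False: the crash is detected
```

Using a key that no earlier operation touches (or deleting it first) makes the check depend only on the run under test.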


lantiga · 2020-03-28T01:31:48Z

@hhsecond I think I got it: queueEvict had a big bug in it.
Great job with the repro BTW, it would have been really hard to spot the bug without it.

Note: I had to port the test to multiprocessing so I can now kill the pending MODELRUN process. Otherwise the test passes but we get an exception afterwards, because redis-server is terminated by the runner but that request is still pending.
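A minimal sketch of the idea, assuming a POSIX system: the pending request runs in a separate process so that the test can terminate it explicitly. `blocking_request` is a hypothetical stand-in for the pending MODELRUN call, and the `fork` start method is chosen only to keep the example self-contained; none of this is the code merged in this PR.

```python
import multiprocessing as mp
import time


def blocking_request(started):
    """Hypothetical stand-in for a client call that never returns."""
    started.set()
    while True:
        time.sleep(0.1)


def run_and_terminate(timeout=5.0):
    """Start the blocking call in a child process, then terminate it."""
    ctx = mp.get_context("fork")  # assumption: POSIX, where "fork" is available
    started = ctx.Event()
    proc = ctx.Process(target=blocking_request, args=(started,))
    proc.start()
    try:
        if not started.wait(timeout):
            raise RuntimeError("worker did not start in time")
    finally:
        proc.terminate()    # sends SIGTERM on POSIX
        proc.join(timeout)  # reap the child so nothing is left pending
    return proc.is_alive(), proc.exitcode


if __name__ == "__main__":
    alive, exitcode = run_and_terminate()
    print(alive, exitcode)
```

A thread cannot be stopped from the outside in Python, whereas a `multiprocessing.Process` exposes `terminate()` and `kill()`, which is what makes cleaning up a request that will never complete possible.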


Add support for batching (take two) (#270)

* Add support for automated batching
  Add support for inspection and eviction to queue
  Mock run info batching
  Mock run info batching
  Make TF tests work
  Add batching for ONNX and ONNX-ML
  Fix torch API, still WIP
  Fix torch backend
  Fixes after rebasing
  Add auto-batching to TFLite backend
  Fix from rebase
  Add batching args to command and change API accordingly
  Add batching heuristics [WIP]
  Fix TFLite test by accessing first tensor in first batch safely
  Temporarily comment out wrong_bg test check
  Implement batching heuristics
  Introduce autobatch tests, tflite still fails
  Fix segfault when error was generated from the backend
  Fix tflite autobatch test
  Updated documentation with auto batching
  Remove stale comments
  Avoid making extra copies of inputs and outputs when batch count is 1
  Address review comments re const-correctness
  Add tests to detect failures
  Fix slicing and concatenation
  Fix tensor slicing and concatenating
  Temporarily disable tflite autobatch test due to tflite limitation
  Disable support for autobatching for TFLITE
* Fix TFLite and tests after rebase
* Temporarily disable macos CI build
* Add synchronization to autobatch tests
* Add synchronization to autobatch thread
* Add synchronization to autobatch thread
* Batching crashtest (#310)
* test cases for crash test
* Fix issue with evict. Port test to multiprocessing to allow killing pending command.
* Use terminate instead of kill

Co-authored-by: Luca Antiga
Co-authored-by: Sherin Thomas

hhsecond requested a review from lantiga March 25, 2020 08:12

hhsecond changed the base branch from master to batching March 25, 2020 08:12

hhsecond changed the title ~~Branching crashtest~~ Batching crashtest Mar 25, 2020

hhsecond force-pushed the branching_crashtest branch from b60dfe7 to 918b81b Compare March 25, 2020 08:17

hhsecond force-pushed the branching_crashtest branch from 918b81b to e758f34 Compare March 25, 2020 10:34

test cases for crash test

21883bd

hhsecond force-pushed the branching_crashtest branch from e758f34 to 21883bd Compare March 25, 2020 11:33

Fix issue with evict. Port test to multiprocessing to allow killing pending command.

7ac7976

Use terminate instead of kill

b9f7c88

lantiga merged commit d468198 into batching Mar 28, 2020

hhsecond deleted the branching_crashtest branch March 28, 2020 01:53

lantiga mentioned this pull request Mar 28, 2020

Add support for batching (take two) #270

Merged
