Skip to content

Conversation

hhsecond
Copy link

@hhsecond hhsecond commented Mar 25, 2020

Test case for testing the batching crash

@hhsecond hhsecond requested a review from lantiga March 25, 2020 08:12
@hhsecond hhsecond changed the base branch from master to batching March 25, 2020 08:12
@hhsecond hhsecond changed the title Branching crashtest Batching crashtest Mar 25, 2020
@hhsecond hhsecond force-pushed the branching_crashtest branch from b60dfe7 to 918b81b Compare March 25, 2020 08:17
@lantiga
Copy link
Contributor

lantiga commented Mar 25, 2020

Thank you @hhsecond !

@lantiga
Copy link
Contributor

lantiga commented Mar 25, 2020

I'm wondering why the non-gpu build is not crashing. Can you confirm this is the case?

@hhsecond hhsecond force-pushed the branching_crashtest branch from 918b81b to e758f34 Compare March 25, 2020 10:34
@hhsecond
Copy link
Author

hhsecond commented Mar 25, 2020

Confirming that It's crashing on CPU. Tests probably misbehaving because of the timings. I have added a thread join for syncing the execution flow

@codecov
Copy link

codecov bot commented Mar 25, 2020

Codecov Report

Merging #310 into batching will increase coverage by 0.26%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##           batching     #310      +/-   ##
============================================
+ Coverage     55.52%   55.79%   +0.26%     
============================================
  Files            25       25              
  Lines          5021     5022       +1     
============================================
+ Hits           2788     2802      +14     
+ Misses         2233     2220      -13     
Impacted Files Coverage Δ
src/redisai.c 76.57% <100.00%> (+1.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9ab3cec...b9f7c88. Read the comment docs.

@hhsecond
Copy link
Author

Ok, that's just weird. All the tests are passing now

@hhsecond hhsecond force-pushed the branching_crashtest branch from e758f34 to 21883bd Compare March 25, 2020 11:33
@hhsecond
Copy link
Author

Ok, so if you take to look at the logs, the server is still crashing. But the test conditions are met because it tries to fetch the result from a key that was filled from a previous operation. I have changed the test. It should fail now. But something still keeps me thinking is, how'd the crashed server auto-healed for the rest of the test cases in the pipeline. Would you have any idea @lantiga ?

@lantiga
Copy link
Contributor

lantiga commented Mar 28, 2020

@hhsecond I think I got it: queueEvict had a big bug in it.
Great job with the repro BTW, it would have been really hard to spot the bug without it.

Note: I had to port the test to multiprocessing so I can now kill the pending MODELRUN process. Otherwise the test passes but we get an exception afterwards, because redis-server is terminated by the runner but that request is still pending.

@lantiga lantiga merged commit d468198 into batching Mar 28, 2020
@hhsecond hhsecond deleted the branching_crashtest branch March 28, 2020 01:53
lantiga added a commit that referenced this pull request Mar 30, 2020
* Add support for automated batching

Add support for inspection and eviction to queue

Mock run info batching

Mock run info batching

Make TF tests work

Add batching for ONNX and ONNX-ML

Fix torch API, still WIP

Fix torch backend

Fixes after rebasing

Add auto-batching to TFLite backend

Fix from rebase

Add batching args to command and change API accordingly

Add batching heuristics [WIP]

Fix TFLite test by accessing first tensor in first batch safely

Temporarily comment out wrong_bg test check

Implement batching heuristics

Introduce autobatch tests, tflite still fails

Fix segfault when error was generated from the backend

Fix tflite autobatch test

Updated documentation with auto batching

Remove stale comments

Avoid making extra copies of inputs and outputs when batch count is 1

Address review comments re const-correctness

Add tests to detect failures

Fix slicing and concatenation

Fix tensor slicing and concatenating

Temporarily disable tflite autobatch test due to tflite limitation

Disable support for autobatching for TFLITE

* Fix TFLite and tests after rebase

* Temporarily disable macos CI build

* Add synchronization to autobatch tests

* Add synchronization to autobatch thread

* Add synchronization to autobatch thread

* Batching crashtest (#310)

* test cases for crash test

* Fix issue with evict. Port test to multiprocessing to allow killing pending command.

* Use terminate instead of kill

Co-authored-by: Luca Antiga <[email protected]>

Co-authored-by: Sherin Thomas <[email protected]>
lantiga added a commit that referenced this pull request May 6, 2020
 
Add support for batching (take two) (#270)

* Add support for automated batching

Add support for inspection and eviction to queue

Mock run info batching

Mock run info batching

Make TF tests work

Add batching for ONNX and ONNX-ML

Fix torch API, still WIP

Fix torch backend

Fixes after rebasing

Add auto-batching to TFLite backend

Fix from rebase

Add batching args to command and change API accordingly

Add batching heuristics [WIP]

Fix TFLite test by accessing first tensor in first batch safely

Temporarily comment out wrong_bg test check

Implement batching heuristics

Introduce autobatch tests, tflite still fails

Fix segfault when error was generated from the backend

Fix tflite autobatch test

Updated documentation with auto batching

Remove stale comments

Avoid making extra copies of inputs and outputs when batch count is 1

Address review comments re const-correctness

Add tests to detect failures

Fix slicing and concatenation

Fix tensor slicing and concatenating

Temporarily disable tflite autobatch test due to tflite limitation

Disable support for autobatching for TFLITE

* Fix TFLite and tests after rebase

* Temporarily disable macos CI build

* Add synchronization to autobatch tests

* Add synchronization to autobatch thread

* Add synchronization to autobatch thread

* Batching crashtest (#310)

* test cases for crash test

* Fix issue with evict. Port test to multiprocessing to allow killing pending command.

* Use terminate instead of kill

Co-authored-by: Luca Antiga <[email protected]>

Co-authored-by: Sherin Thomas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants