
Commit 8752589

lantiga and Sherin Thomas authored
Add support for batching (take two) (#270)
* Add support for automated batching:
  - Add support for inspection and eviction to queue
  - Mock run info batching
  - Make TF tests work
  - Add batching for ONNX and ONNX-ML
  - Fix torch API, still WIP
  - Fix torch backend
  - Fixes after rebasing
  - Add auto-batching to TFLite backend
  - Fix from rebase
  - Add batching args to command and change API accordingly
  - Add batching heuristics [WIP]
  - Fix TFLite test by accessing first tensor in first batch safely
  - Temporarily comment out wrong_bg test check
  - Implement batching heuristics
  - Introduce autobatch tests, tflite still fails
  - Fix segfault when error was generated from the backend
  - Fix tflite autobatch test
  - Updated documentation with auto batching
  - Remove stale comments
  - Avoid making extra copies of inputs and outputs when batch count is 1
  - Address review comments re const-correctness
  - Add tests to detect failures
  - Fix slicing and concatenation
  - Fix tensor slicing and concatenating
  - Temporarily disable tflite autobatch test due to tflite limitation
  - Disable support for autobatching for TFLITE
* Fix TFLite and tests after rebase
* Temporarily disable macos CI build
* Add synchronization to autobatch tests
* Add synchronization to autobatch thread
* Batching crashtest (#310)
  - Test cases for crash test
  - Fix issue with evict; port test to multiprocessing to allow killing pending command
  - Use terminate instead of kill

Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: Sherin Thomas <[email protected]>
1 parent: 62b2fa2

27 files changed, +1304 −251 lines

.circleci/config.yml

Lines changed: 6 additions & 6 deletions
```diff
@@ -195,12 +195,12 @@ workflows:
             only: /.*/
           tags:
             only: /.*/
-    #- build-macos:
-    #    filters:
-    #      branches:
-    #        ignore: /.*/
-    #      tags:
-    #        only: /^v[0-9].*/
+    # - build-macos:
+    #     filters:
+    #       branches:
+    #         ignore: /.*/
+    #       tags:
+    #         only: /^v[0-9].*/
     #- build-multiarch-docker:
     #  filters:
     #    tags:
```

docs/commands.md

Lines changed: 19 additions & 1 deletion
````diff
@@ -67,12 +67,22 @@ AI.TENSORGET foo BLOB
 Set a model.
 
 ```sql
-AI.MODELSET model_key backend device [INPUTS name1 name2 ... OUTPUTS name1 name2 ...] model_blob
+AI.MODELSET model_key backend device [BATCHSIZE n [MINBATCHSIZE m]] [INPUTS name1 name2 ... OUTPUTS name1 name2 ...] model_blob
 ```
 
 * model_key - Key for storing the model
 * backend - The backend corresponding to the model being set. Allowed values: `TF`, `TORCH`, `ONNX`.
 * device - Device where the model is loaded and where the computation will run. Allowed values: `CPU`, `GPU`.
+* BATCHSIZE n - Batch incoming requests from multiple clients if they hit the same model and if input tensors have the same
+  shape. Upon MODELRUN, the request queue is visited, and input tensors from compatible requests are concatenated
+  along the 0-th (batch) dimension, up until BATCHSIZE is exceeded. The model is then run for the entire batch,
+  results are unpacked back among the individual requests, and the respective clients are unblocked.
+  If the batch size of the inputs to the first request in the queue exceeds BATCHSIZE, the request is served
+  in any case. Default is 0 (no batching).
+* MINBATCHSIZE m - Do not execute a MODELRUN until the batch size has reached MINBATCHSIZE. This is primarily used to
+  force batching during testing, but it can also be used under normal operation. Note that in this case requests
+  for which MINBATCHSIZE is not reached will hang indefinitely.
+  Default is 0 (no minimum batch size).
 * INPUTS name1 name2 ... - Name of the nodes in the provided graph corresponding to inputs [`TF` backend only]
 * OUTPUTS name1 name2 ... - Name of the nodes in the provided graph corresponding to outputs [`TF` backend only]
 * model_blob - Binary buffer containing the model protobuf saved from a supported backend
@@ -91,6 +101,14 @@ AI.MODELSET resnet18 TF CPU INPUTS in1 OUTPUTS linear4 < foo.pb
 AI.MODELSET mnist_net ONNX CPU < mnist.onnx
 ```
 
+```sql
+AI.MODELSET mnist_net ONNX CPU BATCHSIZE 10 < mnist.onnx
+```
+
+```sql
+AI.MODELSET resnet18 TF CPU BATCHSIZE 10 MINBATCHSIZE 6 INPUTS in1 OUTPUTS linear4 < foo.pb
+```
+
 ## AI.MODELGET
 
 Get a model.
````
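The BATCHSIZE/MINBATCHSIZE semantics documented above amount to a greedy scan of the per-model request queue. The sketch below illustrates that heuristic in C; `PendingRequest` and `collect_batch` are hypothetical names invented for illustration, not RedisAI internals.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queued MODELRUN request; the real queue
 * entries are richer (tensors, client handle, error state). */
typedef struct PendingRequest {
    size_t batchdim;              /* size of dim 0 of this request's inputs */
    struct PendingRequest *next;  /* next request waiting on the same model */
} PendingRequest;

/* Count how many leading, shape-compatible requests fit in one batch,
 * accumulating their dim-0 sizes into *total. */
static size_t collect_batch(const PendingRequest *head, size_t batchsize,
                            size_t *total) {
    size_t n = 0;
    *total = 0;
    for (const PendingRequest *r = head; r != NULL; r = r->next) {
        /* The first request is always served, even if it alone exceeds
         * BATCHSIZE; later requests only join while room remains. */
        if (n > 0 && *total + r->batchdim > batchsize) {
            break;
        }
        *total += r->batchdim;
        n++;
    }
    return n;
}
```

Under these semantics the batch is only run once the accumulated total reaches MINBATCHSIZE; if it never does, the pending clients stay blocked, which is the indefinite hang the documentation warns about.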

src/backends.c

Lines changed: 4 additions & 4 deletions
```diff
@@ -74,7 +74,7 @@ int RAI_LoadBackend_TensorFlow(RedisModuleCtx *ctx, const char *path) {
   }
   init_backend(RedisModule_GetApi);
 
-  backend.model_create_with_nodes = (RAI_Model* (*)(RAI_Backend, const char*,
+  backend.model_create_with_nodes = (RAI_Model* (*)(RAI_Backend, const char*, RAI_ModelOpts,
                                      size_t, const char**, size_t, const char**,
                                      const char*, size_t, RAI_Error*))
                                      (unsigned long) dlsym(handle, "RAI_ModelCreateTF");
@@ -140,7 +140,7 @@ int RAI_LoadBackend_TFLite(RedisModuleCtx *ctx, const char *path) {
   }
   init_backend(RedisModule_GetApi);
 
-  backend.model_create = (RAI_Model* (*)(RAI_Backend, const char*,
+  backend.model_create = (RAI_Model* (*)(RAI_Backend, const char*, RAI_ModelOpts,
                           const char*, size_t, RAI_Error*))
                           (unsigned long) dlsym(handle, "RAI_ModelCreateTFLite");
   if (backend.model_create == NULL) {
@@ -205,7 +205,7 @@ int RAI_LoadBackend_Torch(RedisModuleCtx *ctx, const char *path) {
   }
   init_backend(RedisModule_GetApi);
 
-  backend.model_create = (RAI_Model* (*)(RAI_Backend, const char*,
+  backend.model_create = (RAI_Model* (*)(RAI_Backend, const char*, RAI_ModelOpts,
                           const char*, size_t, RAI_Error*))
                           (unsigned long) dlsym(handle, "RAI_ModelCreateTorch");
   if (backend.model_create == NULL) {
@@ -294,7 +294,7 @@ int RAI_LoadBackend_ONNXRuntime(RedisModuleCtx *ctx, const char *path) {
   }
   init_backend(RedisModule_GetApi);
 
-  backend.model_create = (RAI_Model* (*)(RAI_Backend, const char*,
+  backend.model_create = (RAI_Model* (*)(RAI_Backend, const char*, RAI_ModelOpts,
                           const char*, size_t, RAI_Error*))
                           (unsigned long) dlsym(handle, "RAI_ModelCreateORT");
   if (backend.model_create == NULL) {
```
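Each backend above is resolved at load time with `dlsym` and cast to the new signature, which now carries a `RAI_ModelOpts` argument. Below is a standalone sketch of that pattern, assuming the RedisAI types (`RAI_Model`, `RAI_Backend`, `RAI_ModelOpts`, `RAI_Error`) are in scope via the project headers; `load_model_create` is a hypothetical helper, not part of backends.c.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stddef.h>

/* New model_create signature: RAI_ModelOpts is threaded through so the
 * backend learns the batching options chosen at AI.MODELSET time. */
typedef RAI_Model *(*model_create_fn)(RAI_Backend, const char *, RAI_ModelOpts,
                                      const char *, size_t, RAI_Error *);

static model_create_fn load_model_create(void *handle, const char *symbol) {
    /* The cast through unsigned long mirrors the style used in backends.c. */
    model_create_fn fn = (model_create_fn)(unsigned long)dlsym(handle, symbol);
    if (fn == NULL) {
        fprintf(stderr, "backend does not export %s\n", symbol);
    }
    return fn;
}
```

A loader would call this once per backend, e.g. `load_model_create(handle, "RAI_ModelCreateORT")`, and abort backend registration when it returns NULL, which is the check the diff performs inline.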

src/backends.h

Lines changed: 2 additions & 2 deletions
```diff
@@ -8,10 +8,10 @@
 #include "err.h"
 
 typedef struct RAI_LoadedBackend {
-  RAI_Model* (*model_create_with_nodes)(RAI_Backend, const char*,
+  RAI_Model* (*model_create_with_nodes)(RAI_Backend, const char*, RAI_ModelOpts,
                                         size_t, const char**, size_t, const char**,
                                         const char*, size_t, RAI_Error*);
-  RAI_Model* (*model_create)(RAI_Backend, const char*,
+  RAI_Model* (*model_create)(RAI_Backend, const char*, RAI_ModelOpts,
                              const char*, size_t, RAI_Error*);
   void (*model_free)(RAI_Model*, RAI_Error*);
   int (*model_run)(RAI_ModelRunCtx*, RAI_Error*);
```
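The header change shows that every backend's constructor now accepts a `RAI_ModelOpts` value, but the struct itself is defined elsewhere in the tree. A plausible shape, inferred from the BATCHSIZE and MINBATCHSIZE options documented above (an assumption, not the verbatim RedisAI definition):

```c
#include <stddef.h>

/* Assumed layout: one field per batching option surfaced by AI.MODELSET. */
typedef struct RAI_ModelOpts {
    size_t batchsize;     /* BATCHSIZE n; 0 disables auto-batching */
    size_t minbatchsize;  /* MINBATCHSIZE m; 0 disables the minimum wait */
} RAI_ModelOpts;
```

Passing the options at model-creation time means the run queue can consult them on every MODELRUN without re-parsing the original command arguments.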
