Description
This is a potential design that could hit three interconnected birds with one stone:
- 🐣 Revisit design of Python type annotations #193
- 🐥 Design type system & signature #205
- 🐔 Review model API design #259
Those tickets contain user data and requirements. Please read them for context.
This is a strawman. It is most likely wrong. Please attack with your sharpest pitchforks!
Model definition
The model API is specified using Pydantic types in the predict function.
```python
import cog
from pathlib import Path
from pydantic import Field
import torch

class Predictor(cog.Predictor):
    def setup(self):
        self.net = torch.load("weights.pth")

    def predict(
        self,
        input: Path = Field(..., title="Image to enlarge"),
        scale: float = Field(1.5, description="Factor to scale image by"),
    ) -> Path:
        """Run a single prediction on the model"""
        # ... pre-processing ...
        output = self.net(input)
        # ... post-processing ...
        return output
```

You could imagine some custom types like `cog.Field` if we need custom behavior, or don't want the extra import.
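As a sketch of what a `cog.Field` could look like, here is a minimal metadata holder that would let model authors avoid the pydantic import. Everything here is an assumption about a possible API, not existing cog code:

```python
# Hypothetical sketch of cog.Field: a small record of a default value plus
# documentation metadata for one input. All names here are made up.
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class FieldInfo:
    default: Any = ...
    title: Optional[str] = None
    description: Optional[str] = None

def Field(default: Any = ..., **kwargs: Any) -> FieldInfo:
    """Record a default plus documentation metadata for one input."""
    return FieldInfo(default=default, **kwargs)
```

Cog could then translate these into pydantic fields internally when building the HTTP server.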
This predictor serves:
POST /predict
Takes a JSON object in the request body:

```json
{
    "input": "https://bucket.s3.amazon.com/...",
    "scale": 1.7
}
```

The `input` value could also be a data URL (`data:foo/bar;base64,...`). Returns:

```json
{
    "output": "https://bucket.s3.amazon.com/..."
}
```
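As a sketch of what calling this endpoint could look like from Python, using only the standard library (the localhost address and the data URL payload are assumptions for illustration):

```python
import json
import urllib.request

# Hypothetical request body; files can be data: URLs instead of HTTP ones.
body = {
    "input": "data:text/plain;base64,aGVsbG8=",
    "scale": 1.7,
}
req = urllib.request.Request(
    "http://localhost:5000/predict",  # assumed address of the model server
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = json.load(urllib.request.urlopen(req))
# response["output"] would then be a URL pointing at the result.
```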
Everything is JSON. Files are represented as URLs -- HTTP, cloud storage, data: URLs, or perhaps even file:// for mounted storage. This is for simplicity, so it can be represented sensibly in OpenAPI, and so more complex data structures can be represented.
If a more complex Pydantic object was defined as the return value, output would be a JSON object. The root object can also include other information about the run -- some prior art in #259 might be informative about what we want to include (metrics, etc).
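For example, a richer root object might look something like this (the field names here are illustrative, not decided):

```json
{
    "output": {"image": "https://bucket.s3.amazon.com/...", "score": 0.92},
    "metrics": {"predict_time_seconds": 1.23}
}
```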
Authentication and configuration for cloud storage needs some thought. The request could specify a prefix for storing output files.
GET /docs
A Swagger UI, generated automatically by FastAPI.
GET /openapi.json
An OpenAPI definition, generated automatically by FastAPI. This is used as the type definition. It contains a complete documentation of the HTTP API and entries like this to define the types for input/output:
```json
{
  "components": {
    "schemas": {
      "Input": {
        "title": "Input",
        "required": ["input"],
        "type": "object",
        "properties": {
          "input": {
            "title": "Image to enlarge",
            "type": "string",
            "format": "path"
          },
          "scale": {
            "title": "Scale",
            "type": "number",
            "description": "Factor to scale image by",
            "default": 1.5
          }
        }
      },
      "Output": ...
    }
  }
}
```
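A sketch of how a client might consume this type definition, with a trimmed copy of the fragment above embedded inline for illustration:

```python
import json

# A trimmed copy of the OpenAPI fragment above, embedded for illustration.
openapi = json.loads("""
{
  "components": {
    "schemas": {
      "Input": {
        "title": "Input",
        "required": ["input"],
        "type": "object",
        "properties": {
          "input": {"title": "Image to enlarge", "type": "string", "format": "path"},
          "scale": {"title": "Scale", "type": "number", "default": 1.5}
        }
      }
    }
  }
}
""")

input_schema = openapi["components"]["schemas"]["Input"]
required = set(input_schema["required"])
defaults = {
    name: prop["default"]
    for name, prop in input_schema["properties"].items()
    if "default" in prop
}
```

From this a client can discover which inputs are mandatory (`required`) and what the optional ones default to (`defaults`), without any model-specific code.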
How this would be used for Replicate
We could either:
- Integrate with existing redis queue worker.
- Run a sidecar container that converts the current Redis API into HTTP.
Over time, we could define a queue API and include that in Cog. (Or leave it as an external system. See discussion in #259.)
Files
Files are represented as URLs. data:, http:, gcs:, s3:, etc.
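For small files, a `data:` URL can be built with the standard library alone; a minimal sketch (the helper name is made up):

```python
import base64

def to_data_url(data: bytes, mime_type: str = "application/octet-stream") -> str:
    """Encode raw bytes as a base64 data: URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, `to_data_url(b"hi", "text/plain")` produces `data:text/plain;base64,aGk=`.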
This works well for input, but some way of specifying where output files go is required. Predictions could produce arbitrary output files (or should they be able to?) so some generic way of defining where multiple files can go is needed. For filesystem or bucket, this could be a prefix, and the prediction is responsible for naming files such that they don't collide.
Perhaps prediction requests could include the output path prefix:
```json
{
    "input": {"image": "s3://predictions/1/input/image.jpg"},
    "output_path": "s3://predictions/1/output/"
}
```
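Under that scheme the prediction is responsible for picking non-colliding names under the prefix; one possible sketch, using a random hex prefix for uniqueness (the function name is an assumption):

```python
import posixpath
import uuid

def output_url(output_prefix: str, filename: str) -> str:
    """Join a collision-resistant name onto the request's output prefix."""
    unique_name = f"{uuid.uuid4().hex}-{filename}"
    return posixpath.join(output_prefix, unique_name)
```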
For Replicate, files could be fetched and uploaded from Cloud Storage using signed URLs with a prefix. Unfortunately signed URLs can't be made for prefixes. You can assign IAM permissions for a prefix, but that would mean a service account for each prediction, which is unwieldy.
Some potential solutions:
- We use serving's existing file storage system
- We send files directly back to replicate-web, bypassing serving's file storage system. Perhaps if
- We use data URIs...?!
Future
- An asynchronous API, where a job is started and you can query the status of the job. (This might be required for passing logs to Replicate?)
- This request/response can be passed over a queue. The same type definition could be used.
- How would we support gRPC? Would we want to generate a generic protobuf schema that supports any kind of model, or generate one per-model somehow?
- We might want a more FastAPI-like API instead of the class-based Predictor.
- Custom URL routes on the FastAPI server. These might be namespaced under /extensions to preserve the stability of the top-level API.
Reflection
FastAPI is very HTTP + JSON centric. OpenAPI is too. This is optimal for simple use-cases, but using more advanced/efficient systems like gRPC or queues may feel like an impedance mismatch.
Implementation details
The predictor in the example above would be converted into a FastAPI method like this:
```python
from pathlib import Path

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class Input(BaseModel):
    input: Path = Field(..., title="Image to enlarge")
    scale: float = Field(1.5, description="Factor to scale image by")

@app.post("/predict")
async def predict(input: Input):
    # ...
    return Path(...)
```

We would need to come up with a custom Path handler that knows how to convert it into URLs or base64 encoded strings.
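On the input side, such a handler might look something like this sketch, which materializes `data:` and `http(s):` URLs as local files (the function name and the temp-file strategy are assumptions):

```python
import base64
import tempfile
import urllib.request
from pathlib import Path

def resolve_input(url: str) -> Path:
    """Materialize a URL from the request body as a local file."""
    if url.startswith("data:"):
        # Assumes the ";base64," form; a real handler would also
        # parse the MIME type out of the header before the comma.
        _, _, payload = url.partition(",")
        data = base64.b64decode(payload)
    else:
        data = urllib.request.urlopen(url).read()  # http(s), file://, etc.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.write(data)
    tmp.close()
    return Path(tmp.name)
```

The reverse direction (turning the returned `Path` into a URL in the response) would use the output-prefix scheme discussed under Files.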