Description
This is a potential design that could hit three interconnected birds with one stone:
- 🐣 Revisit design of Python type annotations #193
- 🐥 Design type system & signature #205
- 🐔 Review model API design #259
Those tickets contain user data and requirements. Please read them for context.
This is a strawman. It is most likely wrong. Please attack with your sharpest pitchforks!
Model definition
The model API is specified using Pydantic types in the predict function.
```python
import cog
from pathlib import Path
from pydantic import Field
import torch

class Predictor(cog.Predictor):
    def setup(self):
        self.net = torch.load("weights.pth")

    def predict(
        self,
        input: Path = Field(..., title="Image to enlarge"),
        scale: float = Field(1.5, description="Factor to scale image by"),
    ) -> Path:
        """Run a single prediction on the model"""
        # ... pre-processing ...
        output = self.net(input)
        # ... post-processing ...
        return output
```

You could imagine some custom types like `cog.Field` if we need custom behavior, or don't want the extra import.
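As a sketch of what a `cog.Field` could look like, here is a minimal metadata holder that would let model authors avoid the pydantic import. Everything here is an assumption about a possible API, not existing cog code:

```python
# Hypothetical sketch of cog.Field: a small record of a default value plus
# documentation metadata for one input. All names here are made up.
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class FieldInfo:
    default: Any = ...
    title: Optional[str] = None
    description: Optional[str] = None

def Field(default: Any = ..., **kwargs: Any) -> FieldInfo:
    """Record a default plus documentation metadata for one input."""
    return FieldInfo(default=default, **kwargs)
```

Cog could then translate these into pydantic fields internally when building the HTTP server.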
This predictor serves:
POST /predict
Takes a JSON object in the request body:

```json
{
    "input": "https://bucket.s3.amazon.com/...",
    "scale": 1.7
}
```

The `input` value could also be a data URL (`data:foo/bar;base64,...`). Returns:

```json
{
    "output": "https://bucket.s3.amazon.com/..."
}
```
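As a sketch of what calling this endpoint could look like from Python, using only the standard library (the localhost address and the data URL payload are assumptions for illustration):

```python
import json
import urllib.request

# Hypothetical request body; files can be data: URLs instead of HTTP ones.
body = {
    "input": "data:text/plain;base64,aGVsbG8=",
    "scale": 1.7,
}
req = urllib.request.Request(
    "http://localhost:5000/predict",  # assumed address of the model server
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = json.load(urllib.request.urlopen(req))
# response["output"] would then be a URL pointing at the result.
```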
Everything is JSON. Files are represented as URLs -- HTTP, cloud storage, data: URLs, or perhaps even file:// for mounted storage. This is for simplicity, so it can be represented sensibly in OpenAPI, and so more complex data structures can be represented.
If a more complex Pydantic object was defined as the return value, output would be a JSON object. The root object can also include other information about the run -- some prior art in #259 might be informative about what we want to include (metrics, etc).
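For example, a richer root object might look something like this (the field names here are illustrative, not decided):

```json
{
    "output": {"image": "https://bucket.s3.amazon.com/...", "score": 0.92},
    "metrics": {"predict_time_seconds": 1.23}
}
```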
Authentication and configuration for cloud storage needs some thought. The request could specify a prefix for storing output files.
GET /docs
A Swagger UI, generated automatically by FastAPI.
GET /openapi.json
An OpenAPI definition, generated automatically by FastAPI. This is used as the type definition. It contains a complete documentation of the HTTP API and entries like this to define the types for input/output:
```json
{
  "components": {
    "schemas": {
      "Input": {
        "title": "Input",
        "required": ["input"],
        "type": "object",
        "properties": {
          "input": {
            "title": "Image to enlarge",
            "type": "string",
            "format": "path"
          },
          "scale": {
            "title": "Scale",
            "type": "number",
            "description": "Factor to scale image by",
            "default": 1.5
          }
        }
      },
      "Output": ...
    }
  }
}
```
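A sketch of how a client might consume this type definition, with a trimmed copy of the fragment above embedded inline for illustration:

```python
import json

# A trimmed copy of the OpenAPI fragment above, embedded for illustration.
openapi = json.loads("""
{
  "components": {
    "schemas": {
      "Input": {
        "title": "Input",
        "required": ["input"],
        "type": "object",
        "properties": {
          "input": {"title": "Image to enlarge", "type": "string", "format": "path"},
          "scale": {"title": "Scale", "type": "number", "default": 1.5}
        }
      }
    }
  }
}
""")

input_schema = openapi["components"]["schemas"]["Input"]
required = set(input_schema["required"])
defaults = {
    name: prop["default"]
    for name, prop in input_schema["properties"].items()
    if "default" in prop
}
```

From this a client can discover which inputs are mandatory (`required`) and what the optional ones default to (`defaults`), without any model-specific code.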
How this would be used for Replicate
We could either:
- Integrate with existing redis queue worker.
- Run a sidecar container that converts the current Redis API into HTTP.
Over time, we could define a queue API and include that in Cog. (Or leave it as an external system. See discussion in #259.)
Files
Files are represented as URLs. data:, http:, gcs:, s3:, etc.
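For small files, a `data:` URL can be built with the standard library alone; a minimal sketch (the helper name is made up):

```python
import base64

def to_data_url(data: bytes, mime_type: str = "application/octet-stream") -> str:
    """Encode raw bytes as a base64 data: URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, `to_data_url(b"hi", "text/plain")` produces `data:text/plain;base64,aGk=`.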
This works well for input, but some way of specifying where output files go is required. Predictions could produce arbitrary output files (or should they be able to?) so some generic way of defining where multiple files can go is needed. For filesystem or bucket, this could be a prefix, and the prediction is responsible for naming files such that they don't collide.
Perhaps prediction requests could include the output path prefix:
```json
{
    "input": {"image": "s3://predictions/1/input/image.jpg"},
    "output_path": "s3://predictions/1/output/"
}
```
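Under that scheme the prediction is responsible for picking non-colliding names under the prefix; one possible sketch, using a random hex prefix for uniqueness (the function name is an assumption):

```python
import posixpath
import uuid

def output_url(output_prefix: str, filename: str) -> str:
    """Join a collision-resistant name onto the request's output prefix."""
    unique_name = f"{uuid.uuid4().hex}-{filename}"
    return posixpath.join(output_prefix, unique_name)
```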
For Replicate, files could be fetched and uploaded from Cloud Storage using signed URLs with a prefix. Unfortunately signed URLs can't be made for prefixes. You can assign IAM permissions for a prefix, but that would mean a service account for each prediction, which is unwieldy.
Some potential solutions:
- We use serving's existing file storage system
- We send files directly back to replicate-web, bypassing serving's file storage system. Perhaps if
- We use data URIs...?!
Future
- An asynchronous API, where a job is started and you can query the status of the job. (This might be required for passing logs to Replicate?)
- This request/response can be passed over a queue. The same type definition could be used.
- How would we support gRPC? Would we want to generate a generic protobuf schema that supports any kind of model, or generate one per-model somehow?
- We might want a more FastAPI-like API instead of the class-based Predictor.
- Custom URL routes on the FastAPI server. These might be namespaced under /extensions to preserve the stability of the top-level API.
Reflection
FastAPI is very HTTP + JSON centric. OpenAPI is too. This is optimal for simple use-cases, but using more advanced/efficient systems like gRPC or queues may feel like an impedance mismatch.
Implementation details
The predictor in the example above would be converted into a FastAPI method like this:
```python
from pathlib import Path

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class Input(BaseModel):
    input: Path = Field(..., title="Image to enlarge")
    scale: float = Field(1.5, description="Factor to scale image by")

@app.post("/predict")
async def predict(input: Input):
    # ...
    return Path(...)
```

We would need to come up with a custom Path handler that knows how to convert it into URLs or base64 encoded strings.
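On the input side, such a handler might look something like this sketch, which materializes `data:` and `http(s):` URLs as local files (the function name and the temp-file strategy are assumptions):

```python
import base64
import tempfile
import urllib.request
from pathlib import Path

def resolve_input(url: str) -> Path:
    """Materialize a URL from the request body as a local file."""
    if url.startswith("data:"):
        # Assumes the ";base64," form; a real handler would also
        # parse the MIME type out of the header before the comma.
        _, _, payload = url.partition(",")
        data = base64.b64decode(payload)
    else:
        data = urllib.request.urlopen(url).read()  # http(s), file://, etc.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.write(data)
    tmp.close()
    return Path(tmp.name)
```

The reverse direction (turning the returned `Path` into a URL in the response) would use the output-prefix scheme discussed under Files.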