Define a metadata format for textual inversion concepts

**Is your feature request related to a problem? Please describe.**
The textual inversion script currently saves a dict that maps a single term to its learned embedding. Loading scripts typically load one or more of these files and use each key as a token in the prompt.

This format leaves no place for metadata associated with the key(s). The longer it stays in use, the more tools will depend on it, so it would be good to define a more extensible format as soon as possible.
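A loader for the current format can be sketched as follows. It assumes the file has already been deserialized into a plain dict (in practice, e.g. with `torch.load`) and uses a list of floats in place of the embedding tensor; `load_current_format` is a hypothetical name. It illustrates why metadata cannot simply be added: every key is taken to be a token.

```python
from typing import Any, Dict


def load_current_format(data: Dict[str, Any]) -> Dict[str, Any]:
    """Current format: every key is used as a prompt token, every value as its embedding."""
    return {token: embedding for token, embedding in data.items()}


# A list of floats stands in for the embedding tensor.
tokens = load_current_format({"<cat-toy>": [0.1, 0.2, 0.3], "author": "My Name"})
print(sorted(tokens))  # ['<cat-toy>', 'author']
```

The `author` entry ends up registered as a second token, which is exactly the problem with adding metadata to this layout.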

**Describe the solution you'd like**
Change the format from:

```
{
  "term": embedding_tensor
}
```

to something like

=== First version (leaving room for additional data for each term) ===
```
{
  "term": {
    "embedding": embedding_tensor
  }
}
```

or even


=== Second version (leaving room for additional data for each term and data for the whole file) ===
```
{
  "terms": {
    "term": {
      "embedding": embedding_tensor
    }
  }
}
```

Either version leaves room for metadata that not every script needs to understand, for example:

=== First version ===
```
{
  "term": {
    "embedding": embedding_tensor,
    "training_steps": 2000, # shown on the loading screen
    "training_files": ["hash1", "hash2", "hash3"], # used to continue training
    "some_other_metadata": "for_the_embedding"
  }
}
```


=== Second version ===
```
{
  "terms": {
    "term": {
      "embedding": embedding_tensor,
      "training_steps": 2000, # shown on the loading screen
      "training_files": ["hash1", "hash2", "hash3"], # used to continue training
      "some_other_metadata": "for_the_embedding"
    }
  },
  "author": "My Name", # can be extracted for attribution
  "license": "CC-0",
  "some_other_metadata": "for_the_file"
}
```
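Both layouts could be produced from an existing file by a small converter. The sketch below assumes files are already deserialized into plain dicts (in practice, e.g. via `torch.load`), uses a list of floats as a stand-in for `embedding_tensor`, and treats `terms` as a mapping from each term to its per-term dict. All function names (`to_first_version`, `to_second_version`, `loading_screen_line`) are hypothetical.

```python
from typing import Any, Dict, List

Embedding = List[float]  # stand-in for torch.Tensor so the sketch runs without PyTorch


def to_first_version(legacy: Dict[str, Embedding]) -> Dict[str, Dict[str, Any]]:
    """Wrap each embedding in a per-term dict so metadata can sit next to it."""
    return {term: {"embedding": embedding} for term, embedding in legacy.items()}


def to_second_version(first: Dict[str, Dict[str, Any]], **file_metadata: Any) -> Dict[str, Any]:
    """Nest the per-term dicts under "terms" and put file-level fields beside it."""
    return {"terms": dict(first), **file_metadata}


def loading_screen_line(term: str, entry: Dict[str, Any]) -> str:
    """Show optional per-term metadata; a missing key is simply skipped."""
    steps = entry.get("training_steps")
    return term if steps is None else f"{term} (trained for {steps} steps)"


legacy = {"<cat-toy>": [0.1, 0.2, 0.3]}
first = to_first_version(legacy)
first["<cat-toy>"]["training_steps"] = 2000  # per-term metadata
second = to_second_version(first, author="My Name", license="CC-0")  # file-level metadata

print(second["author"], second["license"])  # My Name CC-0
for term, entry in second["terms"].items():
    print(loading_screen_line(term, entry))  # <cat-toy> (trained for 2000 steps)
```

Because every optional field is read with `dict.get`, a script that only knows about `embedding` keeps working when new keys appear.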

Scripts can read the additional keys or ignore them, so front-end developers can agree on a standard later and convert between formats. What has to happen now is moving the tensor into a sub-structure: only then can additional data be added, kept distinct from the embedding itself, without breaking other people's code.
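A loader that accepts the current format as well as both proposed versions, and ignores every key it does not understand, might look like the sketch below. The detection rules (a dict under `terms` means the second version; per-term dicts containing `embedding` mean the first) are assumptions about how the proposal would be read, `load_embeddings` is a hypothetical name, and a list of floats again stands in for the tensor.

```python
from typing import Any, Dict, List

Embedding = List[float]  # stand-in for torch.Tensor (assumption)


def load_embeddings(data: Dict[str, Any]) -> Dict[str, Embedding]:
    """Return {term: embedding} from any of the three layouts, ignoring extra keys."""
    if isinstance(data.get("terms"), dict):
        # Second version: per-term dicts nested under "terms".
        entries = data["terms"]
    elif data and all(isinstance(v, dict) and "embedding" in v for v in data.values()):
        # First version: every top-level value is a dict holding "embedding".
        entries = data
    else:
        # Current format: every value is the embedding itself.
        return dict(data)
    # Only "embedding" is read; any other metadata key is ignored.
    return {term: entry["embedding"] for term, entry in entries.items()}


emb = [0.1, 0.2, 0.3]
legacy = {"<cat-toy>": emb}
first = {"<cat-toy>": {"embedding": emb, "training_steps": 2000}}
second = {"terms": first, "author": "My Name", "license": "CC-0"}
for data in (legacy, first, second):
    print(load_embeddings(data))  # {'<cat-toy>': [0.1, 0.2, 0.3]} each time
```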

**Describe alternatives you've considered**
With the current structure, this would require workarounds such as a special "__metadata__" key that every loading script has to skip. Another option would be to store the metadata in a separate file.
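For comparison, the reserved-key workaround could be handled as in the sketch below (`split_metadata` is a hypothetical helper; a list of floats stands in for the tensor). The catch is that every existing loader would need this extra step, and one that lacks it would register `__metadata__` as a token.

```python
from typing import Any, Dict, Tuple

METADATA_KEY = "__metadata__"  # reserved key name from the workaround described above


def split_metadata(data: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate the reserved metadata entry from the token -> embedding entries."""
    metadata = data.get(METADATA_KEY, {})
    tokens = {k: v for k, v in data.items() if k != METADATA_KEY}
    return tokens, metadata


data = {"<cat-toy>": [0.1, 0.2, 0.3], "__metadata__": {"author": "My Name"}}
tokens, metadata = split_metadata(data)
print(tokens)    # {'<cat-toy>': [0.1, 0.2, 0.3]}
print(metadata)  # {'author': 'My Name'}
```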

**Additional context**
It would often be useful to provide an embedding in different versions, and it should be easy to bundle the associated metadata with the embedding.


Define a metadata format for textual inversion concepts #799
