[GenAI] Add readme to Microsoft.ML.GenAI.Phi #7206
Merged: LittleLittleCloud merged 7 commits into `dotnet:main` from `LittleLittleCloud:u/xiaoyun/phi-readme` on Aug 5, 2024
Commits (all by LittleLittleCloud):
- 126d80c: add readme
- b45eb89: Update src/Microsoft.ML.GenAI.Phi/README.md
- d1019b8: Update src/Microsoft.ML.GenAI.Phi/README.md
- 3800b14: Update src/Microsoft.ML.GenAI.Phi/README.md
- e7a51c6: Update src/Microsoft.ML.GenAI.Phi/README.md
- e2083aa: Update src/Microsoft.ML.GenAI.Phi/README.md
- cc55546: Update src/Microsoft.ML.GenAI.Phi/README.md
# Microsoft.ML.GenAI.Phi

TorchSharp implementation of the Microsoft Phi series models for GenAI.

## Supported models

The following Phi models are supported and tested:

- [x] [Phi-2](https://huggingface.co/microsoft/phi-2)
- [x] [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- [x] [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
- [ ] [Phi-3-small-8k-instruct](https://huggingface.co/microsoft/Phi-3-small-8k-instruct)
- [ ] [Phi-3-small-128k-instruct](https://huggingface.co/microsoft/Phi-3-small-128k-instruct)
- [ ] [Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct)
- [ ] [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)
- [ ] [Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)

## Getting Started with Semantic Kernel

### Download the model weights (e.g. Phi-3-mini-4k-instruct) from Hugging Face

```bash
# make sure git-lfs is installed
git lfs install
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
```

### Load the model

```csharp
var weightFolder = "/path/to/Phi-3-mini-4k-instruct";
var configName = "config.json";
var config = JsonSerializer.Deserialize<Phi3Config>(File.ReadAllText(Path.Combine(weightFolder, configName)));
var model = new Phi3ForCausalLM(config);

// load the tokenizer
var tokenizerModelName = "tokenizer.model";
var tokenizer = Phi3TokenizerHelper.FromPretrained(Path.Combine(weightFolder, tokenizerModelName));

// load the weights
model.LoadSafeTensors(weightFolder);

// initialize the device
var device = "cuda";
if (device == "cuda")
{
    torch.InitializeDeviceType(DeviceType.CUDA);
}

// create the causal language model pipeline
var pipeline = new CausalLMPipeline<Tokenizer, Phi3ForCausalLM>(tokenizer, model, device);
```

### Add the pipeline to Semantic Kernel as an `IChatCompletionService`

```csharp
var kernel = Kernel.CreateBuilder()
    .AddGenAIChatCompletion(pipeline)
    .Build();
```

### Chat with the model

```csharp
var chatService = kernel.GetRequiredService<IChatCompletionService>();
var chatHistory = new ChatHistory();
chatHistory.AddSystemMessage("you are a helpful assistant");
chatHistory.AddUserMessage("write a C# program to calculate the factorial of a number");
await foreach (var response in chatService.GetStreamingChatMessageContentsAsync(chatHistory))
{
    Console.Write(response);
}
```

## Getting started with AutoGen.Net

### Follow the same steps to download the model weights and load the model

### Create a `Phi3Agent` from the pipeline

```csharp
var agent = new Phi3Agent(pipeline, name: "assistant")
    .RegisterPrintMessage();
```

### Chat with the model

```csharp
var task = """
    write a C# program to calculate the factorial of a number
    """;

await agent.SendAsync(task);
```

### More examples

Please refer to [Microsoft.ML.GenAI.Samples](./../../docs/samples/Microsoft.ML.GenAI.Samples/) for more examples.

## Dynamic loading

It's recommended to run model inference on a GPU, which requires at least 8GB of GPU memory for the Phi-3-mini-4k-instruct model if it is fully loaded.

If you don't have enough GPU memory, you can dynamically load the model weights into GPU memory instead. Here is how it works behind the scenes (see the sketch after this list):
- When the model is initialized, the size of each layer is calculated and stored in a dictionary.
- When the model weights are loaded, each layer is assigned to a device (CPU or GPU) based on the size of the layer and the remaining memory of the device. If there is not enough memory on the device, the layer is loaded into CPU memory.
- During inference, any layer that lives in CPU memory is moved to GPU memory right before it runs and moved back to CPU memory afterwards.
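
The snippet below is a minimal sketch of that last step, not the library's actual implementation: the real pipeline handles layer movement internally. It assumes TorchSharp and a hypothetical `layer` module whose home device is the CPU.

```csharp
using TorchSharp;
using static TorchSharp.torch;

// Sketch only: run one layer that normally lives on the CPU by temporarily
// moving its weights to the GPU for the duration of the call.
Tensor ForwardWithDynamicLoading(nn.Module<Tensor, Tensor> layer, Tensor input, string homeDevice)
{
    if (homeDevice == "cpu")
    {
        layer.to(new Device("cuda"));        // move the layer's weights to GPU for this call
        var output = layer.call(input.to(new Device("cuda")));
        layer.to(new Device("cpu"));         // move them back to free GPU memory
        return output;
    }

    return layer.call(input);                // layer is already resident on the GPU
}
```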

Here is how to enable dynamic loading of the model:

### Step 1: Infer the device map for each layer

You can infer the placement of each layer using the `InferDeviceMapForEachLayer` API. The returned `deviceMap` is a key-value dictionary where the key is the layer name and the value is the device name (e.g. "cuda" or "cpu").

```csharp
// manually set up the available memory budget on each device
// (the sizes below are example values; tune them to your hardware)
var modelSizeOnCudaInGB = 4;
var modelSizeOnMemoryInGB = 16;
var modelSizeOnDiskInGB = 200;

var deviceSizeMap = new Dictionary<string, long>
{
    ["cuda"] = modelSizeOnCudaInGB * 1L * 1024 * 1024 * 1024,
    ["cpu"] = modelSizeOnMemoryInGB * 1L * 1024 * 1024 * 1024,
    ["disk"] = modelSizeOnDiskInGB * 1L * 1024 * 1024 * 1024,
};

var deviceMap = model.InferDeviceMapForEachLayer(
    devices: ["cuda", "cpu", "disk"],
    deviceSizeMapInByte: deviceSizeMap);
```

### Step 2: Load the model weights using the `ToDynamicLoadingModel` API

Once the `deviceMap` is calculated, you can pass it to the `ToDynamicLoadingModel` API to load the model weights.

```csharp
model = model.ToDynamicLoadingModel(deviceMap, "cuda");
```
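
From here the usage is unchanged: build the pipeline the same way as in the earlier loading step. This assumes the same `tokenizer` and `model` variables from that step.

```csharp
var pipeline = new CausalLMPipeline<Tokenizer, Phi3ForCausalLM>(tokenizer, model, "cuda");
```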
Review comment:
> For the ones that aren't checked, do we have a path to support them? If not, maybe remove them

Reply:
> Yes, I am working on implementing those models