Hello,
I quantized the model with ggml to the q6_K format, then ran inference with the following code:
```python
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# load Llama-2 model
llm = LlamaCpp(
    model_path="/workspace/test/TaiwanLLama_v1.0/Taiwan-LLaMa-13b-1.0.ggmlv3.q6_K.bin",
    n_gpu_layers=16,
    n_batch=8,
    n_ctx=2048,
    temperature=0.1,
    max_tokens=512,
    callback_manager=callback_manager,
)

# response = run_simple_qa(llm, query)
prompt_template = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {} ASSISTANT:"""
prompt = prompt_template.format("什麼是深度學習?")
response = llm(prompt)
```
The output is missing characters, as shown below:
深度學是機器學的一子集,基人工神經結。使得計算機能通別模式大量中學,而不需要明編程。深度學算法用分、進行和別模式
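A common cause of dropped CJK characters when streaming token-by-token (not confirmed for this exact setup, just a likely explanation) is that a multi-byte UTF-8 character can be split across two tokens; if each token's bytes are decoded independently, the split character is lost. The sketch below reproduces the symptom and shows how Python's incremental decoder avoids it:

```python
import codecs

# "深度學習" is 12 UTF-8 bytes (3 per character).
data = "深度學習".encode("utf-8")

# Split mid-character, simulating a token boundary falling inside 度.
chunk_a, chunk_b = data[:4], data[4:]

# Decoding each chunk on its own silently drops the split character.
broken = (chunk_a.decode("utf-8", errors="ignore")
          + chunk_b.decode("utf-8", errors="ignore"))
print(broken)  # 深學習  (度 is gone)

# An incremental decoder buffers the incomplete byte sequence instead.
dec = codecs.getincrementaldecoder("utf-8")()
fixed = dec.decode(chunk_a) + dec.decode(chunk_b, final=True)
print(fixed)   # 深度學習
```

If this is the cause, the fix belongs in whatever layer turns llama.cpp token bytes into printed text, not in the model or the quantization.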