
Inference drops characters when using a ggmlv3 q6_K model #30

Description

@wennycooper

Hello,
I quantized the model with ggml to q6_K format, then ran inference with the following code:

```python
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# load Llama-2 model
llm = LlamaCpp(
    model_path="/workspace/test/TaiwanLLama_v1.0/Taiwan-LLaMa-13b-1.0.ggmlv3.q6_K.bin",
    n_gpu_layers=16,
    n_batch=8,
    n_ctx=2048,
    temperature=0.1,
    max_tokens=512,
    callback_manager=callback_manager,
)

# response = run_simple_qa(llm, query)
prompt_template = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {} ASSISTANT:"""
prompt = prompt_template.format("什麼是深度學習?")
response = llm(prompt)

```
The output drops characters (for example 「深度學習」 comes out as 「深度學」), as shown below:

深度學是機器學的一子集,基人工神經結。使得計算機能通別模式大量中學,而不需要明編程。深度學算法用分、進行和別模式
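
One way to narrow this down is to check whether the characters are already missing from the generated text, or are only lost while streaming to stdout. With CJK text, a single token can end partway through a multi-byte UTF-8 character, so printing tokens one at a time may drop characters even when the full response is intact. The sketch below is a diagnostic, not a confirmed fix: it reuses the same model path and settings from the report above, but removes the streaming callback and prints the assembled response string instead.

```python
from langchain.llms import LlamaCpp

# Same model and sampling settings as above, but without the streaming
# callback: the response is returned as one string, so any character loss
# here happens during generation rather than during streaming.
llm = LlamaCpp(
    model_path="/workspace/test/TaiwanLLama_v1.0/Taiwan-LLaMa-13b-1.0.ggmlv3.q6_K.bin",
    n_gpu_layers=16,
    n_batch=8,
    n_ctx=2048,
    temperature=0.1,
    max_tokens=512,
)

prompt_template = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions. USER: {} ASSISTANT:"
)
response = llm(prompt_template.format("什麼是深度學習?"))

# If this string is complete but the streamed output was not, the
# token-by-token stdout path is the likely culprit; if characters are
# still missing here, the q6_K quantized model itself is suspect.
print(response)
```

If the printed string is intact, the quantization is probably fine and the issue sits in the streaming path; if characters are still missing, comparing against the unquantized model with the same prompt would be the next check.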
