Hello,
I quantized the model with ggml to the q6_K format, then ran inference with the following code:
```python
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# load Llama-2 model
llm = LlamaCpp(
    model_path="/workspace/test/TaiwanLLama_v1.0/Taiwan-LLaMa-13b-1.0.ggmlv3.q6_K.bin",
    n_gpu_layers=16,
    n_batch=8,
    n_ctx=2048,
    temperature=0.1,
    max_tokens=512,
    callback_manager=callback_manager,
)

# response = run_simple_qa(llm, query)
prompt_template = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {} ASSISTANT:"""
prompt = prompt_template.format("什麼是深度學習?")
response = llm(prompt)
```
The output is missing characters, as shown below:
深度學是機器學的一子集,基人工神經結。使得計算機能通別模式大量中學,而不需要明編程。深度學算法用分、進行和別模式
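A common cause of dropped CJK characters when streaming token-by-token (not confirmed for this exact setup, just a likely explanation) is that a multi-byte UTF-8 character can be split across two tokens; if each token's bytes are decoded independently, the split character is lost. The sketch below reproduces the symptom and shows how Python's incremental decoder avoids it:

```python
import codecs

# "深度學習" is 12 UTF-8 bytes (3 per character).
data = "深度學習".encode("utf-8")

# Split mid-character, simulating a token boundary falling inside 度.
chunk_a, chunk_b = data[:4], data[4:]

# Decoding each chunk on its own silently drops the split character.
broken = (chunk_a.decode("utf-8", errors="ignore")
          + chunk_b.decode("utf-8", errors="ignore"))
print(broken)  # 深學習  (度 is gone)

# An incremental decoder buffers the incomplete byte sequence instead.
dec = codecs.getincrementaldecoder("utf-8")()
fixed = dec.decode(chunk_a) + dec.decode(chunk_b, final=True)
print(fixed)   # 深度學習
```

If this is the cause, the fix belongs in whatever layer turns llama.cpp token bytes into printed text, not in the model or the quantization.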