ValueError: from_string: error parsing grammar file: parsed_grammar.rules is empty #615

Closed
talhalatifkhan opened this issue Aug 15, 2023 Discussed in #614 · 4 comments
Labels
bug Something isn't working

@talhalatifkhan

Discussed in #614

Originally posted by talhalatifkhan August 16, 2023
I am trying to make sure that my output follows a JSON format every time. I stumbled upon jsonformer, and from there I found grammar-based sampling. I used json-schema-to-grammar.py to convert my JSON schema into a grammar.

I want to know whether grammar-based sampling is meant for this purpose and, if so, how I should use it.

JSON schema

json_schema = {
    "type": "object",
    "properties": {
        "Stage": {
            "type": "string",
            "enum": ["first", "second"]
        },
        "Task Finished": {"type": "boolean"},
        "Statement": {"type": "string"},
        "Assistant": {"type": "string"}
    }
}

Llama grammar

space ::= " "?
string ::=  "\"" (
        [^"\\] |
        "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
      )* "\"" space 
Stage ::= "\"first\"" | "\"second\""
boolean ::= ("true" | "false") space
root ::= "{" space "\"Assistant\"" space ":" space string "," space "\"Stage\"" space ":" space Stage "," space "\"Statement\"" space ":" space string "," space "\"Task Finished\"" space ":" space boolean "}" space
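For reference, a rough way to see which strings this grammar accepts is to translate its rules by hand into a regular expression. The sketch below is only an approximation written for illustration: the names `SPACE`, `STRING`, `STAGE`, `BOOLEAN`, `member`, and `matches_grammar` are hypothetical, and llama.cpp does not evaluate grammars with regexes. It does show that the root rule expects exactly four keys in a fixed order, which is what makes the output reliably parseable as JSON.

```python
import json
import re

# Hand-written regex approximation of the GBNF rules above (an assumption
# for illustration only; llama.cpp has its own grammar engine).
SPACE = r" ?"
STRING = r'"(?:[^"\\]|\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4}))*"' + SPACE
STAGE = r'(?:"first"|"second")'
BOOLEAN = r"(?:true|false)" + SPACE


def member(key: str, value_pattern: str) -> str:
    """Pattern for one `"key" : value` pair, mirroring the root rule."""
    return re.escape(json.dumps(key)) + SPACE + ":" + SPACE + value_pattern


ROOT = (
    r"\{" + SPACE
    + member("Assistant", STRING) + "," + SPACE
    + member("Stage", STAGE) + "," + SPACE
    + member("Statement", STRING) + "," + SPACE
    + member("Task Finished", BOOLEAN)
    + r"\}" + SPACE
)


def matches_grammar(text: str) -> bool:
    """True if the whole text fits the approximated root rule."""
    return re.fullmatch(ROOT, text) is not None


# The grammar lists the keys alphabetically, so sort_keys=True reproduces its order.
sample = json.dumps(
    {
        "Stage": "first",
        "Task Finished": True,
        "Statement": "reset password",
        "Assistant": "Did you mean resetting your password?",
    },
    sort_keys=True,
)

print(matches_grammar(sample))                               # accepted
print(matches_grammar(sample.replace('"first"', '"third"')))  # rejected: not in the enum
print(json.loads(sample)["Task Finished"])                    # parses as ordinary JSON
```

Anything that fits the root rule is also valid JSON, so output generated under this grammar can be passed straight to `json.loads`.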

Here is my code

from llama_cpp import Llama, LlamaGrammar

fs_template = """
You are a precise AI comparer. Your task is to match the user's intent to the statements in the context and confirm if the identified intent is correct.
Your responses should strictly follow the format below:
    Stage: [print 'first']
    User Intent: [insert user intent statement here]
    Task Finished: [insert boolean value based on whether user intent is confirmed]
    Assistant: [insert Assistant response here]


Adhere to the following instructions to complete the task:
1. Start by trying to match the user's question to the statements in the context.
2. If you identify the matching statement to the user's question then confirm it from the user.
3. If the user's intent is unclear or doesn't match the context, ask follow-up questions by providing the options in the context.
4. Once you have confirmed the user intent, set "Task Finished: True" and proceed with your response.
5. You will fail your task if the output generated does not follow the format mentioned above.

Context: (only knowledge base you have)
------------
sample context
-----------
"""

schema = '''
space ::= " "?
string ::=  "\"" (
        [^"\\] |
        "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
      )* "\"" space 
Stage ::= "\"first\"" | "\"second\""
boolean ::= ("true" | "false") space
root ::= "{" space "\"Assistant\"" space ":" space string "," space "\"Stage\"" space ":" space Stage "," space "\"Statement\"" space ":" space string "," space "\"Task Finished\"" space ":" space boolean "}" space
'''


def get_prompt(question: str, chat_history: list,
               system_prompt: str) -> str:
    texts = [f'[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n']
    for user_input, response in chat_history:
        texts.append(f'{user_input.strip()} [/INST] {response.strip()} </s><s> [INST] ')
    texts.append(f'{question.strip()} [/INST]')
    return ''.join(texts)


history = []
prompt = get_prompt("user query", history, fs_template)

grammar = LlamaGrammar.from_string(grammar=schema, verbose=True)
print(grammar)
client = Llama(
    model_path="model/llama-2-13b-chat.ggmlv3.q8_0.bin",
    n_ctx=4098,
    n_threads=16,
    last_n_tokens_size=70,
)

answer = client(
    prompt,
    grammar=grammar,
    stream=False,
    temperature=0.0,
    top_p=0.95,
    top_k=50,
    repeat_penalty=1.3,
    max_tokens=4000,
)
print(answer)

This is the error I am getting:

parse: error parsing grammar: expecting newline or end at \] |
        "\" (["\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
      )* """ space 
Stage ::= ""first"" | ""second""
boolean ::= ("true" | "false") space
root ::= "{" space ""Assistant"" space ":" space string "," space ""Stage"" space ":" space Stage "," space ""Statement"" space ":" space string "," space ""Task Finished"" space ":" space boolean "}" space

Traceback (most recent call last):
  File "/home/talha/CloudWhisper/jformer.py", line 49, in <module>
    grammar = LlamaGrammar.from_string(grammar=schema,verbose=True)
  File "/home/talha/.local/lib/python3.10/site-packages/llama_cpp/llama_grammar.py", line 66, in from_string
    raise ValueError(
ValueError: from_string: error parsing grammar file: parsed_grammar.rules is empty
@gjmulder gjmulder added the bug Something isn't working label Aug 16, 2023
@c0sogi
Contributor

c0sogi commented Aug 17, 2023

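Oh, I see. That's not a bug. Just put r in front of the opening ''' so the schema becomes a raw string.
Otherwise Python processes the backslash escapes itself (\" becomes " and \\ becomes \) before the grammar parser sees the text, which corrupts your schema.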

So, the correct version of your schema should be:

schema = r'''
space ::= " "?
string ::=  "\"" (
        [^"\\] |
        "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
      )* "\"" space 
Stage ::= "\"first\"" | "\"second\""
boolean ::= ("true" | "false") space
root ::= "{" space "\"Assistant\"" space ":" space string "," space "\"Stage\"" space ":" space Stage "," space "\"Statement\"" space ":" space string "," space "\"Task Finished\"" space ":" space boolean "}" space
'''
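To see the difference without loading a model, one rule from the schema can be written both ways and printed. This is a minimal sketch in plain Python; the variable names `plain` and `raw` are just for illustration.

```python
# One rule from the schema, first as a normal string, then as a raw string.
plain = '''Stage ::= "\"first\"" | "\"second\""'''
raw = r'''Stage ::= "\"first\"" | "\"second\""'''

# In the normal string Python has already consumed the backslashes,
# so the grammar parser would see doubled quotes instead of escaped ones.
print(plain)  # Stage ::= ""first"" | ""second""
print(raw)    # Stage ::= "\"first\"" | "\"second\""
```

The first line of output matches the mangled rule shown in the error message above, which is why the parser ends up with no usable rules.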

@imaurer

imaurer commented Aug 17, 2023

An issue that doesn't seem to be impacting the functionality, but my grammar doesn't get redisplayed in the output logs:

from_string grammar:
print_grammar: error printing grammar: malformed rule, does not end with LLAMA_GRETYPE_END: 0

Otherwise, this feature works great. Any idea when it will be released so I can pip install via pypi?

@c0sogi
Contributor

c0sogi commented Aug 17, 2023

The printing error should be resolved by #621.

@talhalatifkhan
Author

Thank you @c0sogi, it's working fine now.

antoine-lizee pushed a commit to antoine-lizee/llama-cpp-python that referenced this issue Oct 30, 2023
...there was no check.  ported upstream from zanussbaum/gpt4all.cpp#2 (I dont see any clean path for upstream patches)