Optimize outline for python #10

Open
TechupBusiness opened this issue Mar 5, 2025 · 5 comments

Comments

@TechupBusiness

I used the outline method to generate information about Python code.

main.py
॥๛॥
⋮...
█def init_settings():
⋮...
█def init_encryption() -> Fernet:
⋮...
█async def init_database(data_dir: str) -> None:
⋮...
█async def main_async():
⋮...
█def main():
⋮...

It looks like some method signatures are cut off, for example test_init_encryption_new_password:

॥๛॥
tests/unit/test_main.py
॥๛॥
⋮...
█class TestMain:
⋮...
█    def test_init_settings_when_file_not_exists(self, mock_copy, mock_path_exists):
⋮...
█    def test_init_settings_file_exists(self, mock_copy, mock_path_exists):
⋮...
█    def test_init_encryption_new_password(self, mock_get_fernet, mock_hash_password, 
⋮...
█    def test_init_encryption_existing_password_correct(self, mock_get_fernet, 
⋮...
█    def test_init_encryption_existing_password_incorrect(self, mock_exit, 
⋮...

I would suggest a simpler syntax:

# src/utils.py
def add(a: int, b: int) -> int:
class Helper:
  value: int
  def __init__(self, value1: int) -> None:
  def increment(self) -> None:
  async def get_info(self) -> Dict[str, Any]:

# src/main.py
def main() -> None:

Ideally it could also read Python docstrings (string literals placed at the top of a module, class, or function to document it). This can give the LLM important context; for example, the structure of a returned object could be described there. According to a quick query to Grok, tree-sitter should be able to extract them. Example output:

# src/utils.py
def add(a: int, b: int) -> int:
class Helper: I'm the class comment in one line, even if I have multiple lines.
  value: int
  def __init__(self, value1: int) -> None: I'm describing the method
  def increment(self) -> None: Another description for the method
  async def get_info(self) -> Dict[str, Any]:

# src/main.py
def main() -> None:
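
To make the proposed format concrete, here is a minimal sketch that produces a similar outline with Python's standard `ast` module rather than tree-sitter. It is only an illustration of the idea, not how the project builds its outlines; the names `outline`, `signature`, `first_doc_line`, and `SAMPLE` are hypothetical, and it assumes Python 3.9+ (for `ast.unparse`). It keeps only the first line of each docstring and ignores class keywords such as `metaclass=`.

```python
import ast

def first_doc_line(node) -> str:
    """Return the first line of a node's docstring, or '' if it has none."""
    doc = (ast.get_docstring(node) or "").strip()
    return doc.splitlines()[0] if doc else ""

def signature(node) -> str:
    """Rebuild a one-line 'def name(args) -> ret:' header from the AST."""
    prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
    return f"{prefix} {node.name}({ast.unparse(node.args)}){returns}:"

def outline(source: str, indent: str = "  ") -> list[str]:
    """List functions, classes, and annotated class attributes, one per line."""
    lines: list[str] = []

    def visit(body, depth: int) -> None:
        pad = indent * depth
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines.append(f"{pad}{signature(node)} {first_doc_line(node)}".rstrip())
            elif isinstance(node, ast.ClassDef):
                bases = ", ".join(ast.unparse(b) for b in node.bases)
                header = f"class {node.name}({bases}):" if bases else f"class {node.name}:"
                lines.append(f"{pad}{header} {first_doc_line(node)}".rstrip())
                visit(node.body, depth + 1)  # descend into methods and attributes
            elif depth > 0 and isinstance(node, ast.AnnAssign):
                lines.append(f"{pad}{ast.unparse(node.target)}: {ast.unparse(node.annotation)}")

    visit(ast.parse(source).body, 0)
    return lines

# Hypothetical input modelled on the src/utils.py example above.
SAMPLE = '''
def add(a: int, b: int) -> int:
    return a + b

class Helper:
    """I'm the class comment.

    More detail here.
    """
    value: int

    def __init__(self, value1: int) -> None:
        """I'm describing the method"""
        self.value = value1

    async def get_info(self) -> Dict[str, Any]:
        return {}
'''

print("\n".join(outline(SAMPLE)))
```

Because the header is regenerated from the syntax tree, the lines are no longer verbatim copies of the file, which is the trade-off between this format and the current outline.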
@TechupBusiness TechupBusiness changed the title Optimize outline Optimize outline for python Mar 5, 2025
@TechupBusiness
Author

TechupBusiness commented Mar 5, 2025

Another example:

class NotificationORM(Model):
    """
    ORM model representing a notification.
    """
    id = fields.CharField(pk=True, max_length=50) # id: int - notification id
    title = fields.CharField(max_length=255) # title: str
    message = fields.TextField() # string
    type = fields.CharField(max_length=50)
    reference_type = fields.CharField(max_length=50)
    reference_id = fields.CharField(max_length=50)
    timestamp = fields.DatetimeField()

    class Meta:
        table = "notification"

At the moment the output is only:

॥๛॥
/path/notification_schema.py
॥๛॥
⋮...
█class NotificationORM(Model):
⋮...
█    class Meta:
⋮...

This does not give the LLM the necessary context. The outline also needs the attributes of such classes.

Ideally:

# /path/notification_schema.py
class NotificationORM(Model): # ORM model representing a notification.
  id # id: int - notification id
  title # title: str
  message # string
  type
  reference_type
  reference_id
  timestamp
  class Meta:
    table

For languages like Python, with weaker type safety and extensive built-in docstring support, instructions on how to write code that can be interpreted properly would be useful.
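
As an illustration of the attribute listing requested above, the following sketch combines the standard `ast` and `tokenize` modules: `ast` finds class-level assignments, and `tokenize` recovers the trailing comments that `ast` discards. This is an assumption about one way it could be done in pure Python, not the project's tree-sitter-based approach; `comments_by_line`, `class_outline`, and `SAMPLE` are hypothetical names, and Python 3.9+ is assumed.

```python
import ast
import io
import tokenize

def comments_by_line(source: str) -> dict[int, str]:
    """Map 1-based line numbers to the comment text found on that line."""
    comments: dict[int, str] = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string
    return comments

def class_outline(source: str, indent: str = "  ") -> list[str]:
    """Outline classes with attribute names and any trailing comments."""
    comments = comments_by_line(source)
    lines: list[str] = []

    def visit(body, depth: int) -> None:
        pad = indent * depth
        for node in body:
            if isinstance(node, ast.ClassDef):
                bases = ", ".join(ast.unparse(b) for b in node.bases)
                header = f"class {node.name}({bases}):" if bases else f"class {node.name}:"
                doc = ast.get_docstring(node)
                if doc:
                    header += " # " + doc.splitlines()[0]  # first docstring line only
                lines.append(pad + header)
                visit(node.body, depth + 1)
            elif depth > 0 and isinstance(node, (ast.Assign, ast.AnnAssign)):
                targets = node.targets if isinstance(node, ast.Assign) else [node.target]
                names = ", ".join(ast.unparse(t) for t in targets)
                # A trailing comment sits on the last line of the assignment.
                comment = comments.get(node.end_lineno, "")
                lines.append(f"{pad}{names} {comment}".rstrip())

    visit(ast.parse(source).body, 0)
    return lines

# Shortened, hypothetical version of the NotificationORM example.
SAMPLE = '''
class NotificationORM(Model):
    """
    ORM model representing a notification.
    """
    id = fields.CharField(pk=True, max_length=50) # id: int - notification id
    title = fields.CharField(max_length=255) # title: str
    timestamp = fields.DatetimeField()

    class Meta:
        table = "notification"
'''

print("\n".join(class_outline(SAMPLE)))
```

The sketch only parses the source and never imports it, so names like `Model` and `fields` do not need to be resolvable.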

@restlessronin
Member

@TechupBusiness

  1. The truncation occurs because only the first line of each declaration is used in the outline. I think it should be relatively easy to make the entire declaration show up.

  2. Good point about the docstrings. I've personally stopped adding docstrings in favor of really clear function names and arguments. The LLM can usually figure out a lot from just the declaration, and docstrings eat up a lot of context. I think this should be doable, but the tag files would likely have to be changed. I recall looking into it a year ago, but at that time tree-sitter was going through some API instability and I decided to wait until that was sorted out. I think I would prefer to wait and see if having the full declaration is sufficient.

  3. I'm a little reluctant to change the outline to the simpler syntax you're suggesting. Currently the outline is just verbatim lines from the file with ... to indicate missing lines. This maps directly to the file contents whereas the simplified syntax is perhaps more ambiguous.

  4. About including fields for classes, let me take a look at the tags files.

I'm also reluctant to modify the base tags files. They mostly come from the language parser repos and are written by tree-sitter experts who are using it actively. I'd rather not second-guess their implementations. I did start work on a way for the LLM to ask for a specific class/function definition if it wants to see it (instead of the entire file), and that involves creating new query files which require some work to figure out. I'm not sure how easy or hard that is going to be to maintain when the parsers change. Do you think that's a useful feature?

I'll take a crack at 1 and let you know how that's working out.

@restlessronin
Member

restlessronin commented Mar 9, 2025

@TechupBusiness I couldn't find a clean way to get the full Python declaration into the outline. I'll try something hackier next that uses (possibly non-tree-sitter) language-specific code.
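
One possible shape for such language-specific code, offered only as a sketch and not as the approach the maintainer has in mind: given the line where a `def` or `class` starts, use the standard `tokenize` module to scan forward until the header's closing colon at bracket depth zero, and return those lines verbatim. The names `full_declaration` and `SAMPLE` are hypothetical.

```python
import io
import tokenize

def full_declaration(source: str, start_line: int) -> str:
    """Return the verbatim header lines of the def/class at 1-based start_line."""
    lines = source.splitlines(keepends=True)
    remainder = "".join(lines[start_line - 1:])
    depth = 0  # nesting level of (), [] and {}
    for tok in tokenize.generate_tokens(io.StringIO(remainder).readline):
        if tok.type != tokenize.OP:
            continue
        if tok.string in ("(", "[", "{"):
            depth += 1
        elif tok.string in (")", "]", "}"):
            depth -= 1
        elif tok.string == ":" and depth == 0:
            # First colon outside brackets ends the header; annotation
            # colons are skipped because they sit inside the parentheses.
            end = start_line - 1 + tok.end[0]
            return "".join(lines[start_line - 1:end]).rstrip("\n")
    raise ValueError(f"no declaration header found at line {start_line}")

# Hypothetical file with a signature wrapped over two lines.
SAMPLE = (
    "class Loader:\n"
    "    def load_config(self, path: str,\n"
    "                    strict: bool = False) -> dict:\n"
    "        return {}\n"
)

print(full_declaration(SAMPLE, 2))
```

Since the lines are copied unchanged, the result still maps directly to the file contents, as the current outline does.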

I just wanted to let you know that I have added an MCP implementation fetcher to work with the MCP outliner. The LLM can now fetch the full definition of any function etc. from the outline by asking for it by file and function name. No need to read the whole file into context. Since you're using the outline feature, just wanted to get your feedback on it.

Works for all supported languages except C/C++ (their tags files don't support the right structure).

This endpoint was added in v0.2.14, published today.
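
To illustrate the idea of fetching a single definition by name, here is a rough Python-only sketch using the standard `ast` module. It is not the project's implementation, which per the earlier comment builds on tree-sitter query files; the function `fetch_definition`, the dotted-name convention, and `SAMPLE` are all assumptions made for this example.

```python
import ast
from typing import Optional

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

def fetch_definition(source: str, qualified_name: str) -> Optional[str]:
    """Return the source of e.g. 'Helper.increment', or None if it is absent."""
    body = ast.parse(source).body
    node = None
    for part in qualified_name.split("."):
        # Look for a def/class with this name among the current scope's statements.
        node = next(
            (n for n in body if isinstance(n, DEF_NODES) and n.name == part),
            None,
        )
        if node is None:
            return None
        body = node.body  # descend into the class or function for the next part
    # padded=True keeps the original indentation of the first line.
    return ast.get_source_segment(source, node, padded=True)

# Hypothetical file contents.
SAMPLE = (
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "class Helper:\n"
    "    def increment(self):\n"
    "        self.value += 1\n"
)

print(fetch_definition(SAMPLE, "Helper.increment"))
```

Only the requested definition is returned, so the rest of the file never has to enter the LLM's context.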

@TechupBusiness
Author

Sorry, I haven't been able to test it properly yet. I haven't forgotten about it.

@restlessronin
Member

@TechupBusiness no worries. I haven't forgotten this issue. I'll put some time into this project over the next few days. Hope to get to this issue as well (I have some ideas on how to do it in a clean way).
