Debugger: New Feature: source-level debugging #13444
To somewhat belatedly provide feedback on this, I have a really bad feeling about this new file format. It's a custom binary format which the MAME debugger itself can't create, so if support from other tools never flourishes, the whole thing stands to rot. The coupling to MAME's CPU state interface will also easily get worse considering the number of CPU types MAME supports. Speaking from the standpoint of someone interested in reverse engineering, I would much prefer a text file format for enhanced debugging information. The increased parsing overhead for something like JSON should matter less in this context than the ability to easily create and edit files without specialized tools. |
First off, source-level debugging is a top ask from Apple II users who run MAME or Ample, so I love that someone's done something with it. By way of some initial feedback, I'm not a fan of having separate source and assembly step and run commands. The majority of debuggers automatically show source if it's available for the current program counter and assembly if not. |
Regarding the file format, I do think it's important that MAME be able to just directly read whatever the assembler or compiler outputs for a given system, in much the same way that we prefer certain disk image formats but accept a wide variety of them. |
The trouble with this analogy is that MAME does not literally "read whatever the assembler or compiler outputs for a given system" in most cases. The primary purpose of object files is for their sections to be linked together and loaded into memory. For virtually any object file format sophisticated enough to even allow for extra debugging information, that is going to be performed not by MAME itself but by the emulated operating system or monitor program. I don't think storing debugging information in binary object files is the best solution here, but if that's to be supported, I think it would be better to use an existing format like ELF that provides robust debugging support for a wide range of architectures. |
Yes, I hear you on the concern that tools-writers must choose to support the file format for the feature not to rot away. However we slice it, there are platforms with popular tools that do not generate sufficient information for source-level debugging (in any form), and they would need to change one way or another for this feature to work. My plan is, should this feature be accepted into MAME, focus first on the CoCo / Dragon tools I've already been prototyping for: lwtools assembler, ugBasic compiler, and CMOC (C-like) compiler. I would work on PRs for them which turn my prototype code into real code they'd be willing to accept. (the ugBasic developer already expressed interest in supporting this feature back in October). Once CoCo / Dragon have decent support sufficiently visible, I would hope that interest would start growing to other platforms, and I'd be happy to help out other interested platforms with integrating the support. Point being that I'm expecting to continue to actively build up this feature from the tools end, as that is crucial.
Are you referring to the register #defines? They can be de-coupled via a lookup table if we agree that's the right way to go. Sorry if I'm misinterpreting.
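For illustration, a lookup-table decoupling could look roughly like the sketch below. Everything in it is hypothetical: the names `register_map`, `m6809_registers`, and `resolve_register` are not from this PR, and the index values are placeholders rather than MAME's actual CPU state indices. The idea is only that the file would store register names, and each CPU would supply a table translating them.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <unordered_map>

// Hypothetical: per-CPU table mapping register names stored in the debug
// file to whatever numeric state index the emulator core uses internally.
using register_map = std::unordered_map<std::string_view, int>;

// Placeholder values only; real 6809 state indices may differ.
inline const register_map &m6809_registers()
{
    static const register_map table{
        {"PC", 0}, {"S", 1}, {"U", 2}, {"X", 3}, {"Y", 4}, {"D", 5},
    };
    return table;
}

// Resolve a register name read from the file. Unknown names yield an empty
// optional so the caller can report them instead of guessing.
inline std::optional<int> resolve_register(const register_map &map, std::string_view name)
{
    const auto it = map.find(name);
    if (it == map.end())
        return std::nullopt;
    return it->second;
}
```

With a table like this, the file format would carry no CPU-specific constants, and supporting another CPU type would mean adding another table rather than changing the format.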
I have considered using text instead of binary for the format, and I generally tend to lean toward binary for a few reasons. Binary is unambiguous to specify and easier to parse. Text tends to be verbose, and with json in particular, there's the repeating field names, which can add up with a tool like CMOC when debugging information is generated for the entire standard library. I also worry that a text format will give a false impression to tools vendors that it would be easier to hand-roll it themselves instead of using a library we provide which is guaranteed to do it right. There could be subtle things like which fields or sections are optional, or how to manage cross-references between sections, which could cause frustrating surprises down the road for the tool writer. A binary format is a signal to go down the (right) path of using a generator library instead. I would be curious to understand more your scenario of doing reverse engineering, and how that would benefit from a text format. My first thought would be to provide a simple tool that translates between a text and the binary format to give a human the ability to manually manipulate the file. (Though I wouldn't recommend build tools taking advantage of this.) |
@rb6502 :
Having separate source vs. disassembly stepping commands allows for scenarios where you're doing source-level debugging but temporarily want to step through individual assembly instructions (e.g., to understand what the compiler just did, or to diagnose a potential issue with the compiler). I've seen modern debuggers handle this by choosing the stepping type based on which window is active (thus, the MAME keyboard & menu commands adapt in the same way, based on what the main window is showing). For console or scripting use, we'd need an unambiguous way to specify what type of stepping is requested. |
Yeah, as @ajrhacker mentioned, some tools simply don't provide sufficient information in any form to enable source-level debugging. The idea of adopting an existing format, such as ELF object files (and specifically the DWARF debugging format embedded inside them) is something which I've also considered and am not recommending. A quick summary why:
Overall, one of my goals is to make it as easy as possible for tools to participate. DWARF introduces significant obstacles, without much upside. |
Curious if anyone has started reviewing the code, or if there's anything I can do to make a review more manageable? |
It's been 4 weeks, but it doesn't look like a review has started. I'm willing to put in more work to move this forward, both on this PR and on the tools side. Is it possible to start a conversation with a reviewer? Or is there something about the idea of source-level debugging that is fundamentally incompatible with MAME's vision? |
There appear to be conflicts between your branch and the main branch that need to be fixed. Once those are sorted out, you can have the dev team review it. |
The conflicts you saw arose only after a commit on March 31, almost 4 weeks after I opened this PR (the PR was opened clean). I just fixed those conflicts, but new conflicts will likely crop up as long as the PR is open. As I mentioned, I'm willing to put in a lot of additional work on this, but it would help if I knew whether the idea of source-level debugging is something the owners would even consider. |
As I said above, I'm personally enthusiastic about having source-level debugging in MAME. Standalone computer and even console emulators are increasingly using those kinds of capabilities as differentiators and we're fairly well positioned to have it across a lot of platforms. Speaking for myself, I would have preferred MAMEdev be in the conversation much earlier instead of having a big pile of code dropped on us at once. I dislike doing full-stack code reviews at work too, it's nothing personal :-) I do think this might be a good candidate for something I've been thinking about recently: a staging repo similar to WINE where we can put things that we're not comfortable dropping directly into master but would like to see get there and hopefully get more eyes and testing on first. @galibert and @cuavas thoughts? |
Thanks for the message, and yeah, I realize I'm kind of foisting a whole bunch of work (unannounced!) on the owners. 😵 I figured a complete, working PR might be a more compelling prospect than trying to start a conversation without much to show for it. But of course this has its own disadvantages! If there's anything I can do to lighten the load, like an audio call (Discord?) or a set of testable scenarios you can run, please let me know. I'm also open to a staged release. |
I agree that it would be better to support ELF/DWARF. Even if you are using tools that don't support ELF output, writing a converter to output ELF would be a better use of your time than a custom binary format only used by MAME.
…On 10/03/2025 17:35, ajrhacker wrote:
I don't think storing debugging information in binary object files is
the best solution here, but if that's to be supported, I think it
would be better to use an existing format like ELF that provides
robust debugging support for a wide range of architectures.
|
In case you missed them, I listed some of my concerns about using DWARF above. I'm curious what your thoughts are about those? I'm open to doing whatever we think makes the most sense for MAME as a whole. When we chatted about source-level debugging on Discord last year, a couple of scenarios came up, including Z80 C compilers (SDCC’s z80 target) and reverse engineering (Ghidra). SDCC’s DWARF support looks too limited to be applicable here, but Ghidra seems to have some kind of support through a plug-in. Is that what you have in mind, or are there other scenarios motivating an interest in DWARF? I'd like to read further on them. |
SUMMARY:
This feature is targeted at MAME debugger users who have access to original source code that is assembled or compiled for emulated machines (for example, developers of new games to run on emulated machines). Add the ability to view, set breakpoints in, and step through the original source code instead of just the disassembly. Add symbols from the original source to MAME’s symbol tables for expression evaluation. Mostly useful for earlier 8-bit machines, tested with Tandy CoCo 2 and 3.
An early video I sent around the MAME Discord demonstrates what this looks like: https://youtu.be/2tu4t2bBjzo
COMPONENTS:
IMPLEMENTATION DETAILS:
GUI: new menu items to toggle between showing the source and showing the disassembly inside the main console debugger window. Free-floating disassembly windows remain unchanged. When source is shown, pre-existing keyboard shortcuts for stepping or setting breakpoints automatically invoke the corresponding source-level commands. When disassembly is shown in the main console debugger window, those shortcuts revert back to the old disassembly stepping commands.
Source level stepping: implementation reuses the corresponding disassembly stepping command, but with “slipping” at the end to ensure the stepping ends at a reasonable location in the source.
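One possible reading of "slipping" is sketched below. This is an assumption for illustration, not the PR's code: after the ordinary instruction step, the debugger keeps stepping until the program counter sits on an address that begins a source line. The names `line_table`, `line_starting_at`, and `source_step`, and the `max_steps` safety bound, are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>

// Hypothetical: maps the first address of each source line's code to the
// line number. Addresses in the middle of a line's code are absent.
using line_table = std::map<std::uint32_t, int>;

inline std::optional<int> line_starting_at(const line_table &lines, std::uint32_t pc)
{
    const auto it = lines.find(pc);
    if (it == lines.end())
        return std::nullopt;
    return it->second;
}

// Execute one instruction step, then "slip" forward one instruction at a
// time until pc lands on the start of a source line. max_steps bounds the
// slip so code without line information cannot loop forever.
inline std::uint32_t source_step(
        const line_table &lines,
        std::uint32_t pc,
        const std::function<std::uint32_t (std::uint32_t)> &step_instruction,
        int max_steps = 1000)
{
    assert(max_steps >= 0);
    pc = step_instruction(pc);
    for (int i = 0; i < max_steps && !line_starting_at(lines, pc); ++i)
        pc = step_instruction(pc);
    return pc;
}
```

Here `step_instruction` stands in for the existing disassembly stepping command; the slip loop is the only source-specific part.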
Symbol tables: add 2 new symbol tables, one for local variables from source, and one for global variables from source, chained in front of the pre-existing CPU and global symbol table. Support case sensitive symbol lookup, falling back to case insensitive symbol lookup as necessary. Source-level symbols, when present, eclipse any conflicting pre-existing symbols, but syntax is provided (“ns\”) to allow users to force references to pre-existing symbols.
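The chained lookup described above might be sketched as follows. This is a simplified illustration rather than the PR's implementation: `symbol_table` and `lookup` are hypothetical names, the sketch assumes the case-sensitive pass covers the whole chain before any case-insensitive match is tried, and the “ns\” escape is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a debugger symbol table: name -> value.
using symbol_table = std::unordered_map<std::string, std::uint64_t>;

inline bool equal_ignoring_case(const std::string &a, const std::string &b)
{
    return a.size() == b.size() &&
        std::equal(a.begin(), a.end(), b.begin(),
            [](unsigned char x, unsigned char y) { return std::tolower(x) == std::tolower(y); });
}

// chain is ordered front to back, e.g. source locals, source globals,
// then the pre-existing CPU and global tables. Earlier tables eclipse
// later ones.
inline std::optional<std::uint64_t> lookup(
        const std::vector<const symbol_table *> &chain,
        const std::string &name)
{
    // Pass 1: exact (case-sensitive) match anywhere in the chain.
    for (const symbol_table *table : chain)
    {
        assert(table != nullptr);
        if (const auto it = table->find(name); it != table->end())
            return it->second;
    }

    // Pass 2 (assumed order): case-insensitive fallback, front to back.
    for (const symbol_table *table : chain)
        for (const auto &[key, value] : *table)
            if (equal_ignoring_case(key, name))
                return value;

    return std::nullopt;
}
```

Under this ordering, a source-level `count` would win over a CPU-level `count`, while a query for `PC` would still reach a CPU symbol stored as `pc` through the fallback pass.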
File format: source level debugging information is stored in .mdi files. These act as containers, which can theoretically house different underlying formats, though only one format (“simple”) is supported so far in this pull request. I am familiar with 6809 and the TRS-80 CoCo, and believe this simple format is sufficient for that machine. I expect it will be sufficient for other similar machines and processors. I also allow for the possibility that experts in other machines might know of other (possibly incompatible) needs. Thus, .mdi files can be used to store other formats that better support fundamentally different machines. Inside the debugger implementation is an internal interface that can be used to read this simple format, and can be extended to read new formats that might be invented as necessary. The goal is to keep as small a quantity of debugger code as possible format-specific, with the remainder of the debugging code simply querying the interface without any knowledge of the underlying format.
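As a rough picture of that separation, the sketch below shows format-agnostic debugger code querying an abstract interface, with one format-specific class behind it. The names (`debug_info_provider`, `simple_format_provider`, `address_to_line`, `describe`) are invented for this example and are not the identifiers used in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

struct source_location
{
    std::string file;
    int line = 0;
};

// Hypothetical format-agnostic interface: the rest of the debugger would
// only ever talk to this, never to a concrete format.
class debug_info_provider
{
public:
    virtual ~debug_info_provider() = default;
    virtual std::optional<source_location> address_to_line(std::uint32_t address) const = 0;
};

// Hypothetical reader for one underlying format. Parsing is omitted;
// add_line stands in for whatever the real loader would populate.
class simple_format_provider : public debug_info_provider
{
public:
    void add_line(std::uint32_t address, std::string file, int line)
    {
        m_lines[address] = source_location{std::move(file), line};
    }

    std::optional<source_location> address_to_line(std::uint32_t address) const override
    {
        const auto it = m_lines.find(address);
        if (it == m_lines.end())
            return std::nullopt;
        return it->second;
    }

private:
    std::map<std::uint32_t, source_location> m_lines;
};

// Format-agnostic debugger code: knows only the interface.
inline std::string describe(const debug_info_provider &info, std::uint32_t pc)
{
    if (const auto loc = info.address_to_line(pc))
        return loc->file + ":" + std::to_string(loc->line);
    return "<no source>";
}
```

A new container format would then add another subclass without touching callers such as `describe`.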
File format library (mame_srcdbg_[static/shared]): this library is intended to be consumed by cross assemblers and cross compilers that target emulated machines. I have tested it so far with a 6809 assembler (lwasm / lwlink), a 6809 C compiler (CMOC), and a multi-platform basic compiler (ugBasic, though only the 6809-targeting compiler so far). The library is a C++ library with a pure C interface, so it can be consumed by tools written in either C or C++. Both a static and shared version of the library are built, so in theory even non C/C++ tools could dynamically load and call into the shared library, assuming the language supports that. MAME itself and the srcdbgdump tool link to the static version of this library for reading the format.
TODOs: I intentionally left a couple TODOs in the code for discussion with the reviewers.