Debugger: New Feature: source-level debugging #13444

dave-br · 2025-03-05T00:01:05Z

SUMMARY:

This feature is targeted at MAME debugger users who have access to original source code that is assembled or compiled for emulated machines (for example, developers of new games to run on emulated machines). Add the ability to view, set breakpoints in, and step through the original source code instead of just the disassembly. Add symbols from the original source to MAME’s symbol tables for expression evaluation. This is mostly useful for earlier 8-bit machines and has been tested with the Tandy CoCo 2 and 3.

An early video I sent around the MAME Discord demonstrates what this looks like: https://youtu.be/2tu4t2bBjzo

COMPONENTS:

Create specification for a simple MAME Debugging Information File format with source mapping and symbol information.
Provide a library (mame_srcdbg_static.a, mame_srcdbg_shared.so/.dll) for cross assemblers and cross compilers that target emulated machines to easily generate MAME Debugging Information Files.
Add command-line options for loading MAME Debugging Information Files
Add command-line tool (srcdbgdump) for dumping MAME Debugging Information Files
Add debugger console commands for source-level stepping
Add source-level global symbols, local “fixed” symbols (which are scoped but have “fixed”, constant values), and local “relative” symbols (which are scoped and have values determined by an offset to a register, e.g., stack-local symbols) to the debugger symbol tables.
Add source-file + line number to the expression evaluator (e.g., for setting breakpoints)
Support runtime address offsetting for operating systems that relocate loaded code
Add GUI to Windows, Mac, and Qt debuggers to support source-level debugging. Thanks to @tlindner for writing the Mac implementation.
Add sphinx documentation for entire feature (start reading at docs/source/debugger/general.rst)
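
To make the three symbol kinds and the relocation offset listed above more concrete, here is a minimal Python sketch. It is not the PR's C++ code: the class names, the `resolve` helper, the inclusive scope ranges, the 16-bit address mask, and the choice to apply the load offset only to the scope check are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass
class GlobalSymbol:
    value: int                  # visible everywhere, constant value


@dataclass
class LocalFixedSymbol:
    scope: Tuple[int, int]      # inclusive address range, as built (before relocation)
    value: int                  # constant value, but only visible inside the scope


@dataclass
class LocalRelativeSymbol:
    scope: Tuple[int, int]      # inclusive address range, as built (before relocation)
    register: str               # register the value is relative to, e.g. a stack pointer
    offset: int                 # signed offset added to the register


Symbol = Union[GlobalSymbol, LocalFixedSymbol, LocalRelativeSymbol]


def resolve(symbol: Symbol, pc: int, registers: Dict[str, int],
            load_offset: int = 0, address_mask: int = 0xFFFF) -> Optional[int]:
    """Return the symbol's value at the current pc, or None if it is out of scope."""
    if isinstance(symbol, GlobalSymbol):
        return symbol.value
    # Hypothetical relocation handling: map the runtime pc back to the
    # address the code had when it was built, then test the scope.
    built_pc = pc - load_offset
    low, high = symbol.scope
    if not (low <= built_pc <= high):
        return None
    if isinstance(symbol, LocalFixedSymbol):
        return symbol.value
    # Register-relative: wrap the sum to the (assumed) 16-bit address space.
    return (registers[symbol.register] + symbol.offset) & address_mask


# Example: a stack-local variable two bytes above the stack pointer "s".
registers = {"s": 0x7FF0}
stack_local = LocalRelativeSymbol(scope=(0x3000, 0x30FF), register="s", offset=2)
print(hex(resolve(stack_local, pc=0x3010, registers=registers)))  # prints 0x7ff2
```

Whether symbol values themselves (as opposed to scope ranges) should also be relocated is left out of this sketch.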

IMPLEMENTATION DETAILS:

GUI: new menu items to toggle between showing the source and showing the disassembly inside the main console debugger window. Free-floating disassembly windows remain unchanged. When source is shown, pre-existing keyboard shortcuts for stepping or setting breakpoints automatically invoke the corresponding source-level commands. When disassembly is shown in the main console debugger window, those shortcuts revert to the old disassembly stepping commands.

Source level stepping: implementation reuses the corresponding disassembly stepping command, but with “slipping” at the end to ensure the stepping ends at a reasonable location in the source.
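
One way to picture the "slipping" idea is the Python sketch below. It is only an interpretation, not the PR's implementation: it assumes a line table that records the first address of each source line, and it keeps single-stepping until the program counter lands on a recorded address that belongs to a different line. The names `source_step`, `line_starts`, and `max_slip` are hypothetical.

```python
from typing import Callable, Dict, Tuple

SourceLoc = Tuple[str, int]  # (source file, line number)


def source_step(pc: int,
                step_instruction: Callable[[int], int],
                line_starts: Dict[int, SourceLoc],
                max_slip: int = 1000) -> int:
    """Step one instruction, then slip forward to the start of another source line."""
    start_loc = line_starts.get(pc)
    pc = step_instruction(pc)            # the ordinary disassembly-level step
    for _ in range(max_slip):            # bound the slip so it always terminates
        loc = line_starts.get(pc)
        if loc is not None and loc != start_loc:
            break                        # landed on the first address of a new line
        pc = step_instruction(pc)        # still mid-line or unmapped: keep slipping
    return pc


# Toy program: line 10 spans 0x100-0x103, line 11 starts at 0x104, line 12 at 0x107.
NEXT_PC = {0x100: 0x102, 0x102: 0x104, 0x104: 0x105, 0x105: 0x107}
LINE_STARTS = {
    0x100: ("main.asm", 10),
    0x104: ("main.asm", 11),
    0x107: ("main.asm", 12),
}


def step_instruction(pc: int) -> int:
    # Stand-in for executing one emulated instruction and returning the new pc.
    return NEXT_PC.get(pc, pc + 1)


print(hex(source_step(0x100, step_instruction, LINE_STARTS)))  # prints 0x104
```

A real implementation would also need a policy for loops that stay on one source line and for code with no source mapping at all; the `max_slip` bound here is just a placeholder for that.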

Symbol tables: add 2 new symbol tables, one for local variables from source, and one for global variables from source, chained in front of the pre-existing CPU and global symbol table. Support case sensitive symbol lookup, falling back to case insensitive symbol lookup as necessary. Source-level symbols, when present, eclipse any conflicting pre-existing symbols, but syntax is provided (“ns\”) to allow users to force references to pre-existing symbols.
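
The lookup order described above can be sketched as follows in Python. This is an illustration under stated assumptions rather than the PR's code: it assumes an exact-case pass over the whole chain before a case-insensitive pass, and it assumes the `ns\` syntax is a name prefix that skips the source-level tables. The `lookup` function and the sample symbols are hypothetical.

```python
from typing import Dict, List, Optional

NS_PREFIX = "ns\\"  # assumed to be a prefix, as in ns\pc


def lookup(name: str,
           source_tables: List[Dict[str, int]],
           builtin_tables: List[Dict[str, int]]) -> Optional[int]:
    """Resolve a symbol through the chained tables, or return None."""
    if name.startswith(NS_PREFIX):
        # Forced reference to a pre-existing symbol: skip source-level tables.
        name = name[len(NS_PREFIX):]
        chain = builtin_tables
    else:
        # Source-level tables are chained in front of the pre-existing ones.
        chain = source_tables + builtin_tables
    # Pass 1: case-sensitive match, first table in the chain wins.
    for table in chain:
        if name in table:
            return table[name]
    # Pass 2: case-insensitive fallback, same chain order.
    folded = name.lower()
    for table in chain:
        for key, value in table.items():
            if key.lower() == folded:
                return value
    return None


SOURCE_TABLES = [
    {"count": 0x7F02},                 # locals from source
    {"Start": 0x3000, "pc": 0x1234},   # globals from source ("pc" conflicts on purpose)
]
BUILTIN_TABLES = [
    {"pc": 0x3010, "a": 0x42},         # pre-existing CPU symbols
    {"frame": 120},                    # pre-existing global symbols
]

print(hex(lookup("pc", SOURCE_TABLES, BUILTIN_TABLES)))       # prints 0x1234
print(hex(lookup("ns\\pc", SOURCE_TABLES, BUILTIN_TABLES)))   # prints 0x3010
```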

File format: source level debugging information is stored in .mdi files. These act as containers, which can theoretically house different underlying formats, though only one format (“simple”) is supported so far in this pull request. I am familiar with 6809 and the TRS-80 CoCo, and believe this simple format is sufficient for that machine. I expect it will be sufficient for other similar machines and processors. I also allow for the possibility that experts in other machines might know of other (possibly incompatible) needs. Thus, .mdi files can be used to store other formats that better support fundamentally different machines. Inside the debugger implementation is an internal interface that can be used to read this simple format, and can be extended to read new formats that might be invented as necessary. The goal is to keep as small a quantity of debugger code as possible format-specific, with the remainder of the debugging code simply querying the interface without any knowledge of the underlying format.
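
The separation between the container, the format-specific reader, and the format-neutral debugger code could look roughly like this Python sketch. None of these names come from the PR: `DebugInfoProvider`, `SimpleProvider`, `READERS`, and `open_debug_info` are hypothetical, and the payload is an already-decoded line table because the real binary layout of .mdi files is not modeled here.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Tuple

LineTable = Dict[int, Tuple[str, int]]  # address -> (source file, line number)


class DebugInfoProvider(ABC):
    """Format-neutral queries; debugger code would depend only on this."""

    @abstractmethod
    def address_to_line(self, address: int) -> Optional[Tuple[str, int]]:
        ...

    @abstractmethod
    def line_to_address(self, file: str, line: int) -> Optional[int]:
        ...


class SimpleProvider(DebugInfoProvider):
    """Stand-in reader for a 'simple' format holding a flat line table."""

    def __init__(self, line_table: LineTable):
        self._by_address = dict(line_table)
        self._by_line: Dict[Tuple[str, int], int] = {}
        # Walk addresses from high to low so the lowest address of each
        # line is stored last and therefore wins.
        for address in sorted(line_table, reverse=True):
            self._by_line[line_table[address]] = address

    def address_to_line(self, address: int) -> Optional[Tuple[str, int]]:
        return self._by_address.get(address)

    def line_to_address(self, file: str, line: int) -> Optional[int]:
        return self._by_line.get((file, line))


# Registry keyed by the format name the container would declare.
READERS: Dict[str, Callable[[LineTable], DebugInfoProvider]] = {
    "simple": SimpleProvider,
}


def open_debug_info(format_name: str, payload: LineTable) -> DebugInfoProvider:
    """Pick the reader for the declared format; reject unknown formats."""
    try:
        reader = READERS[format_name]
    except KeyError:
        raise ValueError(f"unsupported debug info format: {format_name}")
    return reader(payload)


info = open_debug_info("simple", {
    0x3000: ("hello.asm", 12),
    0x3002: ("hello.asm", 12),
    0x3005: ("hello.asm", 13),
})
print(hex(info.line_to_address("hello.asm", 12)))  # prints 0x3000
```

Supporting a new format in this arrangement would mean adding one reader class and one registry entry, while callers such as a file-and-line breakpoint command keep using `line_to_address` unchanged.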

File format library (mame_srcdbg_[static/shared]): this library is intended to be consumed by cross assemblers and cross compilers that target emulated machines. I have tested it so far with a 6809 assembler (lwasm / lwlink), a 6809 C compiler (CMOC), and a multi-platform BASIC compiler (ugBasic, though only the 6809-targeting compiler so far). The library is a C++ library with a pure C interface, so it can be consumed by tools written in either C or C++. Both a static and a shared version of the library are built, so in theory even non-C/C++ tools could dynamically load and call into the shared library, assuming the language supports that. MAME itself and the srcdbgdump tool link to the static version of this library for reading the format.

TODOs: I intentionally left a couple TODOs in the code for discussion with the reviewers.

ajrhacker · 2025-03-10T14:44:54Z

To somewhat belatedly provide feedback on this, I have a really bad feeling about this new file format. It's a custom binary format which the MAME debugger itself can't create, so if support from other tools never flourishes, the whole thing stands to rot. The coupling to MAME's CPU state interface will also easily get worse considering the number of CPU types MAME supports.

Speaking from the standpoint of someone interested in reverse engineering, I would much prefer a text file format for enhanced debugging information. The increased parsing overhead for something like JSON should matter less in this context than the ability to easily create and edit files without specialized tools.

rb6502 · 2025-03-10T14:52:16Z

First off, source-level debugging is a top ask from Apple II users who run MAME or Ample, so I love that someone's done something with it.

By way of some initial feedback, I'm not a fan of having separate source and assembly step and run commands. The majority of debuggers automatically show source if it's available for the current program counter and assembly if not.

rb6502 · 2025-03-10T14:59:15Z

Regarding the file format, I do think it's important that MAME be able to just directly read whatever the assembler or compiler outputs for a given system, in much the same way that we prefer certain disk image formats but accept a wide variety of them.

ajrhacker · 2025-03-10T17:34:40Z

> I do think it's important that MAME be able to just directly read whatever the assembler or compiler outputs for a given system, in much the same way that we prefer certain disk image formats but accept a wide variety of them.

The trouble with this analogy is that MAME does not literally "read whatever the assembler or compiler outputs for a given system" in most cases. The primary purpose of object files is for their sections to be linked together and loaded into memory. For virtually any object file format sophisticated enough to even allow for extra debugging information, that is going to be performed not by MAME itself but by the emulated operating system or monitor program.

I don't think storing debugging information in binary object files is the best solution here, but if that's to be supported, I think it would be better to use an existing format like ELF that provides robust debugging support for a wide range of architectures.

dave-br · 2025-03-10T17:49:47Z

@ajrhacker :

Yes, I hear you on the concern that tools-writers must choose to support the file format for the feature not to rot away. However we slice it, there are platforms with popular tools that do not generate sufficient information for source-level debugging (in any form), and they would need to change one way or another for this feature to work. My plan, should this feature be accepted into MAME, is to focus first on the CoCo / Dragon tools I've already been prototyping for: the lwtools assembler, the ugBasic compiler, and the CMOC (C-like) compiler. I would work on PRs for them which turn my prototype code into real code they'd be willing to accept. (The ugBasic developer already expressed interest in supporting this feature back in October.) Once CoCo / Dragon have decent, sufficiently visible support, I would hope that interest would start growing on other platforms, and I'd be happy to help other interested platforms integrate the support. The point is that I expect to continue actively building up this feature from the tools end, as that is crucial.

> The coupling to MAME's CPU state interface will also easily get worse considering the number of CPU types MAME supports.

Are you referring to the register #defines? They can be de-coupled via a lookup table if we agree that's the right way to go. Sorry if I'm misinterpreting.

> Speaking from the standpoint of someone interested in reverse engineering, I would much prefer a text file format for enhanced debugging information.

I have considered using text instead of binary for the format, and I generally tend to lean toward binary for a few reasons. Binary is unambiguous to specify and easier to parse. Text tends to be verbose, and with json in particular, there's the repeating field names, which can add up with a tool like CMOC when debugging information is generated for the entire standard library. I also worry that a text format will give a false impression to tools vendors that it would be easier to hand-roll it themselves instead of using a library we provide which is guaranteed to do it right. There could be subtle things like which fields or sections are optional, or how to manage cross-references between sections, which could cause frustrating surprises down the road for the tool writer. A binary format is a signal to go down the (right) path of using a generator library instead.

I would be curious to understand more about your reverse-engineering scenario, and how it would benefit from a text format. My first thought would be to provide a simple tool that translates between a text format and the binary format to give a human the ability to manually manipulate the file. (Though I wouldn't recommend build tools taking advantage of this.)

dave-br · 2025-03-10T17:56:25Z

@rb6502 :

> By way of some initial feedback, I'm not a fan of having separate source and assembly step and run commands. The majority of debuggers automatically show source if it's available for the current program counter and assembly if not.

Having separate source vs. disassembly stepping commands allows for scenarios where you're doing source-level debugging, but temporarily want to step through individual assembly instructions (e.g., to understand what the compiler just did, or to diagnose a potential issue with the compiler). I've seen modern debuggers handle this by choosing the stepping type based on which window is active (thus, the MAME keyboard & menu commands adapt in the same way, based on what the main window is showing). For console or scripting use, we'd need an unambiguous way to specify what type of stepping is requested.

dave-br · 2025-03-10T18:16:54Z

@rb6502

> Regarding the file format, I do think it's important that MAME be able to just directly read whatever the assembler or compiler outputs for a given system, in much the same way that we prefer certain disk image formats but accept a wide variety of them.

Yeah, as @ajrhacker mentioned, some tools simply don't provide sufficient information in any form to enable source-level debugging.

The idea of adopting an existing format, such as ELF object files (and specifically the DWARF debugging format embedded inside them), is something which I've also considered and am not recommending. A quick summary of why:

DWARF is staggeringly complex. The DWARF5 specification is 477 pages long, its reader library has over 300 API calls, and its writer library adds another 70 API calls plus callbacks. My PR's generator library has only 9 API calls.
"Supporting DWARF" is an ambiguous claim. DIEs have optional attributes, and there are multiple compression schemes. Some components of DWARF use a stack-based interpreted language to express values, and others use a finite state machine program (e.g., the line table encoding). MAME would need a way to clearly specify what portions of those are supported so that tools-writers know what to generate.
DWARF may not even be sufficient. Its specification defers to the ABI for some things, like register numbers, but there may be no ABIs for some vintage processors. I couldn't find anything formally specified for 6809 or 6502, for example, though there is what appears to be an unofficial fork of gcc for 6809.

Overall, one of my goals is to make it as easy as possible for tools to participate. DWARF introduces significant obstacles, without much upside.

dave-br · 2025-03-18T16:02:06Z

Curious if anyone has started reviewing the code, or if there's anything I can do to make a review more manageable?

dave-br · 2025-04-01T16:30:37Z

It's been 4 weeks, but it doesn't look like a review has started. I'm willing to put in more work to move this forward, both on this PR and on the tools side. Is it possible to start a conversation with a reviewer? Or is there something about the idea of source-level debugging that is fundamentally incompatible with MAME's vision?

JimCarlTay · 2025-04-02T02:50:42Z

There appears to be conflicts with your branch and the main branch that need to be fixed. Once those are sorted out, you can have the dev team review it.

dave-br · 2025-04-02T21:01:27Z

> There appears to be conflicts with your branch and the main branch that need to be fixed. Once those are sorted out, you can have the dev team review it.

The conflicts you saw had just arisen after a commit on March 31, almost 4 weeks after I opened this PR (the PR was opened clean). I just fixed those conflicts, but new conflicts will likely crop up as long as the PR is open. As I mentioned, I'm willing to put in a lot of additional work on this, but it would help if I knew whether the idea of source-code-level debugging is something the owners would even consider.

rb6502 · 2025-04-03T00:02:12Z

As I said above, I'm personally enthusiastic about having source-level debugging in MAME. Standalone computer and even console emulators are increasingly using those kinds of capabilities as differentiators and we're fairly well positioned to have it across a lot of platforms.

Speaking for myself, I would have preferred MAMEdev be in the conversation much earlier instead of having a big pile of code dropped on us at once. I dislike doing full-stack code reviews at work too, it's nothing personal :-)

I do think this might be a good candidate for something I've been thinking about recently: a staging repo similar to WINE where we can put things that we're not comfortable dropping directly into master but would like to see get there and hopefully get more eyes and testing on first. @galibert and @cuavas thoughts?

dave-br · 2025-04-03T20:49:50Z

Thanks for the message, and yeah, I realize I'm kind of foisting a whole bunch of work (unannounced!) on the owners. 😵 I figured a complete, working PR might be a more compelling prospect than trying to start a conversation without much to show for it. But of course this has its own disadvantages! If there's anything I can do to lighten the load, like an audio call (Discord?), or a set of testable scenarios you can run, etc., please let me know. I'm also open to a staged release, etc.

smf- · 2025-04-05T13:12:04Z

I agree that it would be better to support elf/dwarf. Even if you are using tools that don't support elf output, writing a converter to output elf would be a better use of your time than a custom binary format only used by MAME.

…

On 10/03/2025 17:35, ajrhacker wrote: I don't think storing debugging information in binary object files is the best solution here, but if that's to be supported, I think it would be better to use an existing format like ELF that provides robust debugging support for a wide range of architectures.

dave-br · 2025-04-06T21:34:18Z

In case you missed them, I listed some of my concerns about using DWARF above. I'm curious what your thoughts are about those. I'm open to doing whatever we think makes the most sense for MAME as a whole. When we chatted about source-level debugging on Discord last year, a couple of scenarios came up, including Z80 C compilers (SDCC’s z80 target) and reverse engineering (Ghidra). SDCC’s DWARF support looks too limited to be applicable here, but Ghidra seems to have some kind of support through a plug-in. Is that what you have in mind, or are there other scenarios that are motivating an interest in DWARF? I'd like to read further on them.

dave-br added 2 commits March 4, 2025 13:32

source-level debugging feature as single commit

a558184

ran srcclean

5105c03

dave-br added 2 commits April 2, 2025 10:40

Merge branch 'main' into pr-sd

fc1196c

Fix merge mistake (duplicate dvwpoints.h,cpp)

6d3ca61
