Skip to content

CharReader: Add StructuredError #1281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

KyleFromKitware
Copy link

@KyleFromKitware KyleFromKitware commented Apr 6, 2021

The move from Json::Reader to Json::CharReader omitted the ability to get structured error messages from the parser. Clients which want to extract the line and column information have to parse the returned string. Add a new getStructuredErrors() method which exposes the error as structured data instead of a string.

Copy link
Contributor

@BillyDonahue BillyDonahue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OurReader already has a getStructureErrors(), it's just not exposed through the Json::CharReader interface.

It's expected and true that parse modifies the reader, as it's a non-const member function.
So after a parse call the CharReader is holding metadata about the parse, including the structured errors.

I would propose a much smaller change here where we don't add a parse overload, but add a getStructuredErrors accessor.

I feel like adding parse overloads is tricky because we'll end up with a Cartesian product of all the ways to specify inputs times all the possible kinds of output a caller might want, all as virtual functions. Currently parse(...) is not overloaded at all. Thee's just one virtual function operating on a char array. You can omit the errs if you don't want them but they provide a simple way to get at the getFormattedErrorMessages of the underlying OurReader.

I think what's missing from the API here are API calls that tunnel through to the underlying OurReader more directly. We have only parse currently. I feel like we could have getStructuredErrors as well.

@KyleFromKitware
Copy link
Author

I would propose a much smaller change here where we don't add a parse overload, but add a getStructuredErrors accessor.

This sounds fine to me, and would still accomplish what I want.

I do worry about this breaking binary compatibility. Source compatibility would be fine, but adding a new virtual method changes the vtable. GCC and its copycats place virtual methods in the vtable in their declared order, which means we can probably get away with adding the new virtual method to the end of the class declaration. However, MSVC reverses their order in the vtable, which could potentially break compatibility. For this, I see two options:

  1. Use some #ifdefs to change the order of the virtual methods depending on which compiler is used, or
  2. Bump the major version

@KyleFromKitware KyleFromKitware changed the title CharReader: Add ErrorInfo and parse() overload CharReader: Add StructuredError Apr 9, 2021
@BillyDonahue
Copy link
Contributor

I would propose a much smaller change here where we don't add a parse overload, but add a getStructuredErrors accessor.

This sounds fine to me, and would still accomplish what I want.

I do worry about this breaking binary compatibility. Source compatibility would be fine, but adding a new virtual method changes the vtable. GCC and its copycats place virtual methods in the vtable in their declared order, which means we can probably get away with adding the new virtual method to the end of the class declaration. However, MSVC reverses their order in the vtable, which could potentially break compatibility. For this, I see two options:

  1. Use some #ifdefs to change the order of the virtual methods depending on which compiler is used, or
  2. Bump the major version

It sounds like you might be saying that this is a unique concern of getStructuredErrors but maybe you aren't. Just so we're clear, your PR adds a virtual member to the CharReader interface too.

I don't think parse and getStructuredErrors need to be virtual functions at all, since the derived class is provided by the same shared library that is providing the CharReader pointer. The CharReader can just give an ordinary nonvirtual public interface, and the bodies of those functions, which are inside libjsoncpp.so, can do a virtual dispatch and not worry about binary compatibility. It's too late for parse, but we could add getStructuredErrors with that paradigm and not worry about ABI breaks.

@KyleFromKitware
Copy link
Author

KyleFromKitware commented Apr 9, 2021

It sounds like you might be saying that this is a unique concern of getStructuredErrors but maybe you aren't.

I am not. It is not unique to getStructuredErrors and would apply any time we add a new virtual function.

Just so we're clear, your PR adds a virtual member to the CharReader interface too.

Yes.

and the bodies of those functions, which are inside libjsoncpp.so, can do a virtual dispatch and not worry about binary compatibility.

Could you explain what you mean by this? I'm guessing it's something like:

std::vector<CharReader::StructuredError> CharReader::getStructuredErrors() const
{
  auto const* impl = dynamic_cast<OurCharReader*>(this);
  return impl->reader_.getStructuredErrors();
}

with perhaps a middle layer for accessing the private reader_.

@KyleFromKitware
Copy link
Author

By the way, I am also working on a PR which adds the ability to obtain the ptrdiff_ts of the string keys of an object (we already have getOffsetStart() and getOffsetLimit() for the value itself if it's a number or string etc.). That would change the size of Json::Value, which would necessitate a major version bump. Perhaps we could use that as an opportunity to clean up some of this virtual mess, as well as move all of Json::Value's members to an internal impl member to avoid having to major-bump whenever we want to add new information to Json::Value.

Thoughts?

@BillyDonahue
Copy link
Contributor

BillyDonahue commented Apr 9, 2021

It sounds like you might be saying that this is a unique concern of getStructuredErrors but maybe you aren't.

I am not. It is not unique to getStructuredErrors and would apply any time we add a new virtual function.

Just so we're clear, your PR adds a virtual member to the CharReader interface too.

Yes.

and the bodies of those functions, which are inside libjsoncpp.so, can do a virtual dispatch and not worry about binary compatibility.

Could you explain what you mean by this? I'm guessing it's something like:

std::vector<CharReader::StructuredError> CharReader::getStructuredErrors() const
{
  auto const* impl = dynamic_cast<OurCharReader*>(this);
  return impl->reader_.getStructuredErrors();
}

with perhaps a middle layer for accessing the private reader_.

Close but I would not do dynamic_cast of this, but rather hold a polymorphic _impl.
OurCharReader would implement a polymorphic interface and implement getStructuredErrors() directly.

So it would be:

CharReader {
public:
    std::vector<StructuredError> getStructuredErrors() const;

    /** apologetically virtual only for ABI compatibility. Not an extension point, so `final`. */
    virtual bool parse(etc...) final;

private:
    class Impl;  // Privately provides the polymorphic interface that CharReader provides today.
    std::unique_ptr<Impl> _impl;
};

// In cpp file to prevent inlining and control visibility.

class CharReader::Impl {
public:
    virtual ~Impl() = default;
    virtual bool parse(etc...) = 0;
    virtual std::vector<StructuredError> getStructuredErrors() const = 0;
};

auto CharReader::getStructuredErrors() const -> std::vector<StructuredError> {
    return _impl->getStructuredErrors();
}

bool CharReader::parse(etc...) {
    return _impl->parse(etc...);
}

@KyleFromKitware
Copy link
Author

Unfortunately, adding the _impl member would still break ABI compatibility by changing the size of CharReader, would it not?

@BillyDonahue
Copy link
Contributor

BillyDonahue commented Apr 9, 2021

Unfortunately, adding the _impl member would still break ABI compatibility by changing the size of CharReader, would it not?

That's true. We could solve that by making it a member of a compatibility union, the other member of that union would be a declaration of old layout, but we don't have to actually use it.

Wait nobody outside our .so knows how big a CharReader is. We only give pointers. So this isn't a problem.

@KyleFromKitware
Copy link
Author

Wait nobody outside our .so knows how big a CharReader is. We only give pointers. So this isn't a problem.

Not according to this simple test program:

#include <iostream>

#include <json/reader.h>

int main()
{
  std::cout << sizeof(Json::CharReader) << std::endl;
  return 0;
}

My machine shows it as 8, which probably consists entirely of a pointer to the vtable.

@BillyDonahue
Copy link
Contributor

BillyDonahue commented Apr 9, 2021

Wait nobody outside our .so knows how big a CharReader is. We only give pointers. So this isn't a problem.

Not according to this simple test program:

#include <iostream>

#include <json/reader.h>

int main()
{
  std::cout << sizeof(Json::CharReader) << std::endl;
  return 0;
}

My machine shows it as 8, which probably consists entirely of a pointer to the vtable.

I'd say the sizeof applied to an abstract base to be only a de jure ABI compatibility issue.
Nobody can declare a CharReader variable, so it's not like the size of any client-side object is changing.

I could clarify by saying that "nobody outside our .so knows how big any CharReader OBJECT is."

@KyleFromKitware
Copy link
Author

Nobody can declare a CharReader variable, so it's not like the size of any client-side object is changing.

That is a good point. Perhaps adding a member would be safe after all.

However, please see my proposal above regarding string keys which would certainly require breaking ABI compatibility. I'll open a separate issue for that, since it's orthogonal to this PR. But if we went through with it, it would give us a chance to clean up the virtual functions that don't need to be virtual.

@KyleFromKitware
Copy link
Author

I'll open a separate issue for that, since it's orthogonal to this PR.

Please see #1283.

@cdunn2001
Copy link
Contributor

While I understand and agree with your concerns on ABI compatibility for virtual methods, it's much more likely to be an issue on Linux than on Windows. I don't think we need to bump the major version number. Only the soversion counter.

meson.build:  soversion : 24,

People should not be using cmake, IMO, but if someone has a problem with ABI compatibility using cmake on Windows, they can rebuild the consuming code.

So I'm fine with your new virtual function as is.

All that said, I still like Billy's idea better. Please rebase this and modify according to @BillyDonahue 's suggestion. Then bump the minor version number. (I don't think we'll need to bump soversion in meson.build.)

@KyleFromKitware
Copy link
Author

Thanks. I'm a little busy with other things at the moment and might not get to this for a while, but I will come back when I can.

@baylesj
Copy link
Contributor

baylesj commented Sep 12, 2024

#1409

@baylesj baylesj closed this Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants