Design a general JsonElement-like thing? #1573

Open
ArcticLampyrid opened this issue Jun 26, 2021 · 19 comments
@ArcticLampyrid

Currently JsonElement is very specific to JSON; however, a similar feature could be used in other formats such as CBOR, YAML, MessagePack, etc.

What is your use-case and why do you need this feature?
As with JsonElement, we sometimes need to process complex structures manually.
We may serve API responses in multiple formats (e.g. JSON for normal use and CBOR for better performance), in which case we need a more general way to handle the various xxxElement types.

For a detailed example, all Discord APIs support both Json and ETF encodings.

ENCODING is the type of encoding for this connection to use. json and etf are supported. (Original)

Describe the solution you'd like
The fundamental GeneralElement should be designed as a RawMessage (inspired by Golang), which stores the value in a serializer-specific binary format. This data may not have a clear TypeID, so we may not be able to read or edit it without additional information. But round-tripping should be supported.
RawMessage is compatible with Protobuf and other formats that do not carry clear runtime type info.

Some formats, such as JSON or CBOR, provide detailed type info that helps us parse the data, so we can parse them into GeneralObjectElement, GeneralArrayElement or GeneralPrimitiveElement, which generalize JsonObject, JsonArray and JsonPrimitive.
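To make the proposal concrete, here is a minimal Kotlin sketch of how such a hierarchy might look. Every name in it (GeneralElement, RawMessage, formatId, countPrimitives and so on) is hypothetical; nothing here exists in kotlinx.serialization, and a real design would need to settle many more questions.

```kotlin
// Hypothetical names throughout: none of these types exist in kotlinx.serialization.

/** Root of the proposed hierarchy: anything a format can hand back and later re-emit. */
sealed interface GeneralElement

/** Opaque, format-specific bytes; only round-tripping is guaranteed. */
class RawMessage(val formatId: String, bytes: ByteArray) : GeneralElement {
    private val data = bytes.copyOf()            // defensive copy keeps the payload stable
    fun toByteArray(): ByteArray = data.copyOf() // callers get their own copy
}

/** Structured variants, available only when the format carries enough type info. */
data class GeneralObjectElement(val fields: Map<String, GeneralElement>) : GeneralElement
data class GeneralArrayElement(val items: List<GeneralElement>) : GeneralElement
data class GeneralPrimitiveElement(val content: String, val isString: Boolean) : GeneralElement

/** Counts primitive leaves; raw messages are skipped because they cannot be inspected. */
fun countPrimitives(element: GeneralElement): Int = when (element) {
    is RawMessage -> 0
    is GeneralPrimitiveElement -> 1
    is GeneralArrayElement -> element.items.sumOf { countPrimitives(it) }
    is GeneralObjectElement -> element.fields.values.sumOf { countPrimitives(it) }
}
```

The point of the sealed hierarchy is that code which only forwards a value can treat everything as GeneralElement, while code that inspects it must handle the RawMessage case explicitly.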

@altavir

altavir commented Jun 26, 2021

This one is similar: #222

@ArcticLampyrid
Author

ArcticLampyrid commented Jun 26, 2021 via email

@altavir

altavir commented Jun 26, 2021

It is about a generic tree-like object with string keys. I actually use exactly that in DataForge. But it is tailored for my own needs. I use it for DOM as well and it could be encoded to Json or XML.

@elizarov
Contributor

elizarov commented Jun 26, 2021

My 5 cents. All formats listed here: YAML, Cbor, MessagePack are based on JSON data information model, so using a JSONElement for them is totally appropriate. You cannot just call this thing GeneralElement, because it is NOT a general element. For example, XML Data Information Model is substantially different from the JSON data model and should use its own XMLElement with a different design, that would not fit well JSON-based formats.

@ArcticLampyrid
Author

@elizarov
GeneralElement can be designed as RawMessage, just to provide a placeholder for round-trip. It's suitable for XML.
GeneralObjectElement/GeneralArrayElement may be more JSON-specific. But a large share of formats have the concepts of Map and Array, which is why it's acceptable to design them generically.

What's more, the current architecture of kotlinx.serialization already favors formats based on Map and List.

@altavir

altavir commented Jun 26, 2021

@elizarov the primary difference between JSON and XML is the treatment of same-name siblings. In JSON they form an array; in XML, element order matters. I agree that the model is different, but in theory it is possible to build some kind of common tooling that works with any tree-like structure.
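A small Kotlin sketch of the same-name-sibling difference, under simplifying assumptions: children are reduced to a tag name plus text, and the Child type and both helper functions are invented for this illustration.

```kotlin
/** One child of an XML-like parent: tag name plus text content (hypothetical type). */
data class Child(val name: String, val text: String)

/** JSON-style view: same-name siblings are merged into one array per key. */
fun toJsonLikeView(children: List<Child>): Map<String, List<String>> =
    children.groupBy({ it.name }, { it.text })

/** Rebuilds a child list from the JSON-style view; interleaving across names is lost. */
fun fromJsonLikeView(view: Map<String, List<String>>): List<Child> =
    view.flatMap { (name, texts) -> texts.map { Child(name, it) } }
```

For the children a=1, b=2, a=3, the JSON-style view groups them as a=[1, 3] and b=[2]. Rebuilding from that view yields a=1, a=3, b=2, so the original interleaving cannot be recovered from the grouped form alone.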

@ArcticLampyrid
Author

For example, XML Data Information Model is substantially different from the JSON data model and should use its own XMLElement with a different design, that would not fit well JSON-based formats.

We can design an XmlNodeElement here as a subclass of GeneralElement for the XML model. GeneralElement is at least useful for round-tripping.
And even if we rename GeneralElement to GeneralObjectModelElement, it's still useful in many cases.

@ArcticLampyrid
Author

GeneralElement can be designed as RawMessage, just to provide a placeholder for round-trip. It's suitable for XML.

In a concrete example:

data class Request(val identity: GeneralElement, val arg: String)
data class Response(val identity: GeneralElement, val result: String)

fun handleRequest(request: Request): Response {
    return Response(request.identity, "Hello, " + request.arg)
}

@aSemy
Contributor

aSemy commented Feb 21, 2023

As an incremental step, it would be very helpful to break out the JsonElement classes and utility functions into a separate, independent dependency.

This would help json5k (xn32/json5k#2): it could then encode/decode to/from JsonElements without muddying the JSON5 code with KxS JSON serialization.

@sandwwraith
Member

@aSemy I doubt this is worth it or even possible, because the JsonElement serializers rely heavily on functionality specific to the Json format (see JsonEncoder and JsonDecoder).

@JesusMcCloud
Contributor

(This comment is very much related to #222 )

We keep hitting issues due to the limitations of the decoder (see here, for example, but we've also faced this before, and the ISO mDL spec already requires us to use an alternative to the upstream KxS CBOR codec for similar reasons).
The pattern is always the same: without a way to parse CBOR into a generic data structure, we will only ever be monkey-patching issues where the decoding of one property depends on the value of another, which may only occur later on.
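The dependency problem can be sketched in a few lines of Kotlin. This is a deliberately simplified illustration, not the KxS CBOR API: the parsed tree is flattened to a Map<String, String>, and the Payload types, the "kind"/"payload" keys and decodeEntry are all made up for the example.

```kotlin
/** Result types for the illustration (hypothetical). */
sealed interface Payload
data class TextPayload(val text: String) : Payload
data class NumberPayload(val value: Long) : Payload

/**
 * Decodes "payload" according to "kind". Because the whole entry is already
 * in memory as a tree, it does not matter that "kind" may come after "payload".
 */
fun decodeEntry(tree: Map<String, String>): Payload {
    val kind = tree["kind"] ?: error("missing kind")
    val raw = tree["payload"] ?: error("missing payload")
    return when (kind) {
        "text" -> TextPayload(raw)
        "number" -> NumberPayload(raw.toLong())
        else -> error("unknown kind: $kind")
    }
}
```

A streaming decoder that meets "payload" first has to either guess its type or buffer it somehow; with a generic tree the lookup order is independent of the wire order.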

Let's face it: there are specs out there that demand parsing into generic structures. These specs

  • are already being used in production
  • will hit production by 2026

Just looking at the EUDIW, this amounts to the whole population of the European Union being eligible for digital identity documents that require parsing CBOR into a generic tree structure.
In effect, this means that the default CBOR serialization format is not an option. Luckily, there is Obor, which does just that.
While it would be possible to upstream the Obor feature set, doing so would effectively mean scrapping the whole CBOR codec of KxS, at which point just switching to Obor is probably the way to go.

I am not trying to badmouth the default KxS format, or anyone involved with it. I am just raising the point that a feature that used to be nice-to-have has become essential for real-world applications that are here to stay, affecting a user base of hundreds of millions.

@pdvrieze
Contributor

Looking just at the difference between XML and Json from a format perspective, I don't think it is even possible to have a generic data structure that could serve as format-agnostic general data storage. The "easier" case is serialization, but it would still require providing the descriptor on replay to allow a serializer to "replay" the events. Needing the descriptor is not generic. Deserialization is more challenging still.

Keep in mind that formats use the serial descriptors to determine how to serialize content and how to interpret deserialized content. Maybe an extreme example of that is protobuf, where nested objects are just byte ranges and the schema is needed to interpret them.
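The protobuf point can be illustrated with a short Kotlin sketch of the wire format's length-delimited fields (wire type 2). It is a hand-rolled reader for illustration only; readVarint and readLengthDelimited are names chosen here, not part of any protobuf library, and there is no bounds or overflow checking.

```kotlin
/** Reads a base-128 varint starting at [start]; returns the value and the next offset. */
fun readVarint(bytes: ByteArray, start: Int): Pair<Long, Int> {
    var result = 0L
    var shift = 0
    var pos = start
    var b: Int
    do {
        b = bytes[pos++].toInt() and 0xFF
        result = result or ((b and 0x7F).toLong() shl shift) // low 7 bits carry data
        shift += 7
    } while ((b and 0x80) != 0)                              // high bit means "more bytes follow"
    return result to pos
}

/** Reads one length-delimited field (wire type 2); returns field number and payload. */
fun readLengthDelimited(bytes: ByteArray): Pair<Int, ByteArray> {
    val (tag, afterTag) = readVarint(bytes, 0)
    require((tag and 0x7L) == 2L) { "not a length-delimited field" }
    val (length, afterLength) = readVarint(bytes, afterTag)
    val payload = bytes.copyOfRange(afterLength, afterLength + length.toInt())
    return (tag ushr 3).toInt() to payload
}
```

Take the five bytes 12 03 0A 01 41 (hex). The reader yields field 2 with the payload 0A 01 41, and nothing on the wire says what that payload is: it is a valid three-character string, a valid nested message (field 1 containing "A"), and a valid packed list of the varints 10, 1 and 65. Only the schema decides, which is why a generic tree for protobuf cannot go much deeper than raw bytes.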

@JesusMcCloud
Contributor

JesusMcCloud commented Apr 1, 2025

Something too generic probably won't work. But as @elizarov pointed out, CBOR and JSON share a lot, with CBOR being the more powerful of the two, so unifying at least those two formats would make sense in the long run.

Feature-wise and structurally, I'd say JSON is a strict subset of CBOR.
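One way to picture the subset relation is with two toy element models in Kotlin. Both are hypothetical and heavily simplified (no booleans or null, numbers reduced to Double); neither mirrors JsonElement or any existing CBOR library. The conversion from the JSON-like model is total, while the reverse has to give up on CBOR-only features.

```kotlin
// Hypothetical, heavily simplified models; neither mirrors a real library API.
sealed interface JsonLike
data class JString(val value: String) : JsonLike
data class JNumber(val value: Double) : JsonLike
data class JArray(val items: List<JsonLike>) : JsonLike
data class JObject(val fields: Map<String, JsonLike>) : JsonLike

sealed interface CborLike
data class CText(val value: String) : CborLike
data class CFloat(val value: Double) : CborLike
data class CBytes(val value: List<Byte>) : CborLike                 // no JSON counterpart
data class CTagged(val tag: Long, val item: CborLike) : CborLike    // no JSON counterpart
data class CArray(val items: List<CborLike>) : CborLike
data class CMap(val pairs: List<Pair<CborLike, CborLike>>) : CborLike // keys need not be text

/** Total: every JSON-like value has a CBOR-like image. */
fun JsonLike.toCbor(): CborLike = when (this) {
    is JString -> CText(value)
    is JNumber -> CFloat(value)
    is JArray -> CArray(items.map { it.toCbor() })
    is JObject -> CMap(fields.map { (k, v) -> CText(k) to v.toCbor() })
}

/** Partial: returns null as soon as the value uses a CBOR-only feature. */
fun CborLike.toJsonOrNull(): JsonLike? {
    return when (this) {
        is CText -> JString(value)
        is CFloat -> JNumber(value)
        is CBytes, is CTagged -> null
        is CArray -> {
            val converted = items.map { it.toJsonOrNull() ?: return null }
            JArray(converted)
        }
        is CMap -> {
            val converted = pairs.map { (k, v) ->
                val key = (k as? CText)?.value ?: return null // non-text key: not JSON
                key to (v.toJsonOrNull() ?: return null)
            }
            JObject(converted.toMap())
        }
    }
}
```

Under this reading, a shared element type would have to be shaped like the CBOR side, with JSON formats rejecting or transforming byte strings, tags and non-text keys.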

@ArcticLampyrid
Author

I don't think it is even possible to have a generic data structure that could be a format agnostic general data storage.

Even so, designing a generic component solely for JSON-like models is still practical enough.

In the real world, we already have multiple widely used formats that adopt JSON-like models, such as YAML, CBOR, MessagePack and TOML.

@pdvrieze
Contributor

pdvrieze commented Apr 1, 2025

@JesusMcCloud @ArcticLampyrid There is nothing against the idea of having shared generic components. Thinking about the implementation, it requires either custom serializers for each supported format (e.g. using JsonEncoder) or a shared interface (e.g. GeneralStructuredEncoder). If it is in line with JsonEncoder/JsonDecoder, this would require each format to support parsing its content into such a structure.
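The shared-interface option might look roughly like the following Kotlin sketch. It uses stand-in types only: Encoder here is a local placeholder rather than the kotlinx.serialization interface, and TreeElement, GeneralStructuredEncoder, the toy encoders and serializeElement are all hypothetical.

```kotlin
// Hypothetical sketch: these interfaces are stand-ins, not kotlinx.serialization types.
sealed interface TreeElement
data class TreeText(val value: String) : TreeElement
data class TreeList(val items: List<TreeElement>) : TreeElement

/** Stand-in for a format's encoder. */
interface Encoder

/** The shared capability: any format that can emit a generic tree implements this. */
interface GeneralStructuredEncoder : Encoder {
    fun encodeElement(element: TreeElement)
}

/** A toy JSON-ish format that opts in (no string escaping, for brevity). */
class ToyJsonEncoder : GeneralStructuredEncoder {
    val out = StringBuilder()
    override fun encodeElement(element: TreeElement) {
        when (element) {
            is TreeText -> out.append('"').append(element.value).append('"')
            is TreeList -> {
                out.append('[')
                element.items.forEachIndexed { i, item ->
                    if (i > 0) out.append(',')
                    encodeElement(item)
                }
                out.append(']')
            }
        }
    }
}

/** A format without tree support. */
class ToyBinaryEncoder : Encoder

/** The element's serializer checks for the capability once instead of once per format. */
fun serializeElement(encoder: Encoder, element: TreeElement) {
    val structured = encoder as? GeneralStructuredEncoder
        ?: throw IllegalStateException("This format cannot encode generic elements")
    structured.encodeElement(element)
}
```

The trade-off is the one described above: the serializer stays format-agnostic, but every participating format has to implement the tree-emitting (and, for decoding, tree-parsing) side itself.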

@sandwwraith
Member

@JesusMcCloud There's a big difference between the 'Design a general generic tree structure' and 'Expand JsonElement to be a CborElement' tasks. The first one has a very broad scope with a lot of design space and unanswered questions. The second has a clear scope and clear applications. This ticket is left open for more theoretical discussions; I doubt it will ever be implemented.

For CborElement, though, you can open a separate one with a list of examples from EU regulations. We can discuss there whether some parts of Obor can be upstreamed or some other solution is required.

@JesusMcCloud
Contributor

We have this bug here and we have #222. I agree that there are things being mixed up, but it was hinted here that combining JSON and CBOR generic structures would make sense. So which scope, exactly, will the new issue cover? JSON+CBOR is a given. YAML? What else?

@JesusMcCloud
Contributor

JesusMcCloud commented Apr 7, 2025

I think this needs some guidance on how to progress without cluttering the issue tracker even more. We currently have:

The last bullet deserves a separate issue, as it is COSE-specific and comes down to the ISO watering-down the strict COSE encoding rules. I will need some time to properly prepare it though, because:

  • While the COSE spec and the PID credential spec are freely available, I need to check with @nodh whether the relevant parts of the ISO spec are freely available too (this bit probably is, but a copy of each part of the ISO/IEC 18013 spec costs real money and must not be made public).
  • I will need to prepare concrete examples and a write-up explaining why the current state causes problems and why the single-byte lookahead is not adequate.

ping @thecyberfred

@JesusMcCloud
Contributor

Write-up is done; see #2975.
