GSoC: Define upgrade/downgrade language agnostic declarative transformation rules for all JSON Schema dialects #599
Comments
Thanks Juan. This looks amazing! |
Hey @jviotti, I read through the problem statement and loved how clearly the description explains it. I would love to work on this project under GSoC with the mentors. Can you guide me toward a better understanding of it 😁 and where to start? |
Hey there! I'd first suggest getting acquainted with https://github.com/sourcemeta/alterschema. This is the original project where I prototyped something like what we want to do here, using JSON-e (https://json-e.js.org), but ended up hitting some blockers. You can take a look at all the upgrade transformation rules I support here: https://github.com/sourcemeta/alterschema/tree/master/rules. Try to read them, and understand them mainly in conjunction with JSON Schema's official migration guide: https://json-schema.org/specification#migrating-from-older-drafts. The way Alterschema works is pretty simple. It recursively traverses every subschema of the given schema in a top-down manner, applying all the rules it knows about to every subschema over and over again until no more transformation rules can be executed. Its core business logic is literally a small JavaScript file: https://github.com/sourcemeta/alterschema/blob/master/bindings/node/index.js. For example, Alterschema rules for upgrading JSON Schema 2019-09 to 2020-12 are defined here: https://github.com/sourcemeta/alterschema/blob/master/rules/jsonschema-2019-09-to-2020-12.json, based on what JSON Schema published here: https://json-schema.org/draft/2020-12/release-notes. Now, what we would like to do in this GSoC initiative is learn from what we did in Alterschema and take another pass at the problem in a way that solves Alterschema's limitations. The main limitation is this one: sourcemeta-research/alterschema#43. In summary, a JSON Schema may reference other parts of itself using URI encoded JSON Pointers along with the However, what happens if there is a reference in another part of the schema that is now invalid after the schema transformation you did somewhere else? If so, we don't have a deterministic way of detecting this, let alone knowing how to "fix up" the reference pointers. 
The conclusion I got from this is that JSON-e, while powerful, is too low level and doesn't carry semantics about what the transformation actually did. For example, if you upgrade So what I'm thinking about is that we can study the transformation rules that we want to perform, and break them down into higher level sub transformations. For example, are you completely deleting something? Are we performing just a rename? Are we moving the contents around? If we design a JSON language that works at a higher level of abstraction, we can deterministically know how we should fix any affected pointer. |
So I'd say the phases in this project are like this:
|
As an initial qualifying task for this project (cc @benjagm), I propose:
|
As a more specific (though probably a bit artificial and silly 😅) example of the {
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "array",
"items": [
{ "type": "string" },
{ "type": "number" }
],
"additionalItems": {
"$ref": "#/items/0"
}
} To turn it into a JSON Schema 2020-12, we need to:
However, if you blindly perform these transformations, you would end up with the following schema: {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "array",
"prefixItems": [
{ "type": "string" },
{ "type": "number" }
],
"items": {
"$ref": "#/items/0"
}
} However note that the This one is a bit simple, but think about more complex variations of the same problem. You might have long references where many of its components will need to be updated, and in some cases, it will be more than just a component rename. |
Or if you can think of a better way to deterministically solve this problem, please propose it and we can work on it together! |
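To make the broken reference in the blindly upgraded schema above concrete, here is a minimal JavaScript sketch that detects it. The helper names `resolveLocalRef` and `findDanglingRefs` are hypothetical (they are not part of Alterschema), and the sketch assumes only same-document `#/...` references and treats every `$ref` key as a reference.

```javascript
// Resolve a same-document "#/..." JSON Pointer reference against a schema.
// Returns undefined when any token along the path does not exist.
function resolveLocalRef(root, ref) {
  if (ref !== "#" && !ref.startsWith("#/")) return undefined; // anchors and remote refs are out of scope
  const pointer = decodeURIComponent(ref.slice(1));
  if (pointer === "") return root;
  return pointer
    .split("/")
    .slice(1)
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~")) // JSON Pointer unescaping
    .reduce(
      (node, token) =>
        node !== null &&
        typeof node === "object" &&
        Object.prototype.hasOwnProperty.call(node, token)
          ? node[token]
          : undefined,
      root
    );
}

// Collect every local "$ref" whose target no longer exists in the document.
// Simplification: any key named "$ref" is treated as a reference.
function findDanglingRefs(root) {
  const dangling = [];
  const visit = (node, path) => {
    if (node === null || typeof node !== "object") return;
    for (const [key, value] of Object.entries(node)) {
      if (key === "$ref" && typeof value === "string") {
        const isLocal = value === "#" || value.startsWith("#/");
        if (isLocal && resolveLocalRef(root, value) === undefined) {
          dangling.push({ at: `${path}/$ref`, ref: value });
        }
      } else {
        visit(value, `${path}/${key}`);
      }
    }
  };
  visit(root, "#");
  return dangling;
}

// The blindly upgraded schema from the comment above.
const blindlyUpgraded = {
  $schema: "https://json-schema.org/draft/2020-12/schema",
  type: "array",
  prefixItems: [{ type: "string" }, { type: "number" }],
  items: { $ref: "#/items/0" },
};

const dangling = findDanglingRefs(blindlyUpgraded);
console.log(dangling);
```

A check like this only catches references that stop resolving. A reference that still resolves after the transformation, but to a different subschema than before, would pass unnoticed, which is part of why detection alone does not solve the fix-up problem.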
I'm confused by this line. Are we supposed to convert prefixItems to items for the reference to be #/prefixItems/0 as part of the conversion from 2019-09 to 2020-12? Perhaps you meant items to prefixItems, or maybe I am misunderstanding? 😕 |
@MeastroZI The reference was originally {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "array",
"prefixItems": [
{ "type": "string" },
{ "type": "number" }
],
"items": {
"$ref": "#/prefixItems/0"
}
} |
Hasn't this problem already been addressed with the pattern "pattern": "/items/\\d+"
"$eval": "replace(schema['$ref'], '/items/(\\d+)', '/prefixItems/$1')" or is there a possibility that this approach might not cover all cases? If so, could you please specify which cases it might not handle, so I can gain a better understanding of the issue? |
@MeastroZI For this very trivial rename case yes, but it's very easy to construct valid JSON Schemas where that simple pattern won't do. Take this one as a silly example: {
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "object",
"properties": {
"items": {
"items": [
{ "type": "string" }
]
},
"extra": {
"$ref": "#/properties/items/items/0"
}
}
} It has an object property called In any case, Keep in mind that a tool that upgrades schemas must be able to handle ANY valid JSON Schema document that the user passes to it, and handle these tricky edge cases accordingly. |
For i.e. |
Here is a fun one that is valid and breaks the {
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "object",
"properties": {
"foo": {
"$ref": "#/$defs/items/0"
}
},
"$defs": {
"items": {
"0": {
"type": "string"
}
}
}
} |
What I'm thinking about is that we can statically analyze the schema first, and know what each component of the pointers mean (i.e. does the |
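One way to read the static-analysis idea above is to walk each pointer token by token against the original (pre-upgrade) schema, tracking whether the current location is a schema, a map of user-chosen names, or an array of schemas. The sketch below is only an illustration under stated assumptions: `upgradeItemsPointer` and the three keyword sets are hypothetical names, the keyword lists cover only part of 2019-09, and JSON Pointer escaping (`~0`, `~1`, percent-encoding) is ignored.

```javascript
// 2019-09 keywords whose value maps arbitrary names to subschemas.
const SCHEMA_MAPS = new Set([
  "properties", "patternProperties", "$defs", "definitions", "dependentSchemas",
]);
// Keywords whose value is an array of subschemas.
const SCHEMA_ARRAYS = new Set(["allOf", "anyOf", "oneOf"]);
// Keywords whose value is a single subschema.
const SCHEMA_SINGLE = new Set([
  "additionalItems", "additionalProperties", "not", "if", "then", "else",
  "contains", "propertyNames", "unevaluatedItems", "unevaluatedProperties",
]);

// Rewrite "items" to "prefixItems" only where the token really is the
// array-form "items" applicator, judged against the ORIGINAL schema.
function upgradeItemsPointer(root, ref) {
  if (!ref.startsWith("#/")) return ref;
  const tokens = ref.slice(2).split("/");
  let node = root;
  let kind = "schema"; // "schema" | "map" | "array" | "unknown"
  const out = [];
  for (const token of tokens) {
    const value =
      node !== null && typeof node === "object" ? node[token] : undefined;
    let next = "unknown";
    let emitted = token;
    if (kind === "schema") {
      if (token === "items") {
        if (Array.isArray(value)) {
          emitted = "prefixItems";
          next = "array";
        } else {
          next = "schema";
        }
      } else if (SCHEMA_MAPS.has(token)) next = "map";
      else if (SCHEMA_ARRAYS.has(token)) next = "array";
      else if (SCHEMA_SINGLE.has(token)) next = "schema";
    } else if (kind === "map" || kind === "array") {
      next = "schema"; // a user-chosen name or an index, never a keyword
    }
    out.push(emitted);
    node = value;
    kind = next;
  }
  return "#/" + out.join("/");
}

// The "fun one" from the comment above: "items" is a $defs name, "0" a plain key.
const tricky = {
  $schema: "https://json-schema.org/draft/2019-09/schema",
  type: "object",
  properties: { foo: { $ref: "#/$defs/items/0" } },
  $defs: { items: { 0: { type: "string" } } },
};
// The earlier example: a property called "items" that uses array-form "items".
const nested = {
  $schema: "https://json-schema.org/draft/2019-09/schema",
  type: "object",
  properties: {
    items: { items: [{ type: "string" }] },
    extra: { $ref: "#/properties/items/items/0" },
  },
};

const naive = "#/$defs/items/0".replace(/\/items\/(\d+)/, "/prefixItems/$1");
const awareTricky = upgradeItemsPointer(tricky, "#/$defs/items/0");
const awareNested = upgradeItemsPointer(nested, "#/properties/items/items/0");
console.log({ naive, awareTricky, awareNested });
```

The regular expression rewrites `#/$defs/items/0` even though nothing at that location is an `items` applicator, while the token-aware walk leaves it alone and still rewrites the nested case. A real implementation would need the full keyword tables per dialect and vocabulary.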
|
Okay ill complete this rn
|
Not 100% sure what you mean, but what I mean by semantics is being able to statically analyze the actual transformation DSL and actually understand what it does. For example, you cannot very easily tell from a JSON-e template that such template is actually a property rename. And if we can tell that i.e. a rule is actually a rename for A to B, then we might know how to handle the reference fix ups. Coming back to the {
"$merge": [
{ "$eval": "omit(schema, 'items')" },
{
"prefixItems": {
"$eval": "schema.items"
}
}
]
} What if instead of that weird-looking low-level complex JSON template, we instead had: [
{ "type": "rename", "from": "items", "to": "prefixItems" }
] The latter is a LOT more machine readable. I guess the main challenge is that leaving the simple |
@jviotti one question on this: should the high-level transformation language call JSON-e behind the scenes (that is, should the high-level one be written on top of JSON-e itself)? |
@Era-cell Maybe. I'm open to both building it on top of JSON-e or as a standalone thing. Whatever is easier I guess |
Thanks a lot for joining JSON Schema org for this edition of GSoC!! Qualification tasks will be published as comments in the project ideas by Thursday/Friday of this week. In addition, I'd like to invite you to an office hours session this Thursday at 18:30 UTC where we'll present the ideas and the relevant dates to consider at this stage of the program. Please use this link to join the session: See you there! |
For the qualifying task, just to echo back what I said before: the main thing we want to see in proposals is that you have a good grasp of what the problem of upgrading JSON Schemas is and are capable of understanding the upgrade rules that would need to be implemented. So for that, you can focus only on 2019-09 to 2020-12 for the proposal (we'll cover other drafts later), list the transformation rules that need to happen between those drafts, and try to categorize them based on different criteria to understand them better. For example, what vocabulary they involve, what type of operation they are (rename, wrap, etc), whether they affect other sibling or non-sibling keywords, etc. Be creative! Good grouping criteria can surface patterns that we might not be thinking about and that could influence the DSL. You can present this as a spreadsheet, list, or any form you want. Then, once accepted, we will continue building on this analysis to design the DSL, and finally implement it. If we did the previous phases well (mainly the one on understanding and categorizing the transformation rules), the rest will be easy |
{
"$schema": "https://json-schema.org/draft/2020-12",
"$id": "https://example.com/anotherthing/agains/customer",
"type": "object",
"properties": {
"name": { "type": "string" },
"phone": { "$ref": "/schema/common#/$defs/phone" },
"address": { "$ref": "/schema/address" }
},
"$defs": {
"https://example.com/schema/address": {
"$id": "https://example.com/schema/address",
"type": "object",
"properties": {
"address": { "type": "string" },
"city": { "type": "string" },
"postalCode": { "$ref": "/schema/common#/$defs/usaPostalCode" },
"state": { "$ref": "#/$defs/states" }
},
"$defs": {
"states": {
"enum": [4, 4]
}
}
},
"https://example.com/schema/common": {
"$schema": "https://json-schema.org/draft/2019-09",
"$id": "https://example.com/schema/common",
"$defs": {
"phone": {
"type": "number"
},
"usaPostalCode": {
"type": "string",
"pattern": "^[0-9]{5}(?:-[0-9]{4})?$"
},
"unsignedInt": {
"type": "integer",
"minimum": 0
}
}
}
}
}
@jviotti I am not able to understand how, in this case, this "phone": { "$ref": "/schema/common#/$defs/phone" } which has the relative path, gets resolved by the schema validator. I mean, how is the base URL for this calculated even if there is nothing common in the relative path under $ref and the $id of the root? |
Did you try to run it? I am thinking this is related to how schemas are stored |
@Era-cell, I have read somewhere that $ref is resolved by directly pointing to the schema part they are referring to. So now my question is: how does the schema validator resolve this $ref with a relative path? Even if the schema validator stores these schemas in the definition part or in some other way under the hood , there is still a need to resolve it by referencing it and resolving $ref. |
As per the documentation: refs are encapsulated from the parent schema but defs
aren't, so annotation results of an external schema should affect only
validation results. If a sub-schema with $ref fails, the schema is invalidated
|
The schema I provided is not invalidating; it's working and successfully validating the JSON data. You can try it here: Edited: Sorry, I am typing from my phone, so you may find typos in my messages |
@MeastroZI Your reference, According to JSON Schema use of URI and the URI RFC, that relative URI is resolved taking Following standard URI behavior, the result of resolving If URI behavior is the confusing part, I recommend reading the URI RFC: https://www.rfc-editor.org/rfc/rfc3986 |
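The resolution described above can be tried directly with the built-in `URL` class, using the `$id` and `$ref` values from the schema in the question. This is only a sketch: `URL` follows the WHATWG URL Standard rather than RFC 3986 itself, and the assumption here is that the two agree for these simple inputs.

```javascript
// "$id" of the root schema in the question: this is the base URI.
const rootId = "https://example.com/anotherthing/agains/customer";

// An absolute-path reference replaces the whole path of the base URI,
// so nothing in the base path needs to match the reference.
const phoneRef = new URL("/schema/common#/$defs/phone", rootId);
const addressRef = new URL("/schema/address", rootId);

// Inside the embedded address resource, its own "$id" becomes the new base.
const statesRef = new URL("#/$defs/states", "https://example.com/schema/address");

// Split a resolved reference into the resource to look up and the
// JSON Pointer fragment to apply inside that resource.
const resource = phoneRef.origin + phoneRef.pathname;
const fragment = phoneRef.hash;

console.log(phoneRef.href);
console.log(addressRef.href);
console.log(statesRef.href);
console.log(resource, fragment);
```

The resource part, `https://example.com/schema/common`, matches the `$id` of the schema embedded under `$defs`, which would explain why the validator can find the target without fetching anything.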
const transformRule = [
{
referencTraverser: true,
path: "properties/*",
conditions: [{ "isKey": "$ref" }],
refConditions: [{ "isKey": "items", "hasSibling": ["type", "array"] }],
updateRefPart: "prefixItems"
},
{
path: '*',
conditions: [{ "isKey": "items", "hasSibling": ["type", "array"] }],
operations: {
"editKey": "prefixItems"
}
} ,
{
path : '$schema' ,
operations : {
"updateValue" : "https://json-schema.org/draft/2020-12/schema"
}
}
]
const jasonobj = {
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "object",
"properties": {
"items": {
"type": "array",
"items": [
{ "type": "string" }
]
},
"extra": {
"$ref": "#/properties/items/items/0"
}
},
"ooos": {
"items2": {
"type": "array",
"items": []
},
"item3": {
"items4": {
"items5": {
"type": "array",
"items": []
}
}
}
}
}
const result = convert(transformRule, jasonobj)
console.log('\n')
console.log('*******************************Logs*****************************************')
console.log('\n\n\n\n\n\n')
console.log('*******************************Result****************************************')
console.log( JSON.stringify (result , null , 2))
console.log('*******************************Result****************************************')
console.log('\n') and here is the output {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"items": {
"type": "array",
"prefixItems": [
{
"type": "string"
}
]
},
"extra": {
"$ref": "#/properties/items/prefixItems/0"
}
},
"ooos": {
"items2": {
"type": "array",
"prefixItems": []
},
"item3": {
"items4": {
"items5": {
"type": "array",
"prefixItems": []
}
}
}
}
} Hi @jviotti, I have a doubt about the meaning of the JSON DSL. Could you please take a look at this code? It's a snippet of my work towards DSL. Actually, I want to know if my code can do something like this. Is it considered as a DSL? If not, how would you technically define a DSL? And sorry for the previous comment. One more thing I am hesitant about is asking this many questions. Is it okay to ask this many questions or are they silly? I want to openly express my concern about it. |
@Era-cell
The |
@Era-cell
These perform simplifications within the same version to make it easier to process the other rules. i.e. you could simplify the use of certain keywords on the input schema without changing the version, before you attempt to upgrade it.
The whole point of this project is to make rule definitions programming language agnostic. We don't want to just create an upgrade tool for JavaScript, but one that is embeddable and implementable on ANY language out there. That's why the rules are pure JSON.
Not sure I follow this. Can you give me an example?
I will. The idea is for the JSON-based rules to be moved to the JSON Schema org while Alterschema is (one of many, potentially?) an implementation of the actual engine. |
@jviotti |
Correct. Maybe this example helps clarifying that: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/main/tests/draft2020-12/unevaluatedItems.json#L64-L78 |
|
@Era-cell
It should be all JSON based. No need for a new grammar. Just use JSON's grammar. But don't embed an actual programming language like JavaScript on the JSON. JSON-e is one valid way of doing it. It expresses the transformations purely using JSON. |
Hi @jviotti, when the algorithm/DSL is included in the JSON Schema org, will access to external JSON Schema documents be provided?
I mean documents whose schema resource isn't present in the document being altered. At that point, does the external schema document (an external resource) also need to be altered? |
Hi @Era-cell
Great question! Yes on both cases:
But in both cases, our upgrader shouldn't really mind. If it's passed a schema with unresolved remote references, it will do what it can, and if it's passed a bundled schema, it will transform the entire thing. |
Hi, @jviotti! I have one more question about bundling schemas. Can I assume that the name (key) of the schema in $defs will always be the $id of that schema, or can it be anything? For example, in this schema under $defs, the names are set to the $id of the schema: {
"$id": "https://jsonschema.dev/schemas/examples/non-negative-integer-bundle",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"description": "Must be a non-negative integer",
"$comment": "A JSON Schema Compound Document. Aka a bundled schema.",
"$defs": {
"https://jsonschema.dev/schemas/mixins/integer": {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://jsonschema.dev/schemas/mixins/integer",
"description": "Must be an integer",
"type": "integer"
},
"https://jsonschema.dev/schemas/mixins/non-negative": {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://jsonschema.dev/schemas/mixins/non-negative",
"description": "Not allowed to be negative",
"minimum": 0
},
"nonNegativeInteger": {
"allOf": [
{
"$ref": "/schemas/mixins/integer"
},
{
"$ref": "/schemas/mixins/non-negative"
}
]
}
},
"$ref": "#/$defs/nonNegativeInteger"
}
|
It can be anything. |
(The value of the |
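Since the `$defs` key can be anything, a tool that needs to locate embedded resources in a bundle would have to index them by `$id` rather than by key. Below is a minimal JavaScript sketch of that idea. `indexEmbeddedResources` is a hypothetical helper, the example URLs are made up for illustration, the root is assumed to carry an absolute `$id`, and for simplicity the walk visits every object, including places where `$id` would just be data.

```javascript
// Index every embedded schema resource by its resolved "$id".
// Simplification: every nested object is visited, not only schema locations.
function indexEmbeddedResources(root) {
  const index = new Map();
  const visit = (node, base) => {
    if (node === null || typeof node !== "object") return;
    let currentBase = base;
    if (!Array.isArray(node) && typeof node.$id === "string") {
      // A nested "$id" is resolved against the enclosing resource's base URI.
      currentBase = new URL(node.$id, base).href;
      index.set(currentBase, node);
    }
    for (const value of Object.values(node)) visit(value, currentBase);
  };
  visit(root, undefined);
  return index;
}

// Hypothetical bundle: the "$defs" key deliberately differs from the "$id".
const bundle = {
  $id: "https://example.com/root",
  $defs: {
    "any-name-works": {
      $id: "https://example.com/schemas/integer",
      type: "integer",
    },
  },
  $ref: "/schemas/integer",
};

const index = indexEmbeddedResources(bundle);
const target = index.get(new URL(bundle.$ref, bundle.$id).href);
console.log(target);
```

The lookup succeeds even though the `$defs` key is unrelated to the URI, because only the `$id` takes part in resolution.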
Okay, so if we have access to external resource and it is resolved.. we dont change the external schema, |
Keep in mind the project would not be able to "modify" any schema in place. What it does is create a copy of the input schema with the given transformations. So:
|
🚩 IMPORTANT INSTRUCTIONS REGARDING HOW AND WHERE TO SUBMIT YOUR APPLICATION 🚩 Please join this discussion in JSON Schema slack to get the latest, very important details on how to better submit your application to JSON Schema. See communication here. |
Hi, @jviotti where should the qualification task be submitted, and what is the deadline for it? |
@Era-cell I believe there is a GSoC portal that you should use. @benjagm Can you clarify? |
@Era-cell yes please. Make sure you add the details of the qualification task to the proposal! Feel free to join the |
Hi @jviotti, first of all, I apologize for using the Alterschema UI to display my DSL transformation. It's only temporary! Could you please review the transformation from the 2019-09 to the 2020-12 draft on this site? I've embedded the qualification task's DSL transformation code and have tried my best to cover all edge cases. However, if I've missed any, please let me know. |
@MeastroZI Not much I can comment on given a single example, but looking forward to the explanations, proposed rules, etc in the proposal! |
@jviotti, I submitted my proposal (Name: Pandit Vinit ) in Json schema. Could you please review it and provide any suggestions if possible ? |
I will, thanks a lot for the submission! ❤️ |
@jviotti in the 2019-09 draft I am not able to find any difference between additionalItems and unevaluatedItems {
"$schema": "https://json-schema.org/draft/2019-09/schema",
"$def": {
"stringArray": {
"type": "array",
"items": {
"type": "string"
}
},
"numberArray": {
"oneOf": [
{
"type": "array",
"items": [
{
"type": "number"
},
{
"$ref": "#/$def/stringArray"
}
]
},
{
"type": "boolean"
}
]
}
},
"type": "array",
"items": [
{
"$ref": "#/$def/stringArray"
}
],
"additionalItems": {
"$ref": "#/$def/numberArray"
}
}
validate against : [[""] , [5 , [""]] ] and [[""] , true ] so my question is what is the difference between additionalItems and unevaluatedItems in 2019-09 draft and is there any example schema which show the difference between additionalItems and unevaluatedItems ? |
@MeastroZI Take a look at the official test suite examples: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/main/tests/draft2019-09/unevaluatedItems.json. |
@jviotti, I need some direction on how to approach downgrading JSON Schema. Is it even possible to do this for all the dialects? With each new version, new keywords are introduced, and I'm unsure if it's feasible to replicate their behavior using the previous version. Regarding upgrading, I've developed the DSL, and I believe it's capable of handling all upgrades. Please review the recent changes I made in the repository and provide feedback if possible. |
@MeastroZI It is not always feasible, but I think you can go a long way with it, and we can think how to handle the problematic cases. I think if the resulting downgraded schema is a superset of the schema (i.e. it doesn't add more constraints), then it's probably acceptable. |
Hello! 👋 This issue has been automatically marked as stale due to inactivity 😴 It will be closed in 180 days if no further activity occurs. To keep it active, please add a comment with more details. There can be many reasons why a specific issue has no activity. The most probable cause is a lack of time, not a lack of interest. Let us figure out together how to push this issue forward. Connect with us through our slack channel : https://json-schema.org/slack Thank you for your patience ❤️ |
This issue did not get any activity in the past 180 days and thus has been closed. Please check if the main branch has fixed it. Please, create a new issue if the issue is not fixed. |
Project title
Define upgrade/downgrade language agnostic declarative transformation rules for all JSON Schema dialects.
Brief Description
The Alterschema project defines a set of JSON-based formal transformation rules for upgrading schemas between Draft 4 and 2020-12, and all dialects in between. These rules are defined using JSON Schema and JSON-e and live within the Alterschema project.
We would like to revise these rules, extend them to support every dialect of JSON Schema (potentially including OpenAPI's old dialects too), and attempt to support some level of downgrading.
Instead of having these rules on the Alterschema repository, we want to have them on the JSON Schema organization for everybody to consume, including Alterschema itself.
Revising the rule format should consider currently unresolved edge cases in Alterschema like tweaking references after a subschema is moved.
Expected Outcomes
A new repository in the JSON Schema organization with upgrade/downgrade rules defined using JSON.
Skills Required
Understanding of various dialects of JSON Schema and their differences.
Mentors
@jviotti
Expected Difficulty
Medium
Expected Time Commitment
350 hours