Commit 79a540b

Introducing repetition_delimiter to EDI schema. (#215)
Issue: #212

`repetition_delimiter`: delimiter to separate multiple data instances for an element. For example, if `^` is the repetition delimiter for a segment `DMG*D8*19690815*M**A^B^C^D~`, then the last element has 4 pieces of data: `A`, `B`, `C`, and `D`. Any element in which the `repetition_delimiter` does not appear has exactly one piece of data. Similarly, if `^` is the repetition delimiter for a segment `CLM*A37YH556*500***11:B:1^12:B:2~`, the last element has 2 pieces of data: `11:B:1` and `12:B:2`, each of which is further delimited by the `component_delimiter` `:`. Note that since `repetition_delimiter` creates multiple pieces of data under the same element name in the schema, in most cases the suitable construct type in `transform_declarations` is `array`.

Currently, `NonValidatingReader` reads all the elements and their components serially into a slice, `[]RawSegElem`. Each entry contains the element value, the element index, and the component index if there is more than one component. With `repetition_delimiter` added, we continue the same pattern: `NonValidatingReader` still reads everything into the slice, except that now multiple `RawSegElem` entries can share the same `ElemIndex` and `CompIndex`.

Using the example above, `^` is the repetition delimiter and the segment is `CLM*A37YH556*500***11:B:1^12:B:2~`. After `NonValidatingReader.Read()` is done, we'll have the following `[]RawSegElem` (simplified):
```
{
  {'CLM', ElemIndex: 0, CompIndex: 1},
  {'A37YH556', ElemIndex: 1, CompIndex: 1},
  {'500', ElemIndex: 2, CompIndex: 1},
  {'', ElemIndex: 3, CompIndex: 1},
  {'', ElemIndex: 4, CompIndex: 1},
  {'11', ElemIndex: 5, CompIndex: 1},
  {'B', ElemIndex: 5, CompIndex: 2},
  {'1', ElemIndex: 5, CompIndex: 3},
  {'12', ElemIndex: 5, CompIndex: 1},
  {'B', ElemIndex: 5, CompIndex: 2},
  {'2', ElemIndex: 5, CompIndex: 3},
}
```
Note that the last 3 entries have the same `ElemIndex` and `CompIndex` values as the 3 entries before them. This behavior is new and introduced in this PR.

Now on the EDI reader side (reader.go): previously, when matching element decls against the raw element slice, we only did a one-way scan, because `ElemIndex` and `CompIndex` always increased and we never needed to back-scan. With potentially duplicate `ElemIndex` and `CompIndex` values, we now simply do a full `[]RawSegElem` scan for each element decl. Yes, it is a bit more expensive, but given that the total number of elements and components in a segment is usually very small (around 20), we feel this trade-off is acceptable and avoids making the already-complex code even more so.

With this reader change, the IDR produced can contain child element nodes with the same element name. Thus, when writing a schema, users of the `repetition_delimiter` feature will in practice need to use the `array` type in `transform_declarations`.
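
To make the two steps above concrete, here is a minimal, self-contained sketch. It is not the actual omniparser code: the `RawSegElem` field layout is simplified from the description above, `splitSegment` and `findAll` are hypothetical helper names, and the sketch ignores `release_character` and `ignore_crlf` handling. It only shows how repetitions end up as entries sharing the same `ElemIndex`/`CompIndex`, and how a full scan collects all of them for one element decl.

```go
package main

import (
	"fmt"
	"strings"
)

// RawSegElem is a simplified stand-in for the struct described in the
// commit message (assumed shape, for illustration only).
type RawSegElem struct {
	ElemIndex int
	CompIndex int
	Data      string
}

// splitSegment (hypothetical helper) tokenizes one segment, given without its
// trailing segment delimiter. Each element is split by the repetition
// delimiter first, then each repetition by the component delimiter.
// An empty delimiter means "not configured" and disables that split.
func splitSegment(seg, elemDelim, compDelim, repDelim string) []RawSegElem {
	var out []RawSegElem
	for ei, elem := range strings.Split(seg, elemDelim) {
		reps := []string{elem}
		if repDelim != "" {
			reps = strings.Split(elem, repDelim)
		}
		for _, rep := range reps {
			comps := []string{rep}
			if compDelim != "" {
				comps = strings.Split(rep, compDelim)
			}
			for ci, comp := range comps {
				// CompIndex is 1-based; it restarts for every repetition,
				// which is why duplicates of (ElemIndex, CompIndex) appear.
				out = append(out, RawSegElem{ElemIndex: ei, CompIndex: ci + 1, Data: comp})
			}
		}
	}
	return out
}

// findAll (hypothetical helper) does a full scan of the slice and returns
// every value matching the given element and component index.
func findAll(elems []RawSegElem, elemIndex, compIndex int) []string {
	var vals []string
	for _, e := range elems {
		if e.ElemIndex == elemIndex && e.CompIndex == compIndex {
			vals = append(vals, e.Data)
		}
	}
	return vals
}

func main() {
	elems := splitSegment("CLM*A37YH556*500***11:B:1^12:B:2", "*", ":", "^")
	fmt.Println(len(elems))           // prints 11
	fmt.Println(findAll(elems, 5, 1)) // prints [11 12]
	fmt.Println(findAll(elems, 5, 3)) // prints [1 2]
}
```

Because a lookup can now return more than one value for the same element decl, a schema that maps such an element would typically collect the results in an `array`, as noted above.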
1 parent 9e0c8da commit 79a540b

21 files changed, +7025 -105 lines changed

doc/edi_in_depth.md (+11 -1)

@@ -87,6 +87,7 @@ A full EDI schema `file_declaration` is as follows:
 "segment_delimiter": "<segment delimiter>", <== required
 "element_delimiter": "<element delimiter>", <== required
 "component_delimiter": "<component delimiter>", <== optional
+"repetition_delimiter": "<repetition delimiter>", <== optional
 "release_character": "<release character>", <== optional
 "ignore_crlf": true/false, <== optional
 "segment_declarations": [
@@ -126,6 +127,15 @@ standards call for a single ASCII character as `element_delimiter`, omniparser a
 `component_delimiter` in omniparser allows UTF-8 string. This is optional, and if not specified, you
 can treat each element as of a single component.
 
+- `repetition_delimiter`: delimiter to separate multiple data instances for an element. For example,
+if `^` is the repetition delimiter for a segment `DMG*D8*19690815*M**A^B^C^D~`, then the last
+element has 4 pieces of data: `A`, `B`, `C`, and `D`. Any element without `repetition_delimiter`
+present has essentially one piece of data; similarly, if `^` is the repetition delimiter for a
+segment `CLM*A37YH556*500***11:B:1^12:B:2~`, the last element has 2 pieces of data: `11:B:1` and
+`12:B:2`, each of which is further delimited by a `component_delimiter` `:`. Note, since
+`repetition_delimiter` creates multiple pieces of data under the same element name in the schema,
+in most cases the suitable construct type in `transform_declarations` is `array`.
+
 - `release_character`: an optional escape character for delimiters. Imagine a piece of element data
 contains a `*` which happens to be `element_delimiter`. Without escaping, parser would treat that `*`
 as a real delimiter. Any character preceded by `release_character` will be treated literally.
@@ -550,7 +560,7 @@ And we can add the transform reference into the `FINAL_OUTPUT` directly:
 ```
 Run cli we have:
 ```
-$ cli.sh transform -i 2_ups_edi_210.input.txt -s test.schema.json
+$ cli.sh transform -i 2_ups_edi_210.input.txt -s test.schema.json
 [
 {
 "invoice_number": "0000001808WW308"
@@ -0,0 +1,155 @@
+{
+  "Records": [
+    {
+      "Children": [
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "D8",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode d8)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "d8",
+          "FirstChild": "(TextNode 'D8')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode 'D8')",
+          "NextSibling": "(ElementNode d_date)",
+          "Parent": "(ElementNode DMG)",
+          "PrevSibling": null,
+          "Type": "ElementNode"
+        },
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "19910512",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode d_date)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "d_date",
+          "FirstChild": "(TextNode '19910512')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode '19910512')",
+          "NextSibling": "(ElementNode d_cat)",
+          "Parent": "(ElementNode DMG)",
+          "PrevSibling": "(ElementNode d8)",
+          "Type": "ElementNode"
+        },
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "RET",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode d_cat)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "d_cat",
+          "FirstChild": "(TextNode 'RET')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode 'RET')",
+          "NextSibling": "(ElementNode d_cat)",
+          "Parent": "(ElementNode DMG)",
+          "PrevSibling": "(ElementNode d_date)",
+          "Type": "ElementNode"
+        },
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "RET",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode d_cat)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "d_cat",
+          "FirstChild": "(TextNode 'RET')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode 'RET')",
+          "NextSibling": "(ElementNode d_code)",
+          "Parent": "(ElementNode DMG)",
+          "PrevSibling": "(ElementNode d_cat)",
+          "Type": "ElementNode"
+        },
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "2135-2",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode d_code)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "d_code",
+          "FirstChild": "(TextNode '2135-2')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode '2135-2')",
+          "NextSibling": "(ElementNode d_code)",
+          "Parent": "(ElementNode DMG)",
+          "PrevSibling": "(ElementNode d_cat)",
+          "Type": "ElementNode"
+        },
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "2106-3",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode d_code)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "d_code",
+          "FirstChild": "(TextNode '2106-3')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode '2106-3')",
+          "NextSibling": null,
+          "Parent": "(ElementNode DMG)",
+          "PrevSibling": "(ElementNode d_code)",
+          "Type": "ElementNode"
+        }
+      ],
+      "Data": "DMG",
+      "FirstChild": "(ElementNode d8)",
+      "FormatSpecific": null,
+      "LastChild": "(ElementNode d_code)",
+      "NextSibling": null,
+      "Parent": "(DocumentNode)",
+      "PrevSibling": null,
+      "Type": "ElementNode"
+    }
+  ],
+  "FinalErr": "EOF"
+}
@@ -1,7 +1,167 @@
 {
   "Records": [
-    "{'e1':'0','e2':'1','e3':'2'}",
-    "{'e1':'3','e2':'4','e3':'5'}"
+    {
+      "Children": [
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "0",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode e1)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "e1",
+          "FirstChild": "(TextNode '0')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode '0')",
+          "NextSibling": "(ElementNode e2)",
+          "Parent": "(ElementNode ISA)",
+          "PrevSibling": null,
+          "Type": "ElementNode"
+        },
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "1",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode e2)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "e2",
+          "FirstChild": "(TextNode '1')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode '1')",
+          "NextSibling": "(ElementNode e3)",
+          "Parent": "(ElementNode ISA)",
+          "PrevSibling": "(ElementNode e1)",
+          "Type": "ElementNode"
+        },
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "2",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode e3)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "e3",
+          "FirstChild": "(TextNode '2')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode '2')",
+          "NextSibling": null,
+          "Parent": "(ElementNode ISA)",
+          "PrevSibling": "(ElementNode e2)",
+          "Type": "ElementNode"
+        }
+      ],
+      "Data": "ISA",
+      "FirstChild": "(ElementNode e1)",
+      "FormatSpecific": null,
+      "LastChild": "(ElementNode e3)",
+      "NextSibling": null,
+      "Parent": "(DocumentNode)",
+      "PrevSibling": null,
+      "Type": "ElementNode"
+    },
+    {
+      "Children": [
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "3",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode e1)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "e1",
+          "FirstChild": "(TextNode '3')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode '3')",
+          "NextSibling": "(ElementNode e2)",
+          "Parent": "(ElementNode ISA)",
+          "PrevSibling": null,
+          "Type": "ElementNode"
+        },
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "4",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode e2)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "e2",
+          "FirstChild": "(TextNode '4')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode '4')",
+          "NextSibling": "(ElementNode e3)",
+          "Parent": "(ElementNode ISA)",
+          "PrevSibling": "(ElementNode e1)",
+          "Type": "ElementNode"
+        },
+        {
+          "Children": [
+            {
+              "Children": null,
+              "Data": "5",
+              "FirstChild": null,
+              "FormatSpecific": null,
+              "LastChild": null,
+              "NextSibling": null,
+              "Parent": "(ElementNode e3)",
+              "PrevSibling": null,
+              "Type": "TextNode"
+            }
+          ],
+          "Data": "e3",
+          "FirstChild": "(TextNode '5')",
+          "FormatSpecific": null,
+          "LastChild": "(TextNode '5')",
+          "NextSibling": null,
+          "Parent": "(ElementNode ISA)",
+          "PrevSibling": "(ElementNode e2)",
+          "Type": "ElementNode"
+        }
+      ],
+      "Data": "ISA",
+      "FirstChild": "(ElementNode e1)",
+      "FormatSpecific": null,
+      "LastChild": "(ElementNode e3)",
+      "NextSibling": null,
+      "Parent": "(DocumentNode)",
+      "PrevSibling": null,
+      "Type": "ElementNode"
+    }
   ],
   "FinalErr": "EOF"
 }
