
Commit 3eca636

tf-text-github-robot authored and committed
(Generated change) Update tf.Text versions and/or docs.
PiperOrigin-RevId: 519245493
1 parent 934ba32 commit 3eca636

22 files changed, +813 -29 lines changed

WORKSPACE

+3 -3
@@ -70,10 +70,10 @@ http_archive(
 
 http_archive(
 name = "org_tensorflow",
-strip_prefix = "tensorflow-2.11.0",
-sha256 = "e52cda3bae45f0ae0fccd4055e9fa29892b414f70e2df94df9a3a10319c75fff",
+strip_prefix = "tensorflow-2.12.0",
+sha256 = "af0584df1a4e28763c32c218b39f8c4f3784fabb6a8859b00c02d743864dc191",
 urls = [
-"https://github.com/tensorflow/tensorflow/archive/v2.11.0.zip"
+"https://github.com/tensorflow/tensorflow/archive/v2.12.0.zip"
 ],
 )
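
The WORKSPACE change pins the TensorFlow 2.12.0 archive to a SHA-256 digest. As a rough illustration of what that pin checks, the sketch below hashes a locally downloaded archive and compares it with the digest from the hunk above. The helper names `sha256_of_file` and `archive_matches` are hypothetical and not part of Bazel or tf.Text; Bazel performs its own verification when fetching.

```python
import hashlib

# Digest copied from the updated WORKSPACE entry in this commit.
EXPECTED_SHA256 = "af0584df1a4e28763c32c218b39f8c4f3784fabb6a8859b00c02d743864dc191"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the pinned digest."""
    return sha256_of_file(path) == expected
```

Calling `archive_matches("v2.12.0.zip")` on a downloaded copy would report whether the file matches the pinned digest; the file name here is only an example.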

docs/api_docs/python/text.md

+13 -1
@@ -58,6 +58,9 @@ Hub module.
 [`class HubModuleTokenizer`](./text/HubModuleTokenizer.md): Tokenizer that uses
 a Hub module.
 
+[`class LastNItemSelector`](./text/LastNItemSelector.md): An `ItemSelector` that
+selects the last `n` items in the batch.
+
 [`class MaskValuesChooser`](./text/MaskValuesChooser.md): Assigns values to the
 items chosen for masking.
 
@@ -126,6 +129,9 @@ of UTF-8 string tokens into subword pieces.
 
 ## Functions
 
+[`boise_tags_to_offsets(...)`](./text/boise_tags_to_offsets.md): Converts the
+token offsets and BOISE tags into span offsets and span type.
+
 [`build_fast_bert_normalizer_model(...)`](./text/build_fast_bert_normalizer_model.md):
 build_fast_bert_normalizer_model(arg0: bool) -> bytes
 
@@ -141,6 +147,9 @@ UTF-8 string in the input.
 [`combine_segments(...)`](./text/combine_segments.md): Combine one or more input
 segments for a model's input sequence.
 
+[`concatenate_segments(...)`](./text/concatenate_segments.md): Concatenate input
+segments for a model's input sequence.
+
 [`find_source_offsets(...)`](./text/find_source_offsets.md): Maps the input
 post-normalized string offsets to pre-normalized offsets.
 
@@ -188,6 +197,9 @@ fragments in a given text. (deprecated)
 
 [`span_overlaps(...)`](./text/span_overlaps.md): Returns a boolean tensor indicating which source and target spans overlap.
 
+[`utf8_binarize(...)`](./text/utf8_binarize.md): Decode UTF8 tokens into code
+points and return their bits.
+
 [`viterbi_constrained_sequence(...)`](./text/viterbi_constrained_sequence.md): Performs greedy constrained sequence on a batch of examples.
 
 [`wordshape(...)`](./text/wordshape.md): Determine wordshape features for each input string.
@@ -202,7 +214,7 @@ fragments in a given text. (deprecated)
 **version**<a id="__version__"></a>
 </td>
 <td>
-`'2.11.0'`
+`'2.12.0'`
 </td>
 </tr>
 </table>

docs/api_docs/python/text/ByteSplitter.md

+73 -1
@@ -5,6 +5,7 @@ description: Splits a string tensor into bytes.
 <meta itemprop="path" content="Stable" />
 <meta itemprop="property" content="__init__"/>
 <meta itemprop="property" content="split"/>
+<meta itemprop="property" content="split_by_offsets"/>
 <meta itemprop="property" content="split_with_offsets"/>
 </div>
 
@@ -85,6 +86,76 @@ each string.
 
 </table>
 
+<h3 id="split_by_offsets"><code>split_by_offsets</code></h3>
+
+<a target="_blank" class="external" href="https://github.com/tensorflow/text/tree/master/tensorflow_text/python/ops/byte_splitter.py">View
+source</a>
+
+<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
+<code>split_by_offsets(
+input, start_offsets, end_offsets
+)
+</code></pre>
+
+Splits a string tensor into sub-strings.
+
+The strings are split based upon the provided byte offsets.
+
+#### Example:
+
+```
+>>> splitter = ByteSplitter()
+>>> substrings = splitter.split_by_offsets("hello", [0, 4], [4, 5])
+>>> print(substrings.numpy())
+[b'hell' b'o']
+```
+
+<!-- Tabular view -->
+
+<table class="responsive fixed orange">
+<colgroup><col width="214px"><col></colgroup>
+<tr><th colspan="2">Args</th></tr>
+
+<tr>
+<td>
+`input`
+</td>
+<td>
+`Tensor` or `RaggedTensor` of strings of any shape to split.
+</td>
+</tr><tr>
+<td>
+`start_offsets`
+</td>
+<td>
+`Tensor` or `RaggedTensor` of byte offsets to start splits
+on (inclusive). This should be one more than the rank of `input`.
+</td>
+</tr><tr>
+<td>
+`end_offsets`
+</td>
+<td>
+`Tensor` or `RaggedTensor` of byte offsets to end splits
+on (exclusive). This should be one more than the rank of `input`.
+</td>
+</tr>
+</table>
+
+<!-- Tabular view -->
+
+<table class="responsive fixed orange">
+<colgroup><col width="214px"><col></colgroup>
+<tr><th colspan="2">Returns</th></tr>
+<tr class="alt">
+<td colspan="2">
+A `RaggedTensor` or `Tensor` of substrings. The returned shape is the
+shape of the offsets.
+</td>
+</tr>
+
+</table>
+
 <h3 id="split_with_offsets"><code>split_with_offsets</code></h3>
 
 <a target="_blank" class="external" href="https://github.com/tensorflow/text/tree/master/tensorflow_text/python/ops/byte_splitter.py">View
@@ -126,12 +197,13 @@ A `RaggedTensor` or `Tensor` of strings with any shape.
 </table>
 
 <!-- Tabular view -->
+
 <table class="responsive fixed orange">
 <colgroup><col width="214px"><col></colgroup>
 <tr><th colspan="2">Returns</th></tr>
 <tr class="alt">
 <td colspan="2">
-A `RaggedTensor` of bytest. The returned shape is the shape of the
+A `RaggedTensor` of bytes. The returned shape is the shape of the
 input tensor with an added ragged dimension for the bytes that make up
 each string.
 </td>
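
The `split_by_offsets` documentation added in this file describes splitting on byte offsets with an inclusive start and an exclusive end. A minimal pure-Python sketch of that slicing rule for a single string follows. The function name `split_by_byte_offsets` is hypothetical, and this is not the TensorFlow Text implementation, which operates on `Tensor` and `RaggedTensor` inputs.

```python
def split_by_byte_offsets(text, start_offsets, end_offsets):
    """Slice the UTF-8 bytes of `text` into [start, end) pieces (illustrative only)."""
    data = text.encode("utf-8") if isinstance(text, str) else bytes(text)
    if len(start_offsets) != len(end_offsets):
        raise ValueError("start_offsets and end_offsets must have the same length")
    # Each start is inclusive and each end is exclusive, as in the Args table.
    return [data[start:end] for start, end in zip(start_offsets, end_offsets)]


# Mirrors the documented example: "hello" with starts [0, 4] and ends [4, 5].
pieces = split_by_byte_offsets("hello", [0, 4], [4, 5])
```

With the inputs from the documented example, `pieces` holds `b"hell"` and `b"o"`, matching the output shown in the added docs.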

docs/api_docs/python/text/LastNItemSelector.md

+134 -0
@@ -0,0 +1,134 @@
+description: An ItemSelector that selects the last n items in the batch.
+
+<div itemscope itemtype="http://developers.google.com/ReferenceObject">
+<meta itemprop="name" content="text.LastNItemSelector" />
+<meta itemprop="path" content="Stable" />
+<meta itemprop="property" content="__init__"/>
+<meta itemprop="property" content="get_selectable"/>
+<meta itemprop="property" content="get_selection_mask"/>
+</div>
+
+# text.LastNItemSelector
+
+<!-- Insert buttons and diff -->
+
+<table class="tfo-notebook-buttons tfo-api nocontent" align="left">
+
+</table>
+
+<a target="_blank" class="external" href="https://github.com/tensorflow/text/tree/master/tensorflow_text/python/ops/item_selector_ops.py">View
+source</a>
+
+An `ItemSelector` that selects the last `n` items in the batch.
+
+<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
+<code>text.LastNItemSelector(
+num_to_select, unselectable_ids=None
+)
+</code></pre>
+
+<!-- Placeholder for "Used in" -->
+<!-- Tabular view -->
+
+<table class="responsive fixed orange">
+<colgroup><col width="214px"><col></colgroup>
+<tr><th colspan="2"><h2 class="add-link">Args</h2></th></tr>
+
+<tr>
+<td>
+`num_to_select`<a id="num_to_select"></a>
+</td>
+<td>
+An int which is the leading number of items to select.
+</td>
+</tr><tr>
+<td>
+`unselectable_ids`<a id="unselectable_ids"></a>
+</td>
+<td>
+(optional) A list of int ids that cannot be selected.
+Default is empty list.
+</td>
+</tr>
+</table>
+
+<!-- Tabular view -->
+
+<table class="responsive fixed orange">
+<colgroup><col width="214px"><col></colgroup>
+<tr><th colspan="2"><h2 class="add-link">Attributes</h2></th></tr>
+
+<tr> <td> `unselectable_ids`<a id="unselectable_ids"></a> </td> <td>
+
+</td>
+</tr>
+</table>
+
+## Methods
+
+<h3 id="get_selectable"><code>get_selectable</code></h3>
+
+<a target="_blank" class="external" href="https://github.com/tensorflow/text/tree/master/tensorflow_text/python/ops/item_selector_ops.py">View
+source</a>
+
+<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
+<code>get_selectable(
+input_ids, axis
+)
+</code></pre>
+
+See `get_selectable()` in superclass.
+
+<h3 id="get_selection_mask"><code>get_selection_mask</code></h3>
+
+<a target="_blank" class="external" href="https://github.com/tensorflow/text/tree/master/tensorflow_text/python/ops/item_selector_ops.py">View
+source</a>
+
+<pre class="devsite-click-to-copy prettyprint lang-py tfo-signature-link">
+<code>get_selection_mask(
+input_ids, axis=1
+)
+</code></pre>
+
+Returns a mask of items that have been selected.
+
+The default implementation simply returns all items not excluded by
+`get_selectable`.
+
+<!-- Tabular view -->
+
+<table class="responsive fixed orange">
+<colgroup><col width="214px"><col></colgroup>
+<tr><th colspan="2">Args</th></tr>
+
+<tr>
+<td>
+`input_ids`
+</td>
+<td>
+A `RaggedTensor`.
+</td>
+</tr><tr>
+<td>
+`axis`
+</td>
+<td>
+(optional) An int detailing the dimension to apply selection on.
+Default is the 1st dimension.
+</td>
+</tr>
+</table>
+
+<!-- Tabular view -->
+
+<table class="responsive fixed orange">
+<colgroup><col width="214px"><col></colgroup>
+<tr><th colspan="2">Returns</th></tr>
+<tr class="alt">
+<td colspan="2">
+a `RaggedTensor` with shape `input_ids.shape[:axis]`. Its values are True
+if the corresponding item (or broadcasted subitems) should be selected.
+</td>
+</tr>
+
+</table>
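
The new `LastNItemSelector` page says the selector picks the last `n` items and accepts a list of `unselectable_ids`. The sketch below illustrates one plausible reading of that behavior on plain Python lists: walk each row from the end and mark up to `num_to_select` items, skipping any id listed as unselectable. The function name `last_n_selection_mask` is hypothetical, and the assumption that unselectable ids are skipped rather than counted is not stated on the page itself.

```python
def last_n_selection_mask(rows, num_to_select, unselectable_ids=()):
    """Return a boolean mask per row marking the last `num_to_select` selectable ids."""
    blocked = set(unselectable_ids)
    masks = []
    for row in rows:
        mask = [False] * len(row)
        remaining = num_to_select
        # Walk backwards so the trailing items are considered first.
        for index in range(len(row) - 1, -1, -1):
            if remaining == 0:
                break
            if row[index] not in blocked:
                mask[index] = True
                remaining -= 1
        masks.append(mask)
    return masks


# Two ragged rows; id 4 is treated as unselectable (assumed semantics).
masks = last_n_selection_mask([[1, 2, 3, 4], [5, 6]], num_to_select=2, unselectable_ids=[4])
```

Here the first row marks the ids 2 and 3, since 4 is skipped, and the second row marks both of its items.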

docs/api_docs/python/text/RoundRobinTrimmer.md

+10 -8
@@ -92,19 +92,20 @@ truncate budget will be allocated as [2, 2, 1].
 `segments`
 </td>
 <td>
-A list of `RaggedTensor` each w/ a shape of [num_batch,
+A list of `RaggedTensor`s each with a shape of [num_batch,
 (num_items)].
 </td>
 </tr>
 </table>
 
 <!-- Tabular view -->
+
 <table class="responsive fixed orange">
 <colgroup><col width="214px"><col></colgroup>
 <tr><th colspan="2">Returns</th></tr>
 <tr class="alt">
 <td colspan="2">
-a list with len(segments) of `RaggedTensor`s, see superclass for details.
+A list with len(segments) of `RaggedTensor`s, see superclass for details.
 </td>
 </tr>
 
@@ -123,8 +124,11 @@ source</a>
 
 Truncate the list of `segments`.
 
-Truncate the list of `segments` using the truncation strategy defined by
-`generate_mask`.
+Truncate the list of `segments` using the 'round-robin' strategy which allocates
+quota in each bucket, left-to-right repeatedly until all buckets are filled.
+
+For example if the budget of [5] and we have segments of size [3, 4, 2], the
+truncate budget will be allocated as [2, 2, 1].
 
 <!-- Tabular view -->
 <table class="responsive fixed orange">
@@ -142,15 +146,13 @@ A list of `RaggedTensor`s w/ shape [num_batch, (num_items)].
 </table>
 
 <!-- Tabular view -->
+
 <table class="responsive fixed orange">
 <colgroup><col width="214px"><col></colgroup>
 <tr><th colspan="2">Returns</th></tr>
 <tr class="alt">
 <td colspan="2">
-a list of `RaggedTensor`s with len(segments) number of items and where
-each item has the same shape as its counterpart in `segments` and
-with unwanted values dropped. The values are dropped according to the
-`TruncationStrategy` defined.
+A list with len(segments) of `RaggedTensor`s, see superclass for details.
 </td>
 </tr>
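
The updated `RoundRobinTrimmer` description says quota is handed out to each segment left to right, repeatedly, until the budget is used up, and gives the example of a budget of 5 over segments of size [3, 4, 2] yielding [2, 2, 1]. The sketch below reproduces that allocation on plain integers. The function name `round_robin_budget` is hypothetical, and this is only an illustration of the described strategy, not the library's tensor-based implementation.

```python
def round_robin_budget(segment_sizes, budget):
    """Allocate `budget` items across segments one at a time, left to right."""
    allocated = [0] * len(segment_sizes)
    remaining = budget
    while remaining > 0:
        progressed = False
        for index, size in enumerate(segment_sizes):
            if remaining == 0:
                break
            # Only segments that still have unallocated items receive quota.
            if allocated[index] < size:
                allocated[index] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            # Every segment is already fully covered; leftover budget is unused.
            break
    return allocated


# The example from the docs: budget 5 over segment sizes [3, 4, 2].
allocation = round_robin_budget([3, 4, 2], 5)
```

For the documented example, `allocation` comes out as `[2, 2, 1]`: the first pass gives one item to each segment, and the second pass spends the last two items on the first two segments.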
