Commit 1684b04

dizcology authored and busunkim96 committed

Speech API: enhanced model and recognition metadata (GoogleCloudPlatform/python-docs-samples#1436)

* enhanced model and recognition metadata
* flake, update tests
* readme
* client library version update

1 parent 86178a2 commit 1684b04

6 files changed: 202 additions & 21 deletions


packages/google-cloud-python-speech/samples/snippets/README.rst

Lines changed: 48 additions & 20 deletions
@@ -16,7 +16,7 @@ This directory contains samples for Google Cloud Speech API. The `Google Cloud S
 
 
 
-.. _Google Cloud Speech API: https://cloud.google.com/speech/docs/
+.. _Google Cloud Speech API: https://cloud.google.com/speech/docs/
 
 Setup
 -------------------------------------------------------------------------------
@@ -91,22 +91,21 @@ To run this sample:
 $ python transcribe.py
 
 usage: transcribe.py [-h] path
-
+
 Google Cloud Speech API sample application using the REST API for batch
 processing.
-
+
 Example usage:
 python transcribe.py resources/audio.raw
 python transcribe.py gs://cloud-samples-tests/speech/brooklyn.flac
-
+
 positional arguments:
 path File or GCS path for audio file to be recognized
-
+
 optional arguments:
 -h, --help show this help message and exit
 
 
-
 Transcribe async
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
@@ -123,22 +122,21 @@ To run this sample:
 $ python transcribe_async.py
 
 usage: transcribe_async.py [-h] path
-
+
 Google Cloud Speech API sample application using the REST API for async
 batch processing.
-
+
 Example usage:
 python transcribe_async.py resources/audio.raw
 python transcribe_async.py gs://cloud-samples-tests/speech/vr.flac
-
+
 positional arguments:
 path File or GCS path for audio file to be recognized
-
+
 optional arguments:
 -h, --help show this help message and exit
 
 
-
 Transcribe with word time offsets
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
@@ -155,21 +153,20 @@ To run this sample:
 $ python transcribe_word_time_offsets.py
 
 usage: transcribe_word_time_offsets.py [-h] path
-
+
 Google Cloud Speech API sample that demonstrates word time offsets.
-
+
 Example usage:
 python transcribe_word_time_offsets.py resources/audio.raw
 python transcribe_word_time_offsets.py gs://cloud-samples-tests/speech/vr.flac
-
+
 positional arguments:
 path File or GCS path for audio file to be recognized
-
+
 optional arguments:
 -h, --help show this help message and exit
 
 
-
 Transcribe Streaming
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
@@ -186,19 +183,50 @@ To run this sample:
 $ python transcribe_streaming.py
 
 usage: transcribe_streaming.py [-h] stream
-
+
 Google Cloud Speech API sample application using the streaming API.
-
+
 Example usage:
 python transcribe_streaming.py resources/audio.raw
-
+
 positional arguments:
 stream File to stream to the API
-
+
 optional arguments:
 -h, --help show this help message and exit
 
 
+Beta Samples
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+.. image:: https://gstatic.com/cloudssh/images/open-btn.png
+:target: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=speech/cloud-client/beta_snippets.py;speech/cloud-client/README.rst
+
+
+
+
+To run this sample:
+
+.. code-block:: bash
+
+$ python beta_snippets.py
+
+usage: beta_snippets.py [-h] command path
+
+Google Cloud Speech API sample that demonstrates enhanced models
+and recognition metadata.
+
+Example usage:
+python beta_snippets.py enhanced-model resources/commercial_mono.wav
+python beta_snippets.py metadata resources/commercial_mono.wav
+
+positional arguments:
+command
+path File for audio file to be recognized
+
+optional arguments:
+-h, --help show this help message and exit
+
 
 
 
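In the usage text above, `command` is a bare positional with no help string, and in `beta_snippets.py` an unrecognized command simply falls through both `if`/`elif` branches and exits without output. A minimal variant, not part of this commit, is to restrict the positional with argparse's `choices`, so unknown commands are rejected with an error and the usage line lists the accepted values (argparse renders them as `{enhanced-model,metadata}`). The helper name `build_parser` and the `COMMANDS` tuple are hypothetical.

```python
import argparse

# Hypothetical variant of the sample's parser (not what the commit ships):
# `choices` makes argparse reject unknown commands with exit status 2
# instead of the script silently doing nothing.
COMMANDS = ('enhanced-model', 'metadata')


def build_parser():
    parser = argparse.ArgumentParser(
        description='Enhanced model and recognition metadata samples.')
    parser.add_argument('command', choices=COMMANDS)
    parser.add_argument('path', help='Audio file to be recognized')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args(
        ['metadata', 'resources/commercial_mono.wav'])
    print(args.command, args.path)
```

The dispatch `if`/`elif` in the sample would stay the same; only the parser definition changes.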

packages/google-cloud-python-speech/samples/snippets/README.rst.in

Lines changed: 3 additions & 0 deletions
@@ -34,6 +34,9 @@ samples:
 - name: Transcribe Streaming
   file: transcribe_streaming.py
   show_help: true
+- name: Beta Samples
+  file: beta_snippets.py
+  show_help: true
 
 cloud_client_library: true
 

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+#!/usr/bin/env python
+
+# Copyright 2018 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Google Cloud Speech API sample that demonstrates enhanced models
+and recognition metadata.
+
+Example usage:
+    python beta_snippets.py enhanced-model resources/commercial_mono.wav
+    python beta_snippets.py metadata resources/commercial_mono.wav
+"""
+
+import argparse
+import io
+
+from google.cloud import speech_v1p1beta1 as speech
+
+
+# [START speech_transcribe_file_with_enhanced_model]
+def transcribe_file_with_enhanced_model(path):
+    """Transcribe the given audio file using an enhanced model."""
+    client = speech.SpeechClient()
+
+    with io.open(path, 'rb') as audio_file:
+        content = audio_file.read()
+
+    audio = speech.types.RecognitionAudio(content=content)
+    config = speech.types.RecognitionConfig(
+        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
+        sample_rate_hertz=8000,
+        language_code='en-US',
+        # Enhanced models are only available to projects that
+        # opt in for audio data collection.
+        use_enhanced=True,
+        # A model must be specified to use enhanced model.
+        model='phone_call')
+
+    response = client.recognize(config, audio)
+
+    for i, result in enumerate(response.results):
+        alternative = result.alternatives[0]
+        print('-' * 20)
+        print('First alternative of result {}'.format(i))
+        print('Transcript: {}'.format(alternative.transcript))
+# [END speech_transcribe_file_with_enhanced_model]
+
+
+# [START speech_transcribe_file_with_metadata]
+def transcribe_file_with_metadata(path):
+    """Send a request that includes recognition metadata."""
+    client = speech.SpeechClient()
+
+    with io.open(path, 'rb') as audio_file:
+        content = audio_file.read()
+
+    # Here we construct a recognition metadata object.
+    # Most metadata fields are specified as enums that can be found
+    # in speech.enums.RecognitionMetadata
+    metadata = speech.types.RecognitionMetadata()
+    metadata.interaction_type = (
+        speech.enums.RecognitionMetadata.InteractionType.DISCUSSION)
+    metadata.microphone_distance = (
+        speech.enums.RecognitionMetadata.MicrophoneDistance.NEARFIELD)
+    metadata.recording_device_type = (
+        speech.enums.RecognitionMetadata.RecordingDeviceType.SMARTPHONE)
+    # Some metadata fields are free form strings
+    metadata.recording_device_name = "Pixel 2 XL"
+    # And some are integers, for instance the 6 digit NAICS code
+    # https://www.naics.com/search/
+    metadata.industry_naics_code_of_audio = 519190
+
+    audio = speech.types.RecognitionAudio(content=content)
+    config = speech.types.RecognitionConfig(
+        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
+        sample_rate_hertz=8000,
+        language_code='en-US',
+        # Add this in the request to send metadata.
+        metadata=metadata)
+
+    response = client.recognize(config, audio)
+
+    for i, result in enumerate(response.results):
+        alternative = result.alternatives[0]
+        print('-' * 20)
+        print('First alternative of result {}'.format(i))
+        print('Transcript: {}'.format(alternative.transcript))
+# [END speech_transcribe_file_with_metadata]
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter)
+    parser.add_argument('command')
+    parser.add_argument(
+        'path', help='File for audio file to be recognized')
+
+    args = parser.parse_args()
+
+    if args.command == 'enhanced-model':
+        transcribe_file_with_enhanced_model(args.path)
+    elif args.command == 'metadata':
+        transcribe_file_with_metadata(args.path)

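For readers working outside the Python client, the two requests in `beta_snippets.py` can be pictured as the JSON body of the v1p1beta1 REST `speech:recognize` method. The sketch below only builds that body and sends nothing; the helper `build_recognize_request` is hypothetical, and the camelCase keys assume the standard proto3 JSON mapping of the `RecognitionConfig` and `RecognitionMetadata` fields the sample sets (enum values become their string names, inline audio becomes base64).

```python
import base64
import json


def build_recognize_request(audio_bytes, enhanced=False, with_metadata=False):
    """Hypothetical helper: build (not send) a speech:recognize JSON body."""
    config = {
        'encoding': 'LINEAR16',
        'sampleRateHertz': 8000,
        'languageCode': 'en-US',
    }
    if enhanced:
        # Mirrors use_enhanced=True and model='phone_call' in the sample.
        config['useEnhanced'] = True
        config['model'] = 'phone_call'
    if with_metadata:
        # Mirrors the RecognitionMetadata fields set in the sample;
        # enums are written as their string names in JSON.
        config['metadata'] = {
            'interactionType': 'DISCUSSION',
            'microphoneDistance': 'NEARFIELD',
            'recordingDeviceType': 'SMARTPHONE',
            'recordingDeviceName': 'Pixel 2 XL',
            'industryNaicsCodeOfAudio': 519190,
        }
    # Inline audio bytes are base64-encoded in the JSON representation.
    audio = {'content': base64.b64encode(audio_bytes).decode('ascii')}
    return {'config': config, 'audio': audio}


if __name__ == '__main__':
    body = build_recognize_request(b'\x00\x01', enhanced=True)
    print(json.dumps(body, indent=2, sort_keys=True))
```

Actually sending this body would also need an authenticated POST to `https://speech.googleapis.com/v1p1beta1/speech:recognize`, which the sketch deliberately leaves out.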
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# Copyright 2018, Google, Inc.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+from beta_snippets import (
+    transcribe_file_with_enhanced_model, transcribe_file_with_metadata)
+
+RESOURCES = os.path.join(os.path.dirname(__file__), 'resources')
+
+
+def test_transcribe_file_with_enhanced_model(capsys):
+    transcribe_file_with_enhanced_model(
+        os.path.join(RESOURCES, 'commercial_mono.wav'))
+    out, _ = capsys.readouterr()
+
+    assert 'Chrome' in out
+
+
+def test_transcribe_file_with_metadata(capsys):
+    transcribe_file_with_metadata(
+        os.path.join(RESOURCES, 'commercial_mono.wav'))
+    out, _ = capsys.readouterr()
+
+    assert 'Chrome' in out

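These tests call the live API and only check that `'Chrome'` appears in the printed output. To see the exact shape of that output without credentials or network access, the printing loop from `beta_snippets.py` can be run against stand-in objects. This is a sketch under stated assumptions: `print_first_alternatives` and `fake_response` are hypothetical helpers, `SimpleNamespace` only mimics the attributes the loop reads (not the real response type), and the transcript string is a placeholder, not the real transcript of `commercial_mono.wav`.

```python
import contextlib
import io
from types import SimpleNamespace


def print_first_alternatives(response):
    # Same loop as in beta_snippets.py: one block per result,
    # printing only the top-ranked alternative.
    for i, result in enumerate(response.results):
        alternative = result.alternatives[0]
        print('-' * 20)
        print('First alternative of result {}'.format(i))
        print('Transcript: {}'.format(alternative.transcript))


def fake_response(*transcripts):
    # Hypothetical stand-in: provides only results[i].alternatives[0].transcript.
    results = [
        SimpleNamespace(alternatives=[SimpleNamespace(transcript=t)])
        for t in transcripts
    ]
    return SimpleNamespace(results=results)


if __name__ == '__main__':
    buf = io.StringIO()
    # Captures stdout much as pytest's capsys fixture does in the tests.
    with contextlib.redirect_stdout(buf):
        print_first_alternatives(fake_response('placeholder text about Chrome'))
    out = buf.getvalue()
    assert 'Chrome' in out
    print(out, end='')
```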
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-google-cloud-speech==0.32.1
+google-cloud-speech==0.33.0
