Getting streaming too fast and too slow for same sample rate during testing #738
From the looks of your code, it looks like you're reading in all the audio to transcribe first, and only after you're finished are you sending the data to the API. Thus, the API is getting the entire audio chunk at once, instead of in real time. You'll want to do something more like this:

```python
import contextlib
import queue
import subprocess
import threading

@contextlib.contextmanager
def record_audio(rate, chunk):
    buff = queue.Queue()
    reccmd = ["arecord", "-f", "S16_LE", "-r", str(rate), "-t", "raw"]
    p = subprocess.Popen(reccmd, stdout=subprocess.PIPE)

    def fill_buffer():
        # Keep pushing fixed-size chunks from arecord's stdout into the queue.
        data = p.stdout.read(chunk)
        while data:
            buff.put(bytearray(data))
            data = p.stdout.read(chunk)

    t = threading.Thread(target=fill_buffer)
    t.start()
    yield _audio_data_generator(buff)
    p.kill()
    t.join()
    # Signal the _audio_data_generator to finish
    buff.put(None)
```

i.e. spin up a thread that fills the buffer as it comes in.
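(`_audio_data_generator` isn't shown in the comment above; a minimal sketch of what it is assumed to do - drain the queue and yield chunks until it sees the `None` sentinel - could look like this. Treat it as an illustration, not the helper from the actual sample.)

```python
def _audio_data_generator(buff):
    # Yield raw audio chunks from the queue until a None sentinel arrives.
    while True:
        data = buff.get()
        if data is None:
            break
        yield data
```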
@jerjou thanks for the suggestion. We tried out this approach during the last couple of days, but we could not get the streaming-too-fast issue sorted out on the NAO robot. The same code works fine on my Ubuntu 16.04 machine without giving this error. The main difference between Ubuntu 16.04 and NAO OS (based on the Gentoo distribution) is the Python version (2.7.3 on NAO OS).
Are there any specific issues with Python 2.7.3? We changed the buffer size from 1024 to 1600 and then 3200 bytes, with a sleep ranging from 0.01 to 0.1 s, to see whether we could get rid of this error. The error subsided with buffer = 3200 and sleep = 0.1.

How we arrived at buffer size = 3200 bytes and sleep = 0.1 s is as follows. The audio recording parameters passed to arecord are a 16 kHz sample rate with 16-bit (S16_LE) samples. As we have to send audio every 100 ms to match the optimum processing of the Google Speech service, 16000 / 10 = 1600 samples should be sent every 100 ms. As each sample is 16 bits, that is (16000 * 16) / 10 = 25600 bits = 3200 bytes. So we set the buffer to 3200 bytes and read it every 100 ms to send to Google Speech. The findings are as follows.

From the above observations we could get rid of the error with buffer size 3200 and sleep 0.1, but it took 5-9 seconds to get the transcript. Is there any reason for that much delay? See the record_audio() method and the ReadAudioThread class.
Full file attached.
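For reference, a minimal sketch of the chunk-size arithmetic described above (the constant names are illustrative, not taken from the attached file):

```python
# 16-bit linear PCM (S16_LE) at 16 kHz, sent in 100 ms chunks.
SAMPLE_RATE = 16000    # samples per second
BYTES_PER_SAMPLE = 2   # 16 bits per sample
CHUNK_SECONDS = 0.1    # send ~100 ms of audio per request

chunk_bytes = int(SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS)
print(chunk_bytes)  # 3200, matching the buffer size chosen above
```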
@jerjou is there any update on this? This is a blocker issue for us in our development. Your input is much appreciated.
I'm able to reproduce the issue by introducing network latency on my test machine. Does this happen when the network connection is reliable as well, or is it always patchy on the Nao? I also notice from the output in your initial comment that the recording rate doesn't appear to be 16 kHz - does the Nao's sound card support capturing at 16 kHz? In general I'd advise against adding sleeps - the error you're getting indicates that the rate you're getting data from your sound card is different from the rate the API is getting it.
Oops - clicked 'Comment' before I was done with the thought. So, the sample was written in a way that, if you sleep, the audio data will continue to buffer and just be sent all at once in the next request. If the rate at which the microphone generates data and the rate at which the API expects data match up, everything should work out fine. Honestly, I'm not sure why you're not still getting "too slow" errors from the API. My hypothesis is that the delay in the transcript is because the audio is being interpreted at a different sample rate than it's being recorded at. For example, have you ever tried playing an audio file at a different sample rate than it was recorded at? It's still interpretable, but it's distorted and sounds weird :-) Anyway, just some guesses. Again, I'd recommend using `pyalsaaudio` instead of shelling out to `arecord`.
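A minimal capture sketch using the classic pyalsaaudio API, assuming the default ALSA device; the function and parameter names here are illustrative, not from the sample:

```python
import alsaaudio  # pip install pyalsaaudio


def record_alsa(buff, rate=16000, period_frames=1600):
    # Open the default capture device: mono, 16 kHz, 16-bit little-endian.
    pcm = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
    pcm.setchannels(1)
    pcm.setrate(rate)
    pcm.setformat(alsaaudio.PCM_FORMAT_S16_LE)
    pcm.setperiodsize(period_frames)  # 1600 frames = 100 ms at 16 kHz

    while True:
        # Blocks until one period of audio is available, then queues it.
        length, data = pcm.read()
        if length > 0:
            buff.put(data)
```

Running this in a background thread that feeds the same queue the request generator drains means the data rate is dictated by the sound card rather than by sleeps.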
@jerjou thanks for the reply. NAO does support 16 kHz. What I added to the initial comment was one test we did with the rate changed to 14 kHz. I accept that the network is a bit slower/unstable where the NAO is tested. I will try using `pyalsaaudio` as you suggest. I've seen that users have to put 100 ms of audio every 100 ms into the streaming channel. If a 100 ms audio packet is delayed in reaching GCP at some point, can it cause an issue? It looks like this is the issue you have recreated by adding network latency. If this is the issue, how can we make sure that every 100 ms packet reaches GCP at a frequency of 100 ms? Isn't this too much to expect from a slow network/bad connection?
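For what it's worth, here is a hedged sketch of feeding the chunks to the streaming API without sleeps, so the send rate is driven by the microphone itself. It uses present-day google-cloud-speech client names rather than the 2017-era sample, and assumes the record_audio context manager sketched earlier in this thread:

```python
from google.cloud import speech

client = speech.SpeechClient()
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
streaming_config = speech.StreamingRecognitionConfig(config=config, interim_results=True)

with record_audio(16000, 3200) as audio_chunks:
    # One request per 100 ms chunk, yielded as soon as the mic produces it.
    requests = (
        speech.StreamingRecognizeRequest(audio_content=bytes(chunk))
        for chunk in audio_chunks
    )
    responses = client.streaming_recognize(streaming_config, requests)
    for response in responses:
        for result in response.results:
            print(result.alternatives[0].transcript)
```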
(FYI I agree with you, and am investigating things on the server end - might be a bug on our side. Will update when I find out more) |
I have modified the code to use `pyalsaaudio`.
@jerjou do you have any update on the server-side issues related to this?
Turns out I was wrong about the bug I thought I saw. Still investigating. Instead of installing the Python packages globally on your system, I'd recommend installing them in a virtualenv - that way you can be certain you've got all the right versions, without conflicting with any packages already installed on the system. Then you should just be able to `pip install` the requirements inside that environment.
@jerjou Yes, I have installed the packages inside a virtual environment as the README file suggests. However, it threw that checksum issue. I'm working on getting it corrected. I just got the same package downloaded without an error on one of my development machines, which has the OpenNAO VM. I hope to push it to the NAO from a fresh installation of the OpenNAO VM. Is there any update from the backend guys? I checked with Google support ([email protected]) on the same issue and got a response pointing back to this forum.
Yeah - they're looking into it; but keep in mind they're juggling other priorities (and it wasn't the obvious bug). I'll update here when I hear more.
Okay - they pushed a fix. Try again, and let me know if you're still hitting this.
Thanks a lot for following this up. I will check and confirm. As I have migrated to Australia, it will take some time to confirm though. In the meantime, if anyone else can confirm whether this is fixed, that would be really good.
In which file did you encounter the issue?
transcribe_streaming.py
Did you change the file? If so, how?
Yes, to use `arecord` instead of pyaudio/portaudio. You can find the modified file (transcribe_streaming_arecord.py) attached to the 7th comment of #728.
Describe the issue
When the script is run, it throws the following error on a regular basis. We are testing this with the NAO robot's mic on NAOqi OS (a distribution based on Gentoo).
The mic is identified properly, as seen in the output of the following command.
And the sound driver supports sampling rates exceeding 48 kHz.
We have also observed that the Google server has complained about streaming too fast or too slow even for the same rate (e.g. 16 kHz) at different times. The only difference between these tests was that the network latency kept changing and was high in general. Can the above error be caused by unstable network bandwidth? Is there any solution or workaround for using streaming under a bad network condition? Can any other factors cause this error?