perf: add Accept-Encoding: gzip to epidata client(s) #1637

Closed
dshemetov opened this issue Apr 3, 2025 · 2 comments

@dshemetov
Contributor

I was experimenting with curl and large requests against our API and noticed how much the --compressed flag matters. The difference in download size is enormous, especially for JSON:

```sh
# JSON
$ curl -H "Authorization: Bearer API Key" https://api.delphi.cmu.edu/epidata/covidcast/\?data_source\=jhu-csse\&signals\=confirmed_cumulative_num\&geo_type\=county\&time_type\=day\&geo_values\=\*\&time_values\=* > out.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1034M  100 1034M    0     0  8633k      0  0:02:02  0:02:02 --:--:-- 15.2M

$ curl --compressed -H "Authorization: Bearer API Key" https://api.delphi.cmu.edu/epidata/covidcast/\?data_source\=jhu-csse\&signals\=confirmed_cumulative_num\&geo_type\=county\&time_type\=day\&geo_values\=\*\&time_values\=\* > out2.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.8M    0 27.8M    0     0   229k      0 --:--:--  0:02:04 --:--:--  468k

# CSV
$ curl -H "Authorization: Bearer API Key" https://api.delphi.cmu.edu/epidata/covidcast/\?data_source\=jhu-csse\&signals\=confirmed_cumulative_num\&geo_type\=county\&time_type\=day\&geo_values\=\*\&time_values\=%2A\&format\=csv > out.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  314M  100  314M    0     0  2136k      0  0:02:30  0:02:30 --:--:-- 4451k

$ curl --compressed -H "Authorization: Bearer API Key" https://api.delphi.cmu.edu/epidata/covidcast/\?data_source\=jhu-csse\&signals\=confirmed_cumulative_num\&geo_type\=county\&time_type\=day\&geo_values\=\*\&time_values\=\*\&format\=csv > out2.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 25.3M    0 25.3M    0     0   167k      0 --:--:--  0:02:34 --:--:--  440k
```
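
To put numbers on "enormous", here is a small sketch that turns the totals from the curl progress meters above into reduction factors. The sizes are copied from the "Total" column in the units curl printed (M); the `reduction` helper and the dictionary layout are only illustrative.

```python
# Totals copied from the curl progress meters above, in curl's "M" units.
reported = {
    "JSON": {"identity": 1034.0, "compressed": 27.8},
    "CSV": {"identity": 314.0, "compressed": 25.3},
}


def reduction(identity_size: float, compressed_size: float) -> tuple:
    """Return (how many times smaller, fraction of bytes saved)."""
    ratio = identity_size / compressed_size
    saved = 1 - compressed_size / identity_size
    return ratio, saved


for fmt, sizes in reported.items():
    ratio, saved = reduction(sizes["identity"], sizes["compressed"])
    print(f"{fmt}: {ratio:.1f}x smaller, {saved:.1%} fewer bytes on the wire")
```

That works out to roughly 37x for JSON and 12x for CSV. The elapsed times in the transcripts are nearly identical, so these figures describe transfer size only, not wall-clock time.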

AFAICT, when a request carries no Accept-Encoding header, the server falls back to identity and sends the body uncompressed. We set the header in epidatr, but it doesn't look like the delphi-epidata clients here do.
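
For a client that does not get this for free, the change would look roughly like the sketch below. It assumes a hypothetical client built on the standard library's `urllib.request`, which neither asks for gzip nor decompresses gzip bodies on its own; `build_request` and `decode_body` are made-up helper names, not functions from the delphi-epidata clients, and the API key is a placeholder.

```python
import gzip
import urllib.request
from typing import Optional


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that explicitly asks the server for gzip."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Accept-Encoding", "gzip")
    return req


def decode_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo gzip if the response's Content-Encoding header says it was applied."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(raw)
    return raw


# Offline check: no network call is made here.
req = build_request("https://api.delphi.cmu.edu/epidata/covidcast/", "API_KEY")
print(req.header_items())

body = b'{"result": 1, "epidata": []}'
assert decode_body(gzip.compress(body), "gzip") == body
assert decode_body(body, None) == body
```

In real use the request would go through `urllib.request.urlopen(req)`, and the body would be decoded with `decode_body(resp.read(), resp.headers.get("Content-Encoding"))`.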

@melange396
Collaborator

This is a great idea, but the Python requests package already does this by default:

```python
import requests
r = requests.get("https://api.delphi.cmu.edu/epidata/covidcast?data_source=google-symptoms&signal=s01_raw_search&time_type=day&time_values=20240101&geo_type=state&geo_value=pa")
print(r.request.headers['accept-encoding'])
# => 'gzip, deflate'
```

Your browser should be doing this too -- open the network tab of the debugging console and load up some data in https://delphi.cmu.edu/epivis/ or https://delphi.cmu.edu/covidcast/ to see it in action.

@dshemetov
Contributor Author

Ah, phew, glad it's standard there. I figured it was worth a double check, given how big a difference it makes! I tested the httr library that the R client uses, and it also sends Accept-Encoding by default (interestingly, it's libcurl that sets that particular default). I also looked at our dashboards, and they enable it by default too. Thanks for looking!
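
For anyone who wants to see the size gap without hitting the API, here is an offline sketch of why gzip helps JSON more than CSV: every JSON row repeats the field names, and that repetition compresses away almost entirely. The rows are synthetic, and the field names (`geo_value`, `time_value`, `value`) and value ranges are assumptions chosen for illustration, not real API output.

```python
import csv
import gzip
import io
import json
import random

random.seed(0)  # fixed seed so repeated runs compare the same data

# Synthetic rows standing in for an epidata response (illustrative only).
rows = [
    {
        "geo_value": f"{random.randrange(100000):05d}",
        "time_value": 20200101 + random.randrange(28),
        "value": random.randrange(1_000_000),
    }
    for _ in range(5000)
]

# Same data, two wire formats.
json_bytes = json.dumps(rows).encode("utf-8")

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["geo_value", "time_value", "value"])
writer.writeheader()
writer.writerows(rows)
csv_bytes = buf.getvalue().encode("utf-8")


def gzip_ratio(raw: bytes) -> float:
    """Uncompressed size divided by gzip-compressed size."""
    return len(raw) / len(gzip.compress(raw))


json_ratio = gzip_ratio(json_bytes)
csv_ratio = gzip_ratio(csv_bytes)
print(f"JSON: {len(json_bytes)} bytes raw, {json_ratio:.1f}x smaller with gzip")
print(f"CSV:  {len(csv_bytes)} bytes raw, {csv_ratio:.1f}x smaller with gzip")
```

The exact ratios depend on the data, so they will not match the curl numbers at the top of the thread; the point is only that the JSON ratio comes out higher than the CSV one.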
