Skip to content

Switch to GET on 301 redirect too #989

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
slandelle opened this issue Oct 6, 2015 · 0 comments
Closed

Switch to GET on 301 redirect too #989

slandelle opened this issue Oct 6, 2015 · 0 comments
Assignees
Labels
Milestone

Comments

@slandelle
Copy link
Contributor

Wrong as per RFC2616, but that's what browsers do...

@slandelle slandelle self-assigned this Oct 6, 2015
@slandelle slandelle added this to the 2.0.0 milestone Oct 6, 2015
slandelle pushed a commit that referenced this issue Jun 9, 2017
* Update Head302Test to show it is actually failing

This test had the following defects which caused it to falsely pass:

* Jetty requires a handler to explicitly indicate it has handled the
request by calling Request.setHandled(true).  Otherwise the server will
return 404.
* The test did not configure the client to follow redirects.
* The test itself wasn't asserting anything useful beyond that the
request did not time out.  To ensure the redirect was followed, it needs
to assert the expected response status code from the redirected location
is received.

* Fix broken Head302Test

The test now verifies that a HEAD redirected via 302 is switched to GET
per [1].

[1] #989
ssoloff added a commit to ssoloff/triplea-game-triplea that referenced this issue Jun 10, 2017
It was discovered that AsyncHttpClient converts a HEAD request to a GET
after a redirect [1].  Although this behavior is incorrect per RFC2616,
the author made a conscious decision to implement it this way to match
the behavior of major browsers.

Unfortunately, the switch to a GET causes our URL validation to actually
download the resources instead of simply probing them via a HEAD.  This
wasn't a big deal when we were only validating thumbnail URLs, but now
that we are validating map URLs, the download size is hundreds of MiB!

One can provide a custom handler to AsyncHttpClient that will abort the
transfer after the headers are received.  However, I found doing that
caused both SourceForge and GitHub to reset the connection quite often,
resulting in spurious build failures.

I ultimately chose to go back to using the Apache HTTP client, which
does preserve the HEAD method upon a redirect.  However, the Apache
client does not support multiple asynchronous requests.  We have a
couple of hundred URLs to validate and running the probes in parallel
reduces the duration of this task by approximately a factor of two.
Therefore, I parallelized the probes using a vanilla thread pool.

The only side effect of this change is that the validation no longer
fails fast.  It collects all the probe results before checking them.  If
this becomes an issue, we can revisit changing the implementation in the
future.

[1] AsyncHttpClient/async-http-client#989
DanVanAtta pushed a commit to triplea-game/triplea that referenced this issue Jun 10, 2017
It was discovered that AsyncHttpClient converts a HEAD request to a GET
after a redirect [1].  Although this behavior is incorrect per RFC2616,
the author made a conscious decision to implement it this way to match
the behavior of major browsers.

Unfortunately, the switch to a GET causes our URL validation to actually
download the resources instead of simply probing them via a HEAD.  This
wasn't a big deal when we were only validating thumbnail URLs, but now
that we are validating map URLs, the download size is hundreds of MiB!

One can provide a custom handler to AsyncHttpClient that will abort the
transfer after the headers are received.  However, I found doing that
caused both SourceForge and GitHub to reset the connection quite often,
resulting in spurious build failures.

I ultimately chose to go back to using the Apache HTTP client, which
does preserve the HEAD method upon a redirect.  However, the Apache
client does not support multiple asynchronous requests.  We have a
couple of hundred URLs to validate and running the probes in parallel
reduces the duration of this task by approximately a factor of two.
Therefore, I parallelized the probes using a vanilla thread pool.

The only side effect of this change is that the validation no longer
fails fast.  It collects all the probe results before checking them.  If
this becomes an issue, we can revisit changing the implementation in the
future.

[1] AsyncHttpClient/async-http-client#989
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant