Switch to GET on 301 redirect too #989

slandelle · 2015-10-06T14:16:55Z

Wrong as per RFC2616, but that's what browsers do...

* Update Head302Test to show it is actually failing This test had the following defects which caused it to falsely pass: * Jetty requires a handler to explicitly indicate it has handled the request by calling Request.setHandled(true). Otherwise the server will return 404. * The test did not configure the client to follow redirects. * The test itself wasn't asserting anything useful beyond that the request did not time out. To ensure the redirect was followed, it needs to assert the expected response status code from the redirected location is received. * Fix broken Head302Test The test now verifies that a HEAD redirected via 302 is switched to GET per [1]. [1] #989

It was discovered that AsyncHttpClient converts a HEAD request to a GET after a redirect [1]. Although this behavior is incorrect per RFC2616, the author made a conscious decision to implement it this way to match the behavior of major browsers. Unfortunately, the switch to a GET causes our URL validation to actually download the resources instead of simply probing them via a HEAD. This wasn't a big deal when we were only validating thumbnail URLs, but now that we are validating map URLs, the download size is hundreds of MiB! One can provide a custom handler to AsyncHttpClient that will abort the transfer after the headers are received. However, I found doing that caused both SourceForge and GitHub to reset the connection quite often, resulting in spurious build failures. I ultimately chose to go back to using the Apache HTTP client, which does preserve the HEAD method upon a redirect. However, the Apache client does not support multiple asynchronous requests. We have a couple of hundred URLs to validate and running the probes in parallel reduces the duration of this task by approximately a factor of two. Therefore, I parallelized the probes using a vanilla thread pool. The only side effect of this change is that the validation no longer fails fast. It collects all the probe results before checking them. If this becomes an issue, we can revisit changing the implementation in the future. [1] AsyncHttpClient/async-http-client#989

slandelle added Defect AHC2 labels Oct 6, 2015

slandelle self-assigned this Oct 6, 2015

slandelle added this to the 2.0.0 milestone Oct 6, 2015

slandelle closed this as completed in fddb637 Oct 6, 2015

ssoloff mentioned this issue Jun 9, 2017

Fix broken Head302Test #1421

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch to GET on 301 redirect too #989

Switch to GET on 301 redirect too #989

slandelle commented Oct 6, 2015

Switch to GET on 301 redirect too #989

Switch to GET on 301 redirect too #989

Comments

slandelle commented Oct 6, 2015