This is a proposal to restructure the HTTP client code in puppet to solve the following problems.
It's difficult to use puppet as a library to call our own REST APIs due to the
coupling of puppet's http code with the indirector. As a result, users have
created their own REST clients, but those clients don't behave the same way our
agent does in areas such as serialization and deserialization of rich data,
`server_list` for high availability, and JSON to PSON content negotiation.
It would be beneficial to the puppet ecosystem to have a REST client that's reusable by more than the agent.
Persistent HTTP connections allow puppet to establish an HTTP(S) connection once
and reuse it for multiple HTTP requests. This avoids making a new TCP connection
and SSL handshake for each request. This is important for pluginsync, due to the
large number of individual GET requests. However, persistent connections are not
enabled by default, and must be opted into, as was recently done for `puppet device`
and `puppet plugin download`. More than likely, other applications
should be using persistent connections, but aren't.
Puppet supports 3 ways of routing connections: DNS SRV records, server list, and
static puppet settings. However, some routing methods are not consistently
applied. For example, `puppet plugin download` and `puppet report upload` don't
observe server list.
Once a route has been determined, puppet stores the last used server and port in Puppet's context system, but it's more of a hack than anything. As a result, it's difficult to know how the last used server and port were set and when to invalidate them.
`Puppet::Network::HTTP::Connection` supports two ways of making GET and POST
requests, but they don't behave consistently when handling HTTP redirects, the
`Retry-After` header, server and proxy authentication, and exception handling.
`Puppet::Network::HttpPool` and related classes don't specify which
exceptions can be raised. Instead, they pass through whatever exceptions ruby
raises: everything from `SocketError` to `SystemCallError` to
`OpenSSL::SSL::SSLError` to `Net::ProtocolError` and `TimeoutError`. As a
result, it's hard for clients to build higher level abstractions.
Puppet only trusts the puppet PKI when connecting to puppet infrastructure, but
needs to additionally trust the system cert store for requests like PMT and
downloading files from https sources. However, the current API doesn't allow the
caller to do that, which is why `Puppet::Util::HttpProxy#request_with_redirects`
duplicates the logic from `Puppet::Network::HTTP::Connection#request_with_redirects`.
In order to solve these problems, I propose creating an HTTP client in puppet with the following goals:
- Implement a REST client in puppet capable of serializing/deserializing puppet objects like Catalog, Report, etc.
- Reuse the existing networking code as much as possible, such as `Puppet::Network::HTTP::Pool`, but restructure it with a clear API.
- Always use persistent connections unless the caller explicitly opts out.
- Handle server resolution (via DNS SRV, etc) in a consistent way.
- Define an exception hierarchy for the API so that `Net::HTTP` specific exceptions don't leak out.
- Make it possible to use the system trust store for a single HTTPS request.

The following are non-goals:

- Ruby's builtin `Net::HTTP` library is fairly buggy. However, we're not switching away from it right now. We may in the future, but it's out of scope.
- Serialization of puppet domain objects requires pops, rich data and loaders. As a result, creating a standalone puppet-http gem is out of scope.
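As a rough illustration of the exception-hierarchy goal listed above, the sketch below translates the Ruby exceptions named earlier into a small set of client-specific errors. The module name `SketchHTTP`, its error classes, and the `wrap_errors` helper are hypothetical names chosen for this example; they are not the proposed API.

```ruby
require 'net/http'
require 'openssl'
require 'socket'
require 'timeout'

# Hypothetical hierarchy: every error the client raises descends from HTTPError.
module SketchHTTP
  class HTTPError < StandardError; end
  class ConnectionError < HTTPError; end
  class ProtocolError < HTTPError; end
  class TimedOut < HTTPError; end

  # Runs the block and translates low-level exceptions into the hierarchy above.
  # The original exception stays reachable through Exception#cause.
  def self.wrap_errors
    yield
  rescue Timeout::Error => e
    # Net::OpenTimeout and Net::ReadTimeout are subclasses of Timeout::Error.
    raise TimedOut, e.message
  rescue Net::ProtocolError => e
    raise ProtocolError, e.message
  rescue SocketError, SystemCallError, OpenSSL::SSL::SSLError => e
    raise ConnectionError, e.message
  end
end
```

With something like this in place, a caller could rescue `SketchHTTP::HTTPError` alone and still inspect `cause` when the underlying `Net::HTTP` or socket error matters.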
The client (`Puppet::HTTP::Client`) has a pool of persistent HTTP connections and creates HTTP sessions. It closes persistent connections when its `close` method is called.
It also has low-level HTTP methods, such as `get`, `post`, etc., which take the path,
headers, options, and allow the caller to stream the request and response body.
These methods return a `Puppet::HTTP::Response` with the response code, etc.
The pool maintains persistent `Net::HTTP` connections, keeping track of when
idle connections expire. Its `with_connection` method takes a block, which
ensures borrowed connections are always returned to the pool.
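The borrow-and-return behaviour described here can be sketched with `ensure`. This is a minimal sketch, not the existing pool: the class name `SketchPool`, the `keepalive` and `clock` parameters, and the factory block are all assumptions made for illustration, and connections are plain objects rather than `Net::HTTP` instances.

```ruby
# Hypothetical pool; connections are whatever objects the factory block builds.
class SketchPool
  Entry = Struct.new(:connection, :expires_at)

  def initialize(keepalive: 4, clock: -> { Time.now }, &factory)
    @keepalive = keepalive
    @clock = clock
    @factory = factory
    @idle = Hash.new { |hash, site| hash[site] = [] }
  end

  # Yields a connection for +site+ and returns it to the pool afterwards,
  # even when the block raises.
  def with_connection(site)
    connection = borrow(site)
    begin
      yield connection
    ensure
      release(site, connection)
    end
  end

  def idle_count(site)
    @idle[site].size
  end

  private

  # Reuses the most recently released connection, discarding expired entries.
  def borrow(site)
    now = @clock.call
    @idle[site].reject! { |entry| entry.expires_at <= now }
    entry = @idle[site].pop
    entry ? entry.connection : @factory.call(site)
  end

  def release(site, connection)
    @idle[site].push(Entry.new(connection, @clock.call + @keepalive))
  end
end
```

A real pool would probably also close expired connections and might discard a connection whose request failed; the sketch leaves both out to keep the `ensure` pattern visible.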
A route describes how to reach a REST service. It includes the API prefix, DNS SRV service name, and puppet server and port settings for that service.
A service represents an instance of a puppet web service. It includes the URL used to connect
to the service, such as `https://puppet:8140/puppet/v3`. There are four
services: `ca`, `report`, `fileserver`, and the default `puppet`.
The `ca` and `report` services handle certs and reports, respectively. The
`fileserver` service handles puppet file metadata and content requests, such as
pluginsync and file resources with `source => 'puppet://'`. The `puppet` service
handles nodes, facts, and catalogs, and is also the fallback for the other three
services.
Each service is responsible for serializing/deserializing the HTTP entity into a
domain object. It uses the existing `Puppet::Network::Format` code to do so.
Each resolver represents a different strategy for resolving a service name into a list of candidate servers and ports.
A session represents an HTTP session through which services may be connected to and accessed.
It has a `Session#route_to` method to route to a web service based on the requested
service name and client configuration:
```ruby
client = Puppet::HTTP::Client.new
session = client.create_session
service = session.route_to(:ca)
cert = service.get_certificate('foo')
puts "Retrieved cert #{cert.subject.to_utf8} from #{service.url}"
```
The `Session#route_to(:ca)` call (above) returns an instance of
`Puppet::HTTP::Service::Ca`, which has methods appropriate for that type of
service. All services extend `Puppet::HTTP::Service`.
If an explicit server and port are specified on the command line or in
configuration, such as `puppet agent -t --server foo.example.com`, then the
`Session#route_to` method will always return a `Service` with that host and port.
Otherwise, the session will walk the list of resolvers in priority order:
- DNS SRV
- Server list
- Puppet server/port settings
If the `route_to` method attempts to connect to a service, but it results in an
exception, such as "connection refused", then the session will attempt the next
service.
If the caller successfully uses a service, then the session will return the same
service the next time `route_to` is called.
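The fallback and reuse behaviour described above could look roughly like the following. This is a sketch under stated assumptions: `SketchSession`, `ConnectError`, the resolver callables, and the probe are hypothetical stand-ins, and the sketch caches a route as soon as a connection attempt succeeds rather than after the caller has used the service.

```ruby
# Hypothetical session: each resolver is a callable that maps a service name to
# candidate URLs, and the probe raises ConnectError when a URL is unreachable.
class SketchSession
  class ConnectError < StandardError; end

  def initialize(resolvers, probe)
    @resolvers = resolvers # highest priority first
    @probe = probe
    @resolved = {}
  end

  def route_to(name)
    # Reuse the service that worked last time.
    return @resolved[name] if @resolved.key?(name)

    @resolvers.each do |resolver|
      resolver.call(name).each do |url|
        begin
          @probe.call(url)
          return @resolved[name] = url
        rescue ConnectError
          next # try the next candidate, then the next resolver
        end
      end
    end
    raise ConnectError, "No more routes to #{name}"
  end
end
```

Passing the resolvers as an ordered list keeps the priority order (DNS SRV, server list, settings) in one place, so every service is routed the same way.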
The DNS SRV resolver performs an SRV lookup, and randomly selects one of the targets based on the weight of each entry in the SRV record. A target with weight 2 would be twice as likely to be chosen as a target with weight 1.
```ruby
client = Puppet::HTTP::Client.new(use_srv: true, srv_domain: 'puppet.example.com')
session = client.create_session
service = session.route_to(:ca)
# service.url is "https://compiler1.puppet.example.com:8140"
```
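The weighted selection itself can be sketched independently of DNS. In the example below, `SrvTarget` and `pick_weighted` are hypothetical names, weights are assumed to be non-negative integers, and the SRV priority field is ignored because the text above only discusses weight.

```ruby
# Hypothetical value object for one target in an SRV answer.
SrvTarget = Struct.new(:host, :port, :weight)

# Picks one target with probability proportional to its weight.
def pick_weighted(targets, rng: Random.new)
  total = targets.sum(&:weight)
  # With no weights to compare, fall back to a uniform choice.
  return targets.sample(random: rng) if total.zero?

  point = rng.rand(total) # integer in 0...total
  targets.each do |target|
    return target if point < target.weight
    point -= target.weight
  end
end
```

With targets of weight 2 and 1, two of the three possible values of `point` land on the first target, which matches the "twice as likely" behaviour described above.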
The server list resolver selects the first available server using puppetserver's
simple status endpoint. This applies when routing requests to the `:puppet`
service, as well as any service whose server and port are the same as the
`:puppet` service, for example, when `:ca_server` and `:report_server` have not
been overridden.
```ruby
client = Puppet::HTTP::Client.new(server_list: ['compiler1', 'compiler2'])
session = client.create_session
service = session.route_to(:puppet)
# service.url is "https://compiler1:8140"
```
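A minimal sketch of "first available server wins" follows. The helper name `first_available`, the `[host, port]` entry form, and the `status_path` default are assumptions for illustration; the status check is passed in as a block so the sketch makes no claim about how the real status request is issued.

```ruby
require 'uri'

# Returns the base URL of the first server whose status probe succeeds, or nil.
# Entries are host names or [host, port] pairs. The status_path default is a
# placeholder, not a confirmed puppetserver route.
def first_available(server_list, default_port: 8140, status_path: '/status/v1/simple', &probe)
  server_list.each do |entry|
    host, port = Array(entry)
    base = URI::HTTPS.build(host: host, port: port || default_port)
    return base.to_s if probe.call(URI.join(base.to_s, status_path))
  end
  nil
end
```

The block receives the status URL and returns a truthy value when the server is healthy, so a caller can plug in any HTTP client or a stub in tests.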
The settings resolver selects a route based on the puppet settings for that service:
| service | server setting | port setting |
|---|---|---|
| `ca` | `ca_server` | `ca_port` |
| `fileserver` | `server` | `serverport` |
| `report` | `report_server` | `report_port` |
| `puppet` | `server` | `serverport` |
For example, `route_to(:report)` would use `Puppet[:report_server]` and
`Puppet[:report_port]`.
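The table translates directly into a lookup. In this sketch, `SERVICE_SETTINGS` and `resolve_from_settings` are hypothetical names, and `settings` is any object that answers `[]` with a setting name (a plain Hash here).

```ruby
# Mirrors the table above: service name => [server setting, port setting].
SERVICE_SETTINGS = {
  ca:         %i[ca_server ca_port],
  fileserver: %i[server serverport],
  report:     %i[report_server report_port],
  puppet:     %i[server serverport]
}.freeze

# Returns [server, port] for the service; raises KeyError for an unknown service.
def resolve_from_settings(service, settings)
  server_key, port_key = SERVICE_SETTINGS.fetch(service)
  [settings[server_key], settings[port_key]]
end

settings = { server: 'puppet', serverport: 8140, report_server: 'reports', report_port: 8141 }
resolve_from_settings(:report, settings)
```

Using `fetch` makes an unknown service name fail loudly instead of silently routing to `nil`.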
There are some variations in how the different services are routed. Here is a visual of how the CA service is routed. We have to preserve some interesting behavior with this service, but otherwise the flow is similar to that of other services.
Puppet agents support downloading file content from 3rd party file servers,
which reduces load on the compiler. The `Client` will provide a low-level API
for making `GET` requests for an arbitrary URL, and streaming the response body.
Puppet only trusts the puppet PKI for its REST requests. However, it should be possible to additionally trust the system store when making HTTPS requests:
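```ruby
client = Puppet::HTTP::Client.new
response = client.get("https://artifactory.example.com/java.tar.gz", options: { include_system_store: true })
response.read_body do |data|
  # bytesize is the number of bytes in the chunk; String#bytes would return an array
  puts "Read #{data.bytesize} bytes"
end
```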
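One way the `include_system_store` option might be honoured is to build the certificate store per request. The helper below is a sketch, not the proposed implementation: `build_trust_store` is a hypothetical name, and `ca_certs` stands in for the puppet PKI's CA certificates.

```ruby
require 'openssl'

# Builds a store that trusts the given CA certificates and, optionally,
# the operating system's default trust locations as well.
def build_trust_store(ca_certs, include_system_store: false)
  store = OpenSSL::X509::Store.new
  store.set_default_paths if include_system_store
  ca_certs.each { |cert| store.add_cert(cert) }
  store
end
```

The resulting store could then be assigned to a connection with `Net::HTTP#cert_store=` before the request is made, which leaves the default puppet-only trust untouched for every other request.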
Puppet ruby code running in puppetserver sometimes makes outbound connections, such as from the puppetdb terminus, PE classifier terminus, and 'http' report processor. Currently, puppetserver registers its own http client class, so that it can perform the HTTP request using Apache HttpClient.
In order to preserve this capability, puppetserver should have a way of
overriding the `get` and `post` methods of `Puppet::HTTP::Client` to call the
Apache HttpClient instead.
One way might be to create an adapter that overrides Puppet's implementation and delegates to puppetserver's client:
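```ruby
class Puppet::Server::HttpClientAdapter < Puppet::HTTP::Client
  def initialize(http_client)
    # Explicit parentheses keep http_client from being forwarded to
    # Puppet::HTTP::Client#initialize.
    super()
    @http_client = http_client
  end

  def get(url, headers={}, options={})
    @http_client.get(url, headers, options)
  end

  # etc
end
```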
And register it with puppet:
```ruby
Puppet.push_context(http_client: Puppet::Server::HttpClientAdapter.new(Puppet::Server::HttpClient.new))
```