networkd: Alternate address configuration methods for cloud providers #16547
The AWS thing involves IPv4LL and HTTP and is supposed to be quicker than DHCP? That would be sad. So we already have …

I'd also be fine if we add IPv4FromMAC=, but it needs to be somewhat generic, i.e. maybe take a MAC address bit mask and a base IP address, so that it is not Google-Cloud-specific but can be used somewhat generically:

…

Or so? The first parameter would specify the bitmask to apply to the MAC address, the second parameter the base address to OR it into. And we should refuse operation if any bits are set in the suffix of that IP address. I guess for the MAC address we need to support masking any bits, i.e. also from the middle, though for the IP address we only have to insert the determined bits at the end of the address.

Is the AWS logic reasonably standardized? (i.e. does it have a spec, and is it used beyond AWS?) If so, we could just add native support to networkd, I guess, similar to the existing IPv4LL/DHCP/IPv6RA support. If it's strictly AWS-specific and underdocumented, I doubt this would be the right place though.
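To make the mask-and-base idea concrete, here is a minimal Python sketch of the derivation described above. It is not networkd code; the parameter order, the consecutive low-bit mask, and the example MAC are assumptions for illustration.

```python
#!/usr/bin/env python3
"""Illustrative sketch (not networkd code) of a generic MAC-mask-plus-base
derivation: apply a bitmask to the MAC address, OR the extracted bits into a
base IPv4 address, and refuse if the base already has bits set in the suffix."""

import ipaddress

def ipv4_from_mac(mac: str, mac_mask: int, base: str = "0.0.0.0") -> ipaddress.IPv4Address:
    mac_bits = int(mac.replace(":", ""), 16)       # 48-bit MAC as an integer
    suffix = mac_bits & mac_mask                   # bits taken from the MAC
    width = mac_mask.bit_length()                  # assumes a consecutive low-bit mask
    base_int = int(ipaddress.IPv4Address(base))
    if base_int & ((1 << width) - 1):
        raise ValueError("bits set in the suffix of the base address")
    return ipaddress.IPv4Address(base_int | suffix)

# Hypothetical example: take the low 24 bits of the MAC, OR them into 10.0.0.0.
print(ipv4_from_mac("52:54:00:12:34:56", 0xFFFFFF, "10.0.0.0"))   # -> 10.18.52.86
```

In networkd the mask and base would presumably come from the `.network` file, but the bit manipulation would look roughly like this.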
@poettering wrote:
That is indeed sad, and yet I've confirmed it locally. DHCP is more than 100x slower than fetching the IP address from the instance metadata (10ms vs less than 100us). In theory, more aggressive DHCP might be able to close some of that gap, but we're still talking "communicate over virtual network" versus "communicate with locally attached hardware supplying the instance metadata". And it's probably possible to fetch the address from instance metadata faster than I did, too.
That seems reasonable to me. (No need to support non-consecutive bitmasks, at least until something actually needs that.) Perhaps the base IP can default to 0.0.0.0 if not specified (and if the mask contains exactly 32 bits)? For GCP, it'd be: …
It's documented, and Azure and GCP both have similar instance metadata services, with Azure also supplying IP addresses in instance metadata. To handle this in the simplest fashion that would work, it would suffice to have `IPv4FromURL` and `IPv6FromURL`, http URLs using IP addresses only (no hostnames), with a substitution allowed in the URL for the permanent MAC, and the response must be the IP address in text form. (There are more complex ways to use the instance metadata service, but this would suffice for both AWS and Azure, and `IPv4FromMAC` would suffice for GCP.)

GCP's metadata service requires an additional HTTP header, but GCP doesn't supply the IP in instance metadata, so that doesn't matter. AWS has an "instance metadata v2" protocol that's more complex, but I think implementing v1 would suffice here.
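Not an implementation proposal, just a rough Python sketch of what the `IPv4FromURL` semantics outlined above could look like. The `$MAC` placeholder syntax and the example MAC are assumptions; the URL is the AWS IMDSv1 path mentioned in this thread.

```python
#!/usr/bin/env python3
"""Rough sketch of a URL-based lookup: substitute the interface's permanent
MAC into a URL template, fetch it over plain HTTP from a link-local address,
and treat the response body as the address in text form.  Only works on an
instance where 169.254.169.254 actually answers."""

import ipaddress
import urllib.request

def ipv4_from_url(url_template: str, mac: str) -> ipaddress.IPv4Address:
    url = url_template.replace("$MAC", mac)        # MAC substitution in the URL
    with urllib.request.urlopen(url, timeout=1) as resp:
        first_line = resp.read().decode().splitlines()[0]
        return ipaddress.IPv4Address(first_line.strip())

# AWS-style IMDSv1 path from the issue; the "$MAC" placeholder syntax is made up.
template = "http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC/local-ipv4s"
print(ipv4_from_url(template, "0a:1b:2c:3d:4e:5f"))
```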
Sounds OK for me to have. We link some stuff to libcurl anyway already (importd and some journal remoting stuff), adding some super-basic http get code based around it should be OK. Should be done out-of-process though, i.e. forked off so that we can set up some sandboxing for it, after all it might be used to parse complex stuff like TLS and certificates... And most likely we'll add DoH support to resolved eventually too, thus the dependency on libcurl isn't terribly new...
We have a pretty neat JSON parser in our codebase, so if the new stuff is a bit of JSON that'd be fine too.
On Tue, Jul 28, 2020 at 03:07:33PM +0000, Lennart Poettering wrote:
> > It's documented, and Azure and GCP both have similar instance metadata services, with Azure also supplying IP addresses in instance metadata. To handle this in the simplest fashion that would work, it would suffice to have `IPv4FromURL` and `IPv6FromURL`, http URLs using IP addresses only (no hostnames), with a substitution allowed in the URL for the permanent MAC, and the response must be the IP address in text form. (There are more complex ways to use the instance metadata service, but this would suffice for both AWS and Azure, and `IPv4FromMAC` would suffice for GCP.)

> Sounds OK for me to have. We link some stuff to libcurl anyway already (importd and some journal remoting stuff), adding some super-basic http get code based around it should be OK. Should be done out-of-process though, i.e. forked off so that we can set up some sandboxing for it, after all it might be used to parse complex stuff like TLS and certificates... And most likely we'll add DoH support to resolved eventually too, thus the dependency on libcurl isn't terribly new...
This definitely doesn't use TLS or certificates; it's an extremely basic
HTTP connection to a link-local IP address. But yeah, sandboxing is a
good idea. (Would be nice to start the sandbox as early as possible, so
that it doesn't delay bringing up the address by the time needed for an
extra fork/exec.)
> > GCP's metadata service requires an additional HTTP header, but GCP doesn't supply the IP in instance metadata, so that doesn't matter. AWS has an "instance metadata v2" protocol that's more complex, but I think implementing v1 would suffice here.

> We have a pretty neat JSON parser in our codebase, so if the new stuff is a bit of JSON that'd be fine too.
It isn't about the format, it'd still be plain text. (Azure can do JSON,
which might be useful to get IPv4 and IPv6 at the same time, but it
isn't required.) If you want to support the instance metadata v2
protocol, you need to send a separate PUT to get a time-limited token,
then pass the token in subsequent requests. If you're using full
libcurl, that'd be straightforward enough. But that variant of the
protocol is *entirely* AWS-specific, so it'd need to be separate from
the baseline `IPv4FromURL` support.
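For reference, a small Python sketch of the IMDSv2 token dance described above (a PUT for a time-limited token, then the token passed in a request header). The header and path names follow AWS's public documentation; the MAC is a placeholder and error handling is omitted.

```python
#!/usr/bin/env python3
"""Sketch of the AWS-specific IMDSv2 flow: PUT to the token endpoint, then
pass the returned token on subsequent GETs."""

import urllib.request

IMDS = "http://169.254.169.254"

def imdsv2_get(path: str) -> str:
    # Step 1: obtain a time-limited session token with a PUT request.
    token_req = urllib.request.Request(
        IMDS + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=1) as resp:
        token = resp.read().decode()
    # Step 2: send the token along with the actual metadata request.
    req = urllib.request.Request(IMDS + path,
                                 headers={"X-aws-ec2-metadata-token": token})
    with urllib.request.urlopen(req, timeout=1) as resp:
        return resp.read().decode()

mac = "0a:1b:2c:3d:4e:5f"  # placeholder; use the interface's permanent MAC
print(imdsv2_get(f"/latest/meta-data/network/interfaces/macs/{mac}/local-ipv4s"))
```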
Could you provide any references about that?
Thanks.
I measured nspawn container startup with various IP configuration options. Test setup: Debian bullseye/sid, systemd 246, /var/lib/machines is a directory, the physical network is Ethernet to a Google WiFi 1st gen, and the container image is a minbase debootstrap built with packer-builder-nspawn-debootstrap. Test sequence: start a Wireshark capture on br0, then send pings to the container's IP address every millisecond.

Total time from start to host0 carrier is consistently around 0.5s. This is a lot, and I didn't dig deeper into what systemd is doing with all that time. Mounting /var/lib/machines on tmpfs made no difference. I couldn't get veth to work with global IPv6 addresses, but with IPv4, using veth instead of a bridge also made no difference. Typical time breakdown from one of the runs:

…

Total time from carrier to first ping reply varied a lot:

…

Even with static IPv6 configuration, the IPv6 stack still takes several seconds to initialize. Enabling ARO (Address Registration Option) from the NDP optimizations in RFC 6775 could in theory speed this up, but it isn't implemented in Linux yet (according to Stefan Schmidt's report at the LPC 2019 IoT Microconference), and even when it is, it might only apply to IPv6 over IEEE 802.15.4.

Compared to that, the extra 10-20ms that DHCPv4 adds to container startup looks quaint. There might be lower-hanging fruit in systemd-nspawn that could reduce that 500ms start-to-carrier time to the point where faster IPv4 configuration would begin to make a difference.

The ridiculously long time it takes the IPv6 stack to initialize makes me sad and wondering whether there's anything wrong with my setup. As it stands, it's unsuitable for on-demand containers that are started to serve requests from interactive applications, and wasteful for containers that only need to run for a few seconds at a time as part of a low-frequency compute pipeline.
On Thu, Oct 15, 2020 at 12:12:01PM -0700, Dmitry Borodaenko wrote:
> I measured nspawn container startup with various IP configuration options. Test setup: Debian bullseye/sid, systemd 246, /var/lib/machines is a directory, physical network is Ethernet to Google WiFi 1st gen, container image is minbase debootstrap built with [packer-builder-nspawn-debootstrap](https://git.sr.ht/~angdraug/packer-builder-nspawn-debootstrap/).
> ...
> Compared to that, the extra 10-20ms that DHCPv4 adds to container startup looks quaint. There might be lower hanging fruit in systemd-nspawn that could reduce that 500ms start to carrier time to the point where faster IPv4 configuration would begin to make a difference.
> The ridiculously long time it takes IPv6 stack to initialize makes me sad and wondering if there's anything wrong with my setup. As it stands, it's unsuitable for on-demand containers that get started to serve requests from interactive applications, and wasteful with containers that only need to run for a few seconds at a time as part of a low-frequency compute pipeline.
I currently have full VMs (including the kernel) starting up in less
time than that; a container really should be substantially faster. But
as for IPv6, I got similar carrier-to-IP-configuration delays on cloud
VMs, and ended up having to disable IPv6 in networkd to improve startup
performance.
It sounds like obtaining IPv6 addresses via instance metadata would be
an even bigger win for IPv6.
Only containers running edge services (e.g. Envoy or Nginx) should have global IPv6 addresses. The Seed host has privileged access to all containers running on it. Access to Seed hosts is a sensitive security surface that should not be unnecessarily exposed to additional attack vectors. A globally routable IPv6 address is not necessary when Seeds are managed from the local network. IPv6 also adds up to 5s to network initialization: systemd/systemd#16547 (comment)
I have another use case for early access to the IMDS. Namely, I want to populate … early in boot.

It would be neat if we could set up a route to the link-local address of the metadata server in early boot (maybe via udev's …). Then we could have a …
Based on today's discussion and https://gist.github.com/arianvp/22e1c5182eb6c17bbd8c1bbe823b516b, how about the following?
systemd-netns create
systemd-netns delete
This may be useful for running commands (e.g. curl or wget) with NetworkNamespacePath=/run/systemd/netns/netns99, e.g. … We can share a lot of code from networkd, so I guess it is not hard to implement something like this.
I'd like to bring up the network as fast as possible; every millisecond counts. On some cloud providers, there are faster ways of obtaining an address than sending out a DHCP request. I understand and agree with networkd's general policy of not supporting hooks, so I'd like to request first-class support for these as address assignment methods within networkd. (I wouldn't expect any of these to be in any default configuration, just available options to use in a `.network` file.)

On some cloud providers (such as Google Cloud), an IPv4 address is available by using the low four bytes of the interface's hardware MAC address. I'd like to have an option along the lines of `IPv4FromMAC` that implements this. This is the simple case, and I'm hoping it'd be trivial to support.

On other cloud providers (such as AWS), it's possible to get the IPv4 and IPv6 addresses for all interfaces from instance metadata, which would involve bringing up link-local addressing (169.254.x.x) and then fetching a specified URL (`http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC/local-ipv4s` and `http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC/ipv6s`). For these, the configuration option would specify the URL (e.g. `IPv4FromURL` and `IPv6FromURL`). I recognize that this would be a larger ask; if this doesn't seem reasonable to do within networkd, I'd be happy to hear suggestions for other ways to implement this, other than running an entirely custom network bring-up daemon.
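To illustrate the "simple case" above, a tiny Python sketch of deriving the address from the low four bytes of the MAC; the example MAC is made up.

```python
#!/usr/bin/env python3
"""Tiny sketch of the Google Cloud case from the issue: the IPv4 address is
the low four bytes of the interface's hardware MAC address."""

import ipaddress

def ipv4_from_low_mac_bytes(mac: str) -> ipaddress.IPv4Address:
    octets = bytes.fromhex(mac.replace(":", ""))  # 6-byte MAC
    return ipaddress.IPv4Address(octets[-4:])     # keep the low four bytes

print(ipv4_from_low_mac_bytes("42:01:0a:80:00:02"))  # -> 10.128.0.2
```

This is the special case of the generic mask-and-base derivation sketched earlier in the thread, with a 32-bit mask and a base of 0.0.0.0.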