• Bug#1104426: The core issue is cloud-init on IPv6-only networks

    From Noah Meyerhans@21:1/5 to Zar VPN on Sun May 4 05:10:01 2025
    Control: tags -1 + upstream
    Control: forwarded -1 https://github.com/canonical/cloud-init/issues/6205

    On Sat, May 03, 2025 at 12:25:01PM +0330, Zar VPN wrote:
    > This is a critical issue, as it prevents users from booting and
    > configuring instances in modern IPv6-only cloud environments using the
    > official Debian cloud image.

    I can reproduce this issue, but I don't think it is limited to Debian.
    It seems that it's either cloud-init itself or python's HTTP client
    (urllib3 and/or requests).

    cloudinit/sources/DataSourceOpenStack.py defines a function,
    wait_for_metadata_service(), which contains the default list of IMDS
    endpoints:

    DEF_MD_URLS = [
        "http://[fe80::a9fe:a9fe%25{iface}]".format(
            iface=self.distro.fallback_interface
        ),
        "http://169.254.169.254",
    ]
    urls = self.ds_cfg.get("metadata_urls", DEF_MD_URLS)

    It constructs a list of URLs to probe when looking for a functioning
    IMDS endpoint by appending the "openstack" path to the default list of
    endpoints, as well as any passed in the configuration:

    for url in urls:
        md_url = url_helper.combine_url(url, "openstack")
        md_urls.append(md_url)
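
    The URL construction described above can be sketched in isolation.
    This is a minimal stand-alone approximation, not cloud-init's code:
    link_local_url(), join_url() and build_md_urls() are hypothetical
    names, and join_url() only imitates what url_helper.combine_url() is
    used for here. The zone separator is written as "%25" because a bare
    "%" is not valid inside a URL host (RFC 6874).

```python
# Hypothetical stand-ins for illustration; not cloud-init's actual helpers.
LINK_LOCAL_IMDS = "fe80::a9fe:a9fe"


def link_local_url(iface: str) -> str:
    """Build an http URL for the link-local IMDS address on `iface`.

    The zone separator "%" is percent-encoded as "%25" (RFC 6874).
    """
    return "http://[{addr}%25{iface}]".format(addr=LINK_LOCAL_IMDS, iface=iface)


def join_url(base: str, path: str) -> str:
    """Join base and path with exactly one "/" between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def build_md_urls(iface: str) -> list:
    """Return the default endpoints with the "openstack" path appended."""
    bases = [link_local_url(iface), "http://169.254.169.254"]
    return [join_url(base, "openstack") for base in bases]


if __name__ == "__main__":
    for url in build_md_urls("enp3s0"):
        print(url)
```

    With "enp3s0" as the interface, this yields the same two URLs that
    appear in the cloud-init debug log quoted in this thread.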

    It then probes those endpoints:

    avail_url, _response = url_helper.wait_for_url(
        urls=md_urls,
        max_wait=url_params.max_wait_seconds,
        timeout=url_params.timeout_seconds,
        connect_synchronously=False,
    )

    However, it doesn't actually seem to be able to successfully probe a
    link-local endpoint at all. We can test this ourselves by constructing
    a simplified test case:

    noahm@foo:~$ cat /tmp/t.py
    #!/usr/bin/python3

    from cloudinit import url_helper
    url="http://[fe80::a9fe:a9fe%enp0s1]"
    md_url = url_helper.combine_url(url, "openstack")
    md_urls=[md_url]
    print(url_helper.wait_for_url(md_urls, max_wait=5, timeout=1))

    noahm@foo:~$ python3 /tmp/t.py
    (False, None)

    Both the server logs and tcpdump show no request is ever issued to the
    given URL.

    But if we change that to use a globally scoped address, it works:

    noahm@foo:~$ cat /tmp/t.py
    #!/usr/bin/python3

    from cloudinit import url_helper
    # url="http://[fe80::a9fe:a9fe%enp0s1]"
    url="http://[fd00:80db:0:5:34e5:8aff:fec5:b9bf]"
    md_url = url_helper.combine_url(url, "openstack")
    md_urls=[md_url]
    print(url_helper.wait_for_url(md_urls, max_wait=5, timeout=1))

    noahm@foo:~$ python3 /tmp/t.py
    ('http://[fd00:80db:0:5:34e5:8aff:fec5:b9bf]/openstack', b'<!doctype html>\n<html>\n<head>\n <title>untitled</title>\n</head>\n<body>\n</body>\n</html>\n')

    And to be sure, the server does reply to queries on link-local
    addresses:

    noahm@foo:~$ curl -v 'http://[fe80::a9fe:a9fe%enp0s1]/openstack'
    * Trying [fe80::a9fe:a9fe]:80...
    * Connected to fe80::a9fe:a9fe (fe80::a9fe:a9fe) port 80
    * using HTTP/1.x
    > GET /openstack HTTP/1.1
    > Host: [fe80::a9fe:a9fe]
    > User-Agent: curl/8.13.0
    > Accept: */*
    >
    < HTTP/1.1 301 Moved Permanently
    < Server: nginx/1.22.1
    < Date: Sun, 04 May 2025 02:43:32 GMT
    < Content-Type: text/html
    < Content-Length: 169
    < Location: http://[fe80::a9fe:a9fe]/openstack/
    < Connection: keep-alive
    <
    <html>
    <head><title>301 Moved Permanently</title></head>
    <body>
    <center><h1>301 Moved Permanently</h1></center>
    <hr><center>nginx/1.22.1</center>
    </body>
    </html>
    * Connection #0 to host fe80::a9fe:a9fe left intact

    We can also see evidence suggesting that something is wrong in
    cloud-init from the logs you provided:

    2025-04-30 09:59:03,739 - url_helper.py[DEBUG]: [0/1] open 'http://[fe80::a9fe:a9fe%25enp3s0]/openstack' with {'url': 'http://[fe80::a9fe:a9fe%25enp3s0]/openstack', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 10.0, 'headers': {'User-Agent': 'Cloud-Init/25.1.1'}} configuration
    2025-04-30 09:59:03,893 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 10.0, 'headers': {'User-Agent': 'Cloud-Init/25.1.1'}} configuration
    2025-04-30 09:59:03,895 - url_helper.py[DEBUG]: Exception(s) [UrlError('HTTPConnectionPool(host=\'fe80::a9fe:a9fe%25enp3s0\', port=80): Max retries exceeded with url: /openstack (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7fb44b82fe00>: Failed to resolve \'fe80::a9fe:a9fe%25enp3s0\' ([Errno -2] Name or service not known)"))'), UrlError("HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /openstack (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb44b6d9810>: Failed to establish a new connection: [Errno 101] Network is unreachable'))")] during request to http://169.254.169.254/openstack, raising last exception
    2025-04-30 09:59:03,895 - url_helper.py[DEBUG]: Calling 'http://169.254.169.254/openstack' failed [0/-1s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /openstack (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb44b6d9810>: Failed to establish a new connection: [Errno 101] Network is unreachable'))]
    2025-04-30 09:59:03,895 - DataSourceOpenStack.py[DEBUG]: Giving up on OpenStack md from ['http://[fe80::a9fe:a9fe%25enp3s0]/openstack', 'http://169.254.169.254/openstack'] after 0 seconds
    2025-04-30 09:59:03,895 - log_util.py[WARNING]: No active metadata service found
    2025-04-30 09:59:03,895 - log_util.py[DEBUG]: No active metadata service found

    Note in particular this:

    Exception(s) [UrlError('HTTPConnectionPool(host=\'fe80::a9fe:a9fe%25enp3s0\', port=80): Max retries exceeded with url: /openstack (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7fb44b82fe00>: Failed to resolve \'fe80::a9fe:a9fe%25enp3s0\' ([Errno -2] Name or service not known)"))')

    There shouldn't be any name resolution involved here at all. My guess
    is that something is not recognizing the scoped link-local address as an
    IP address, and is treating it as a hostname that needs to be resolved
    in DNS. Which is obviously going to fail. I haven't looked deeply
    enough to determine whether this is cloud-init or a lower-level http
    client.
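
    One way to look at the host string from the exception is to parse it
    with Python's standard ipaddress module (scoped addresses are
    supported from Python 3.9). This is only a sketch of how the string
    reads as an address, not a statement about what the resolver does
    internally; zone_of() is a hypothetical helper name.

```python
import ipaddress


def zone_of(host: str) -> str:
    """Return the zone (scope) ID that `host` carries as an IPv6 literal."""
    return ipaddress.IPv6Address(host).scope_id


if __name__ == "__main__":
    # The intended form: zone ID is the interface name.
    print(zone_of("fe80::a9fe:a9fe%enp3s0"))
    # The form seen in the exception: "%25" was never decoded, so the
    # zone ID reads as "25enp3s0", which names no interface.
    print(zone_of("fe80::a9fe:a9fe%25enp3s0"))
```

    If the still-encoded string is what reaches the resolver, the zone
    would not match any interface, which would be consistent with the
    "Name or service not known" error above.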

    noah

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bastian Blank@21:1/5 to Noah Meyerhans on Sun May 4 08:10:01 2025
    On Sat, May 03, 2025 at 10:57:39PM -0400, Noah Meyerhans wrote:
    > There shouldn't be any name resolution involved here at all. My guess
    > is that something is not recognizing the scoped link-local address as an
    > IP address, and is treating it as a hostname that needs to be resolved
    > in DNS. Which is obviously going to fail. I haven't looked deeply
    > enough to determine whether this is cloud-init or a lower-level http
    > client.

    "requests" quotes the whole url, so undoes the fixup for "%25" to "%".

    urllib3 then does not de-quote the hostname, so "%25" is given to "getaddrinfo".

    Bastian

    --
    The sight of death frightens them [Earthers].
    -- Kras the Klingon, "Friday's Child", stardate 3497.2

  • From Bastian Blank@21:1/5 to Bastian Blank on Sun May 4 08:30:02 2025
    On Sun, May 04, 2025 at 08:05:00AM +0200, Bastian Blank wrote:
    > On Sat, May 03, 2025 at 10:57:39PM -0400, Noah Meyerhans wrote:
    > > There shouldn't be any name resolution involved here at all. My guess
    > > is that something is not recognizing the scoped link-local address as an
    > > IP address, and is treating it as a hostname that needs to be resolved
    > > in DNS. Which is obviously going to fail. I haven't looked deeply
    > > enough to determine whether this is cloud-init or a lower-level http
    > > client.
    >
    > "requests" quotes the whole url, so undoes the fixup for "%25" to "%".
    >
    > urllib3 then does not de-quote the hostname, so "%25" is given to "getaddrinfo".

    A bit further:

    requests.adapters._urllib3_request_context uses urllib.parse.urlparse,
    which then returns the quoted form, but without [].
    urllib3.connectionpool._normalize_host will only dequote if it finds the
    [] before removing them.
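
    The urlparse behaviour described above can be checked with the
    standard library alone. This is a minimal sketch: raw_host() and
    dequoted_host() are hypothetical helper names, and the decoding step
    is only an illustration of what would have to happen before the host
    is resolved, not the fix adopted by requests or urllib3.

```python
from urllib.parse import unquote, urlparse

URL = "http://[fe80::a9fe:a9fe%25enp3s0]/openstack"


def raw_host(url: str) -> str:
    """Host as urlparse reports it: brackets stripped, "%25" left as-is."""
    return urlparse(url).hostname


def dequoted_host(url: str) -> str:
    """Hypothetical fix-up: percent-decode the host before resolving it."""
    return unquote(raw_host(url))


if __name__ == "__main__":
    print(raw_host(URL))       # still contains "%25", no brackets
    print(dequoted_host(URL))  # zone separator decoded back to "%"
```

    Because the brackets are already gone in the first form, code that
    decides whether to dequote by looking for "[" and "]" would skip it.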

    Bastian

    --
    The face of war has never changed. Surely it is more logical to heal
    than to kill.
    -- Surak of Vulcan, "The Savage Curtain", stardate 5906.5
