From mboxrd@z Thu Jan  1 00:00:00 1970
From: Sergio Durigan Junior <sergiodj@redhat.com>
To: Pedro Alves <palves@redhat.com>
Cc: GDB Patches <gdb-patches@sourceware.org>,  Eli Zaretskii <eliz@gnu.org>,  Jan Kratochvil <jan.kratochvil@redhat.com>,  Paul Fertser <fercerpav@gmail.com>,  Tsutomu Seki <sekiriki@gmail.com>
Subject: Re: [PATCH] Implement IPv6 support for GDB/gdbserver
References: <20180523185719.22832-1-sergiodj@redhat.com>	<307a63d3-703d-5611-1508-c80daa86fbbf@redhat.com>	<874lieulko.fsf@redhat.com>	<8721b020-3b0e-bd66-85dc-5e28aef456a8@redhat.com>
Date: Fri, 08 Jun 2018 17:47:00 -0000
In-Reply-To: <8721b020-3b0e-bd66-85dc-5e28aef456a8@redhat.com> (Pedro Alves's	message of "Fri, 8 Jun 2018 14:53:38 +0100")
Message-ID: <87vaattbjq.fsf@redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-IsSubscribed: yes
X-SW-Source: 2018-06/txt/msg00228.txt.bz2

On Friday, June 08 2018, Pedro Alves wrote:

> On 06/08/2018 02:13 AM, Sergio Durigan Junior wrote:
>> On Wednesday, June 06 2018, Pedro Alves wrote:
>
>
>>>> Another thing worth mentioning is the new 'GDB_TEST_IPV6' testcase
>>>> parameter, which instructs GDB and gdbserver to use IPv6 for
>>>> connections.  This way, if you want to run IPv6 tests, you do:
>>>>
>>>>   $ make check-gdb RUNTESTFLAGS='GDB_TEST_IPV6=1'
>>>
>>> That sounds useful, but:
>>>
>>> #1 - I don't see how that works without also passing
>>>      --target_board= pointing at one of the native-gdbserver and
>>>      native-extended-gdbserver board files.  
>>>      Can you expand on why you took this approach instead of:
>>>  
>>>   a) handling GDB_TEST_IPV6 somewhere central, like
>>>      in gdb/testsuite/gdbserver-support.exp, where we
>>>      default to "localhost:".  That would exercise the gdb.server/
>>>      tests with ipv6, when testing with the default/unix board file.
>>>
>>>   b) add new board files to test with ipv6, like native-gdbserver-v6
>>>      or something like that.
>>>
>>>   c) both?
>> 
>> I was thinking about a good way to test this feature, and my initial
>> assumption was that the test would only make sense when --target-board=
>> is passed.  That's why I chose to implement the mechanism on
>> gdb/testsuite/boards/gdbserver-base.exp.  Now that you mentioned this, I
>> noticed that I should have also mentioned these expectations while
>> writing the commit message, and that the "make check-gdb
>> RUNTESTFLAGS='GDB_TEST_IPV6=1'" is actually wrong because it doesn't
>> specify any of the target boards.
>> 
>> Having said that, and after reading your question, I understand that the
>> testing can be made more flexible by implementing the logic inside
>> gdb/testsuite/gdbserver-support.exp instead, which will have the benefit
>> of activating the test even without a gdbserver target board being
>> specified.  I will give it a try and see if I can implement it in a
>> better way.
>
> I'd think you just have to hook the GDB_TEST_LOCALHOST env var reading here,
> in gdbserver_start:
>
>     # Extract the local and remote host ids from the target board struct.
>     if [target_info exists sockethost] {
> 	set debughost [target_info sockethost]
>     } else {
> 	set debughost "localhost:"
>     }

Yes, that's my plan.

> I'd also try removing the
>
>   set_board_info sockethost "localhost:"
>
> line from native-gdbserver.exp and native-extended-gdbserver.exp,
> since that's the default.  But it's not really necessary if 
> the env var takes precedence of the target board setting.

I wasn't planning to remove these lines from the board files, but I can
do it.

>>> Does connecting with "localhost6:port" default to IPv6, BTW?
>>> At least fedora includes "localhost6" in /etc/hosts.
>> 
>> Using "localhost6:port" works, but it doesn't default to IPv6.  Here's
>> what I see on the gdbserver side:
>> 
>>   $ ./gdb/gdbserver/gdbserver --once localhost6:1234 a.out
>>   Process /path/to/a.out created; pid = 7742
>>   Listening on port 1234
>>   Remote debugging from host ::ffff:127.0.0.1, port 39196
>> 
>> This means that the connection came using IPv4; it works because IPv6
>> sockets also listen for IPv4 connection on Linux (one can change this
>> behaviour by setting the "IPV6_V6ONLY" socket option).
>> 
>> This happens because I've made a decision to default to AF_INET (instead
>> of AF_UNSPEC) when no prefix has been given.  This basically means that,
>> at least for now, we assume that an unknown (i.e., not prefixed)
>> address/hostname is IPv4.  I've made this decision thinking about the
>> convenience of the user: when AF_UNSPEC is used (and the user hasn't
>> specified any prefix), getaddrinfo will return a linked list of possible
>> addresses that we should try to connect to, which usually means an IPv6
>> and an IPv4 address, in that order.  Usually this is fine, because (as I
>> said) IPv6 sockets can also listen for IPv4 connections.  However, if
>> you start gdbserver with an explicit IPv4 address:
>> 
>>   $ ./gdb/gdbserver/gdbserver --once 127.0.0.1:1234 a.out
>> 
>> and try to connect GDB to it using an "ambiguous" hostname:
>> 
>>   $ ./gdb/gdb -ex 'target remote localhost:1234' a.out
>> 
>> you will notice that GDB will take a somewhat long time trying to
>> connect (to the IPv6 address, because of AF_UNSPEC), and then it will
>> error out saying that the connection timed out:
>> 
>>   tcp:localhost:1234: Connection timed out.
>
> How do other tools handle this?

Just like GDB does with AF_UNSPEC: they take whatever getaddrinfo
returns, in the order it returns it.


> For example, with ping, I get:
>
>  $ ping localhost
>  PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.
>  64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=1 ttl=64 time=0.048 ms
>  ^C
>
>  $ ping localhost6
>  PING localhost6(localhost6.localdomain6 (::1)) 56 data bytes
>  64 bytes from localhost6.localdomain6 (::1): icmp_seq=1 ttl=64 time=0.086 ms
>  ^C

And I get:

  $ ping localhost
  PING localhost(localhost (::1)) 56 data bytes
  64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.050 ms
  ^C

  $ ping localhost6
  PING localhost6(localhost (::1)) 56 data bytes
  64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.089 ms
  ^C

Maybe your /etc/hosts is different from mine:

  127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
  ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

> how does ping instantly know without visible delay that "localhost"
> resolves to an IPv4 address, and that "localhost6" resolves to
> an IPv6 address?

It doesn't.  It just tries to connect to the first entry returned by
getaddrinfo:

  https://github.com/iputils/iputils/blob/master/ping.c#L518

In my case, since both the IPv4 and the IPv6 localhost addresses are
valid, it just uses the first one, which is the IPv6 one.
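
To make the ordering visible, here is a rough sketch in Python (not GDB
code, and "resolve" is just a name I made up for illustration).  It
lists what getaddrinfo offers for a host, in the order it offers it,
either for any family (AF_UNSPEC) or for one specific family:

```python
import socket

def resolve(host, port, family=socket.AF_UNSPEC):
    """Return (family name, address) pairs in the order getaddrinfo gives them.

    Hypothetical helper, for illustration only.
    """
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    return [(socket.AddressFamily(fam).name, sockaddr[0])
            for fam, _type, _proto, _canon, sockaddr in infos]
```

With an /etc/hosts like mine, resolve("localhost", 1234) should list the
::1 entry before the 127.0.0.1 one, while passing socket.AF_INET as the
third argument restricts the answer to IPv4 addresses.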

> Same with telnet:
>
>  $ telnet localhost
>  Trying 127.0.0.1...
>  telnet: connect to address 127.0.0.1: Connection refused
>  $ telnet localhost6
>  Trying ::1...
>  telnet: connect to address ::1: Connection refused

telnet makes the algorithm I explained before visible:

  $ telnet localhost
  Trying ::1...
  telnet: connect to address ::1: Connection refused
  Trying 127.0.0.1...
  telnet: connect to address 127.0.0.1: Connection refused

It tried to connect to the IPv6 address and got "connection refused"
(instantly, because telnet doesn't implement anything like our TCP
auto-retry mechanism); then it tried the IPv4 address, which also
failed.  This is the getaddrinfo loop at work.

  https://github.com/marado/netkit-telnet/blob/596f1a79d3868f733fc15765b107638822a0f8a9/netkit-telnet-0.17/telnet/commands.cc#L1808
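
For reference, a minimal sketch of that loop, written in Python rather
than C just to keep it short.  "connect_first" and its "timeout"
parameter are made-up names; this is not the telnet code, nor what the
patch does, only the general shape of the technique:

```python
import socket

def connect_first(candidates, timeout=1.0):
    """Try each getaddrinfo-style entry in order.

    Returns the first socket that connects; if none does, raises the
    error from the last attempt.  Hypothetical helper, illustration only.
    """
    last_error = OSError("no addresses to try")
    for family, socktype, proto, _canon, sockaddr in candidates:
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as err:
            # This address family may not be usable on this machine.
            last_error = err
            continue
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as err:
            # Refused, unreachable or timed out: move on to the next entry.
            last_error = err
            sock.close()
    raise last_error
```

The candidates would normally come straight from
socket.getaddrinfo (host, port, socket.AF_UNSPEC, socket.SOCK_STREAM).
Note that each address is tried exactly once here, so a refused
connection costs almost nothing.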

> Same with netcat:
>
>  $ nc -vv localhost
>  Ncat: Version 7.60 ( https://nmap.org/ncat )
>  NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
>  NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
>  libnsock nsock_iod_new2(): nsock_iod_new (IOD #1)
>  libnsock nsock_connect_tcp(): TCP connection requested to 127.0.0.1:31337 (IOD #1) EID 8
>                                                            ^^^^^^^^^
>  libnsock nsock_trace_handler_callback(): Callback: CONNECT ERROR [Connection refused (111)] for EID 8 [127.0.0.1:31337]
>  Ncat: Connection refused.
>
>  $ nc -vv localhost6
>  Ncat: Version 7.60 ( https://nmap.org/ncat )
>  NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
>  NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
>  libnsock nsock_iod_new2(): nsock_iod_new (IOD #1)
>  libnsock nsock_connect_tcp(): TCP connection requested to ::1:31337 (IOD #1) EID 8
>                                                            ^^^
>  libnsock nsock_trace_handler_callback(): Callback: CONNECT ERROR [Connection refused (111)] for EID 8 [::1:31337]
>  Ncat: Connection refused.

  $ nc -vv localhost
  Ncat: Version 7.60 ( https://nmap.org/ncat )
  NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
  NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
  libnsock nsock_iod_new2(): nsock_iod_new (IOD #1)
  libnsock nsock_connect_tcp(): TCP connection requested to ::1:31337 (IOD #1) EID 8
                                                            ^^^
  libnsock nsock_trace_handler_callback(): Callback: CONNECT ERROR [Connection refused (111)] for EID 8 [::1:31337]
  Ncat: Connection to ::1 failed: Connection refused.
  Ncat: Trying next address...
        ^^^^^^^^^^^^^^^^^^^
  libnsock nsock_connect_tcp(): TCP connection requested to 127.0.0.1:31337 (IOD #1) EID 16
                                                            ^^^^^^^^^
  libnsock nsock_trace_handler_callback(): Callback: CONNECT ERROR [Connection refused (111)] for EID 16 [127.0.0.1:31337]
  Ncat: Connection refused.

> BTW, I think a much more common scenario of local use of
> gdbserver is to omit the host name:
>
>  ./gdb/gdbserver/gdbserver --once :1234
>  ./gdb/gdb -ex 'target remote :1234'
>
> I assume that would work fine with AF_UNSPEC ?

Yes, with AF_UNSPEC we assume the hostname is "localhost" if it's not
given, and it works like it should (i.e., IPv6 first, IPv4 later).
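
The defaulting itself is trivial.  A sketch in Python, where
"split_host_port" is a hypothetical name and the parsing is deliberately
naive (it does not try to handle every address syntax we accept):

```python
def split_host_port(spec, default_host="localhost"):
    """Split "host:port" at the last colon; an empty host means localhost.

    Hypothetical helper, for illustration only.
    """
    host, sep, port = spec.rpartition(":")
    if not sep:
        raise ValueError("expected HOST:PORT, got %r" % spec)
    return (host or default_host, int(port))
```

So ":1234" is treated as ("localhost", 1234), and the resulting name is
then resolved like any other unprefixed hostname.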

>> 
>> This is because of the auto-retry mechanism implemented for TCP
>> connections on GDB; it keeps retrying to connect to the IPv6 until it
>> decides it's not going to work.  Only after this timeout is that GDB
>> will try to connect to the IPv4 address, and succeed.
>> 
>> So, the way I see it, we have a few options to deal with this scenario:
>> 
>> 1) Assume that the unprefixed address/hostname is AF_INET (i.e., keep
>> the patch as-is).
>> 
>> 2) Don't assume anything about the unprefixed address/hostname (i.e.,
>> AF_UNSPEC), and don't change the auto-retry system.  This is not very
>> nice because of what I explained above.
>> 
>> 3) Don't assume anything about the unprefixed address/hostname (i.e.,
>> AF_UNSPEC), but *DO* change the auto-retry system to retry less times
>> (currently it's set to 15 retries, which seems too much to me).  Maybe 5
>> times is enough?  This will still have an impact on the user, but she
>> will have to wait less time, at least.
>> 
>> Either (1) or (3) are fine by me.  If we go with (1), we'll eventually
>> need to change the default to IPv6 (or to AF_UNSPEC), but that's only
>> when IPv6 is more adopted.
>
> I'd like to understand this a bit more before coming up with a
> decision.  I feel like we're missing something.
>
> A part of it is that it kind of looks like a "it hurts when I do this
> doctor; then just don't" scenario, with the using different host names
> on both gdbserver and gdb (localhost vs 127.0.0.1).  Why would you do that
> for local debugging?  You'd get the same problem if localhost
> always only resolved to an IPv6 address, I'd think.  But still, I'd
> like to understand how can other tools handle this.

The "getaddrinfo loop" is a well-known way to add IPv6 support to
IPv4-only tools.  I think it is totally fine to iterate through the
possible addresses and try each one until one succeeds, but our
problem is that we implemented a retry mechanism on top of that, so when
we get "connection refused" GDB won't just try the next address, but
will keep retrying the refused one...  That is the problem.

I don't know why anyone would use different hostnames on GDB and
gdbserver; I just stated the fact that if someone does it, she will have
problems.  And yes, you'd get the same problem if localhost always only
resolved to IPv6.  The difference is that the tools you're using for
your example don't implement retry (or at least not the way we do), so
you don't have huge delays when they can't connect to an address.

It is clear to me, after investigating this, that the problem is our
retry mechanism.  We can either adjust it to a lower delay, get rid of
it, or leave it as is and assume that unprefixed addresses are IPv4.  I
fail to see what else we're missing.
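
To put a number on it, here is a toy model (plain Python, no sockets;
the function and parameter names are made up, and it only counts
attempts, it does not model the real delays in GDB).  It assumes each
address is tried a fixed number of times before the next one is
considered, which is the behaviour I described above:

```python
def attempts_before_success(addresses, reachable, tries_per_address):
    """Count connection attempts until one address answers.

    Every address is tried `tries_per_address` times before moving on
    to the next one.  Returns None if no address ever answers.
    Hypothetical model, for illustration only.
    """
    attempts = 0
    for addr in addresses:
        for _ in range(tries_per_address):
            attempts += 1
            if addr in reachable:
                return attempts
    return None
```

With the addresses ordered ["::1", "127.0.0.1"], only 127.0.0.1
reachable, and 15 tries per address, the IPv4 address is reached on
attempt 16.  With 5 tries it is attempt 6, and with a single try per
address (what telnet and nc effectively do) it is attempt 2.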

>>>> +  char *orig_name = strdup (name);
>>>
>>> Do we need a deep copy?  And if we do, how about
>>> using std::string to avoid having to call free further
>>> down?
>> 
>> This is gdbserver/gdbreplay.c, where apparently we don't have access to
>> a lot of our regular facilities on GDB.  For example, I was trying to
>> use std::string, its methods, and other stuff here (even i18n
>> functions), but the code won't compile, and as far as I have researched
>> this is intentional, because gdbreplay needs to be a very small and
>> simple program.  
>
> What did you find that gave you that impression?  There's no reason
> that gdbreplay needs to be small or simple.  Certainly doesn't need
> to be smaller than gdbserver.

First, the way it is written.  It doesn't use any of our facilities
(e.g., i18n, strdup instead of xstrdup), and it seems to be treated in a
"special" way, because it is a separate program.  I found this message:

  https://sourceware.org/ml/gdb/2008-06/msg00117.html

  > I've tried to find information in the doc about gdbreplay without luck.
  > Really quickly, does gdbreplay, as its name suggest, allow to record an
  > re-run an application session? 

  Yes, exactly -- but with rather stringent limits.
  In a nutshell, during the replay session, you must give
  EXACTLY the same sequence of gdb commands as were given
  during the record session.  gdbreplay will prompt you for
  the next command, but if you do *anything* different, 
  it will throw up its hands and quit.

This seems to imply that gdbreplay is a very limited program.  Also,
Jan's first patch (back in 2006) implementing IPv6 duplicated code in
gdbreplay as well.  I admit I may have read too much between the lines
here, but I just assumed that this was the way things were.

>> at least that's what I understood from our
>> archives/documentation.  I did not feel confident reworking gdbreplay to
>> make it "modern", so I decided to implement things "the old way".
>
> Seems like adding to technical debt to be honest.  Did you hit some
> unsurmountable problem, or would just a little bit of fixing here and
> there be doable?

I don't know if it's unsurmountable.  I know I had trouble getting i18n
to work and including a few headers here and there, but I haven't tried
very hard to work around it.  I just decided to "add to the technical
debt".

I'll take a better look at this.

Thanks,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/