Subject: Re: [PATCH v3 1/3] arc: Add ARCv2 XML target along with refactoring
From: Simon Marchi
To: Shahab Vahedi
Cc: gdb-patches@sourceware.org, Shahab Vahedi, Tom Tromey, Anton Kolesov, Francois Bedard
Date: Thu, 16 Jul 2020 09:28:50 -0400
In-Reply-To: <20200715203500.GA4811@gmail.com>
References: <20200428160437.1585-1-shahab.vahedi@gmail.com> <20200713154527.13430-1-shahab.vahedi@gmail.com> <20200713154527.13430-2-shahab.vahedi@gmail.com> <422b0905-456f-8722-b74d-97ade4a95c58@simark.ca> <20200715203500.GA4811@gmail.com>
List-Id: Gdb-patches mailing list
On 2020-07-15 4:35 p.m., Shahab Vahedi wrote:
>>> +/* Common construction code for ARC_GDBARCH_FEATURES struct.  If there
>>> +   is no ABFD, then a FEATURE with default values is returned.  */
>>> +void arc_gdbarch_features_init (arc_gdbarch_features &features,
>>> +                                const bfd *abfd, const unsigned long mach)
>>
>> Since this function is only used in this file, can it be static?
>
> The other patch [1] in this series makes use of this function.
>
> [1] [PATCH v3 3/3] arc: Add GNU/Linux support for ARC
>     https://sourceware.org/pipermail/gdb-patches/2020-July/170429.html

When I search for `arc_gdbarch_features_init` in that patch, I don't find
anything...

>> Could this function return an arc_gdbarch_features by value instead?  Then,
>> you could add a constructor to arc_gdbarch_features.  You wouldn't need
>> the "A 0 indicates an uninitialised value" comment on reg_size, as the
>> object could never be in an uninitialised state.
>
> Those were my thoughts as well.  I tried something like that, but I failed
> in the end:
>
> Since "arc_gdbarch_features" is introduced in "gdb/arch/arc.h", I tried
> implementing the constructor in "gdb/arch/arc.c".  Because the constructor
> had to make use of "struct bfd", "bfd_get_flavour ()", "elf_header ()",
> "ELFCLASS*", etc., the following headers had to be included in
> "gdb/arch/arc.c":
>
>   #include "defs.h"
>   #include "arch-utils.h"
>   #include "elf-bfd.h"
>
> This would build OK for gdb targets, but when compiling "arc.o" for the
> gdbserver target, I ended up with something like:
>
>   (paraphrasing)
>   Error: defs.h should not be included while building gdbserver.
>   (I am not sure if the error was indeed about "defs.h")
>
> And removing or ifdef-guarding that header file was opening other cans of
> worms.  Eventually, I gave up and went for the current approach.

The constructor of arc_gdbarch_features shouldn't use a `bfd *`, since the
code in arch/ is used by gdbserver too, which doesn't use BFD.
I meant the constructor should be just:

  arc_gdbarch_features (int reg_size, enum arc_isa isa)
    : reg_size (reg_size), isa (isa)
  {}

The code in GDB constructing an arc_gdbarch_features will use the BFD to
determine these two values, whereas the code in GDBserver will use something
else (usually looking at the current process' properties).

In fact, the constructor is optional; you could just build an
arc_gdbarch_features using aggregate initialization and return it from that
function:

  arc_gdbarch_features features {reg_size, isa};

It doesn't really matter.  I just happen to prefer the constructor method,
because it makes it so you can't "forget" a field, and it ensures the object
can never be in an uninitialized state.

> No, I haven't built it with AddressSanitizer.  Could you point me in the
> right direction how I can use it?

You already got the right information from Christian.

>>> +  /* Used by std::unordered_map to hash the feature sets.  The hash is
>>> +     calculated in the manner below:
>>> +     REG_SIZE | ISA
>>> +     5-bits   | 4-bits  */
>>> +
>>> +  std::size_t hash () const noexcept
>>> +  {
>>> +    std::size_t val = ((reg_size & 0x1f) << 8 | (isa & 0xf) << 0);
>>> +    return val;
>>
>> I don't know much about hashing, but I don't think this makes for a very
>> good hash function.  It outputs similar hashes for similar inputs, and does
>> not use the range of possible hash values uniformly.
>
> Indeed it does not output hashes uniformly, but I believe it is unique.  The
> meaningful (and future-proof) values for "reg_size" are {4, 8, 16} _bytes_.
> Hence, the "0x1f" mask should suffice.  Having 15 versions for ISA should be
> enough as well.  With these conditions in mind, I don't think there is any
> collision while hashing such inputs.

Well, I guess that depends on how the hash map implementation uses the hash
value to map into buckets.  Let's say that there are 16 buckets, and the hash
table implementation decides which bucket a value goes in by looking at the
last four bits.
And let's imagine that you insert 32 values, all with the same ISA value but
with 32 different reg_size values (assuming that would make sense).  Then even
though the hash values are all different, they all end up in the same bucket.
Having a hash function that uniformly distributes the hash values makes that
kind of pathological case much more unlikely.

Anyway, I won't press more on that issue, since it's really not that
important.  We're talking about just a few items, so it will never make a
difference.  And as Luis said on IRC, it could also use a vector and a linear
search.

>> It would be more appropriate and typical to use std::hash on each member
>> and then combine the hashes.  Doing a quick search, I found that Boost
>> offers something exactly for that, called hash_combine:
>>
>>   https://www.boost.org/doc/libs/1_55_0/doc/html/hash/combine.html
>
> Thanks for this.  I didn't know about such functionality.  As you said
> though, we can't use it.  Personally, I think it would be overkill here.
> We can get away with a simple "shift" and "or".  Please read my response
> above for my reasoning.

>> Unfortunately, there is nothing similar in the standard library.
>>
>> A bit off-topic: intuitively, I would have thought that it would have been
>> sufficient to just add the hash values together.  Does anyone know why this
>> wouldn't be a good idea?
>
> So far, the possible values are:
>
>   reg_size: {4, 8, 16}
>   ISA: {1, 2, ..., 15}
>
> A simple addition would produce the same result for these legally valid
> inputs:
>
>   reg_size=8, ISA=2 ---> hash = 10
>   reg_size=4, ISA=6 ---> hash = 10

Sorry, I meant `std::hash (reg_size) + std::hash (isa)`, not
`reg_size + isa`.  If `std::hash (reg_size)` produces a uniformly-distributed
hash value, and `std::hash (isa)` does as well, then the sum is surely
uniformly distributed, no?

Simon