From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <mjambor@suse.cz>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
	by sourceware.org (Postfix) with ESMTPS id 11A153858D32
	for <gcc-patches@gcc.gnu.org>; Mon, 29 Aug 2022 16:18:51 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 11A153858D32
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.cz
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.cz
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
	(No client certificate requested)
	by smtp-out1.suse.de (Postfix) with ESMTPS id 4AB6121CAF;
	Mon, 29 Aug 2022 16:18:47 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa;
	t=1661789927; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=LNW9EEGjkUq747VrpR+We/k/qHCiQp24Fu2JgCKyYPc=;
	b=x0jQDzihzLYuQehOt3n/QerdWya405PuMO6+8nyhQbtB1JbqqZ2Xu+fwFHwXUq9rC8EUNp
	GEX43cH4p33cuZjgLw8JikSocuvRJ/pM8eEq4ig4EQYHR+/cPbaxYhFjv4XFsHH0y6wKVL
	t+H92Ji/Nh7lwff8RgHfgd5GnUWUExg=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz;
	s=susede2_ed25519; t=1661789927;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=LNW9EEGjkUq747VrpR+We/k/qHCiQp24Fu2JgCKyYPc=;
	b=+ctlnkjeW4flE4luZzMlsuUBhhmJ1suMmFz1kZXEtGUSQ1ykn+F1YhCtWvWS2Z1B2PnGY4
	2oGo1DCC2hdwehBw==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
	(No client certificate requested)
	by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 3B41B1352A;
	Mon, 29 Aug 2022 16:18:47 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
	by imap2.suse-dmz.suse.de with ESMTPSA
	id LN1yDufmDGM4HAAAMHmgww
	(envelope-from <mjambor@suse.cz>); Mon, 29 Aug 2022 16:18:47 +0000
From: Martin Jambor <mjambor@suse.cz>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Richard Sandiford <richard.sandiford@arm.com>, Richard Biener
 <rguenther@suse.de>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 2/2] vec: Add array_slice::bsearch
In-Reply-To: <B1254077-F5C5-4E72-B6CA-1948905A3857@gmail.com>
References: <ri6a67q7k7n.fsf@suse.cz>
 <B1254077-F5C5-4E72-B6CA-1948905A3857@gmail.com>
User-Agent: Notmuch/0.35 (https://notmuchmail.org) Emacs/28.1
 (x86_64-suse-linux-gnu)
Date: Mon, 29 Aug 2022 18:18:46 +0200
Message-ID: <ri6sflff2ft.fsf@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_ASCII_DIVIDERS,SCC_5_SHORT_WORD_LINES,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Hello,

On Sat, Aug 27 2022, Richard Biener wrote:
>> Am 26.08.2022 um 23:45 schrieb Martin Jambor <mjambor@suse.cz>:
>>
>> =EF=BB=BFHi,
>>
>>> On Fri, Aug 26 2022, Richard Sandiford wrote:
>>> Richard Biener <rguenther@suse.de> writes:
>>>>> Am 26.08.2022 um 18:40 schrieb Martin Jambor <mjambor@suse.cz>:
>>>>>
>>>>> =EF=BB=BFHi,
>>>>>
>>>>> This adds a method to binary search in a sorted array_slice.
>>>>>
>>>>> The implementation is direct copy of vec:bsearch.  Moreover, to only
>>>>> copy it once and not twice, I used const_cast in the non-const
>>>>> variants to be able to use the const variants.  I hope that is
>>>>> acceptable abuse of const_cast but I'll be happy to change that if
>>>>> not.
>>>>>
>>>>> Bootstrapped and tested along code that actually uses it on
>>>>> x86_64-linux.  OK for trunk?
>>>>
>>>> Can you avoid the copying by using array slice bsearch from the vec<> =
bsearch?
>>>
>>> IMO it would be better to transition to using <algorithm> routines
>>> for this kind of thing (for new code).  In this case that would be
>>> std::lower_bound.
>>>
>>> Using std::lower_bound is more convenient because it avoids the void *
>>> thing (you can use lambdas to capture any number of variables instead)
>>> and because it works on subranges, not just whole ranges.
>>>
>>
>> OK, I can use std::lower_bound with simple lambdas too.  The semantics
>> of returning the first matching a criterion actually allows me to use it
>> one more time.
>
> Can you try to compare generated code?

I have had a look and the std::lower_bound is slightly less efficient,
we get 40 instructions or so instead of 28 or so, but it is mostly
because it lacks the early exit bsearch has when its comparator returns
zero and because of the final checks whether the lower bound is what we
were searching for.  But the templates and lambdas themselves do not
seem to lead to any egregious overhead.

As I wrote earlier, in one of the three searches I want to do I actually
want to look for a lower bound (in the example below the first with the
given index) and so I'd be willing to accept the trade-off.

Full story:

The data structure being searched is essentially an array of these
structures, sorted primarily by index and those with the same index by
unit_offset:

  struct GTY(()) ipa_argagg_value
  {
    tree value;
    unsigned unit_offset;
    unsigned index : 16;
    unsigned by_ref : 1;
  };


The C-way implementation:

  ----------------------------------------------------------------------
  /* Callback for bsearch and qsort to sort aggregate values.  */

  static int
  compare_agg_values (const void *a, const void *b)
  {
    const ipa_argagg_value *agg1 =3D (const ipa_argagg_value *) a;
    const ipa_argagg_value *agg2 =3D (const ipa_argagg_value *) b;

    if (agg1->index < agg2->index)
      return -1;
    if (agg1->index > agg2->index)
      return 1;

    if (agg1->unit_offset < agg2->unit_offset)
      return -1;
    if (agg1->unit_offset > agg2->unit_offset)
      return 1;
    return 0;
  }

  const ipa_argagg_value * __attribute__((noinline))
  testfun (const array_slice<const ipa_argagg_value> &elts,
  	  int index, unsigned unit_offset)
  {
    ipa_argagg_value key;
    key.index =3D index;
    key.unit_offset =3D unit_offset;
    return elts.bsearch (&key, compare_agg_values);
  }
  ----------------------------------------------------------------------

The C++-ish one:

  ----------------------------------------------------------------------
  const ipa_argagg_value * __attribute__((noinline))
  testfun (const array_slice<const ipa_argagg_value> &elts,
  	  int index, unsigned unit_offset)
  {
    ipa_argagg_value key;
    key.index =3D index;
    key.unit_offset =3D unit_offset;
    const ipa_argagg_value *res
      =3D std::lower_bound (elts.begin (), elts.end (), key,
  			[] (const ipa_argagg_value &elt,
  			    const ipa_argagg_value &val)
  			{
  			  if (elt.index < val.index)
  			    return true;
  			  if (elt.index > val.index)
  			    return false;
  			  if (elt.unit_offset < val.unit_offset)
  			    return true;
  			  return false;
  			});

    if (res =3D=3D elts.end ()
        || res->index !=3D index
        || res->unit_offset !=3D unit_offset)
      res =3D nullptr;
    return res;
  }
  ----------------------------------------------------------------------

Using GCC 12.1, the use of bsearch yields the following optimized dump:

  ----------------------------------------------------------------------
  ;; Function testfun (_Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, fu=
ncdef_no=3D3449, decl_uid=3D129622, cgraph_uid=3D2433, symbol_order=3D2603)

  __attribute__((noinline))
  const struct ipa_argagg_value * testfun (const struct array_slice & elts,=
 int index, unsigned int unit_offset)
  {
    const void * p;
    size_t idx;
    size_t u;
    size_t l;
    short unsigned int _1;
    const struct ipa_argagg_value * _11;
    unsigned int _12;
    long unsigned int _15;
    long unsigned int _18;
    long unsigned int _20;
    const struct ipa_argagg_value * _24;
    short unsigned int _29;
    unsigned int _33;

    <bb 2> [local count: 1073741824]:
    _1 =3D (short unsigned int) index_2(D);
    _11 =3D MEM[(const struct ipa_argagg_value * *)elts_7(D)];
    _12 =3D MEM[(unsigned int *)elts_7(D) + 8B];
    _15 =3D (long unsigned int) _12;
    goto <bb 8>; [100.00%]

    <bb 3> [local count: 11844779787]:
    # u_30 =3D PHI <idx_19(9), u_27(8)>
    _18 =3D u_30 + l_34;
    idx_19 =3D _18 >> 1;
    _20 =3D idx_19 * 16;
    p_21 =3D _11 + _20;
    _29 =3D MEM[(const struct ipa_argagg_value *)p_21].index;
    if (_1 < _29)
      goto <bb 9>; [34.00%]
    else
      goto <bb 4>; [66.00%]

    <bb 4> [local count: 7817554615]:
    if (_1 > _29)
      goto <bb 7>; [34.00%]
    else
      goto <bb 5>; [66.00%]

    <bb 5> [local count: 5159586016]:
    _33 =3D MEM[(const struct ipa_argagg_value *)p_21].unit_offset;
    if (unit_offset_5(D) < _33)
      goto <bb 9>; [34.00%]
    else
      goto <bb 6>; [66.00%]

    <bb 6> [local count: 3405326750]:
    if (unit_offset_5(D) > _33)
      goto <bb 7>; [94.50%]
    else
      goto <bb 10>; [5.50%]

    <bb 7> [local count: 6604057032]:
    l_23 =3D idx_19 + 1;

    <bb 8> [local count: 7677798856]:
    # l_34 =3D PHI <0(2), l_23(7)>
    # u_27 =3D PHI <_15(2), u_30(7)>
    if (u_27 > l_34)
      goto <bb 3>; [94.50%]
    else
      goto <bb 10>; [5.50%]

    <bb 9> [local count: 5781484434]:
    if (idx_19 > l_34)
      goto <bb 3>; [94.50%]
    else
      goto <bb 10>; [5.50%]

    <bb 10> [local count: 1073741824]:
    # _24 =3D PHI <p_21(6), 0B(9), 0B(8)>
    return _24;

  }
  ----------------------------------------------------------------------

And the following assembly.

  ----------------------------------------------------------------------
  	.p2align 4
  	.globl	_Z7testfunRK11array_sliceIK16ipa_argagg_valueEij
  	.type	_Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, @function
  _Z7testfunRK11array_sliceIK16ipa_argagg_valueEij:
  .LFB3449:
  	.cfi_startproc
  	movq	(%rdi), %r10
  	movl	8(%rdi), %r9d
  	xorl	%edi, %edi
  	jmp	.L1392
  	.p2align 4,,10
  	.p2align 3
  .L1400:
  	cmpw	%si, %r8w
  	jb	.L1394
  	movl	8(%rax), %r8d
  	cmpl	%r8d, %edx
  	jb	.L1393
  	cmpl	%edx, %r8d
  	jnb	.L1391
  .L1394:
  	leaq	1(%rcx), %rdi
  .L1392:
  	cmpq	%r9, %rdi
  	jnb	.L1399
  .L1396:
  	leaq	(%r9,%rdi), %rcx
  	shrq	%rcx
  	movq	%rcx, %rax
  	salq	$4, %rax
  	addq	%r10, %rax
  	movzwl	12(%rax), %r8d
  	cmpw	%r8w, %si
  	jnb	.L1400
  .L1393:
  	cmpq	%rcx, %rdi
  	jnb	.L1399
  	movq	%rcx, %r9
  	jmp	.L1396
  	.p2align 4,,10
  	.p2align 3
  .L1399:
  	xorl	%eax, %eax
  .L1391:
  	ret
  	.cfi_endproc
  .LFE3449:
  	.size	_Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, .-_Z7testfunRK11=
array_sliceIK16ipa_argagg_valueEij
  ----------------------------------------------------------------------

The C++ <algorithm> std::lower_bound leads to the following optimized
dump:

  ----------------------------------------------------------------------
  ;; Function testfun (_Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, fu=
ncdef_no=3D4007, decl_uid=3D138795, cgraph_uid=3D2481, symbol_order=3D2656)

  __attribute__((noinline))
  const struct ipa_argagg_value * testfun (const struct array_slice & elts,=
 int index, unsigned int unit_offset)
  {
    _DistanceType __half;
    _DistanceType __len;
    const struct ipa_argagg_value * __first;
    const struct ipa_argagg_value * res;
    short unsigned int _1;
    short unsigned int _2;
    int _3;
    unsigned int _4;
    const struct ipa_argagg_value * _14;
    unsigned int _15;
    long unsigned int _16;
    long unsigned int _17;
    const struct ipa_argagg_value * _18;
    long int _20;
    long unsigned int __n.58_39;
    signed long _53;
    long unsigned int _56;
    const struct ipa_argagg_value * _57;
    short unsigned int _58;
    long int _60;
    unsigned int _62;

    <bb 2> [local count: 1073741823]:
    _1 =3D (short unsigned int) index_6(D);
    _14 =3D elts_11(D)->m_base;
    _15 =3D elts_11(D)->m_size;
    _16 =3D (long unsigned int) _15;
    _17 =3D _16 * 16;
    _18 =3D _14 + _17;
    _53 =3D (signed long) _17;
    _20 =3D _53 /[ex] 16;
    if (_53 !=3D 0)
      goto <bb 3>; [89.00%]
    else
      goto <bb 8>; [11.00%]

    <bb 3> [local count: 8687547686]:
    # __len_32 =3D PHI <_20(2), __len_64(7)>
    # __first_35 =3D PHI <_14(2), __first_63(7)>
    __half_38 =3D __len_32 >> 1;
    __n.58_39 =3D (long unsigned int) __half_38;
    _56 =3D __n.58_39 * 16;
    _57 =3D __first_35 + _56;
    _58 =3D _57->index;
    if (_1 > _58)
      goto <bb 6>; [34.00%]
    else
      goto <bb 4>; [66.00%]

    <bb 4> [local count: 5733781450]:
    if (_1 < _58)
      goto <bb 7>; [34.00%]
    else
      goto <bb 5>; [66.00%]

    <bb 5> [local count: 3784295727]:
    _62 =3D _57->unit_offset;
    if (unit_offset_9(D) > _62)
      goto <bb 6>; [34.00%]
    else
      goto <bb 7>; [66.00%]

    <bb 6> [local count: 4240426809]:
    __first_59 =3D _57 + 16;
    _60 =3D __len_32 - __half_38;
    __len_61 =3D _60 + -1;

    <bb 7> [local count: 8687547695]:
    # __first_63 =3D PHI <__first_59(6), __first_35(5), __first_35(4)>
    # __len_64 =3D PHI <__len_61(6), __half_38(5), __half_38(4)>
    if (__len_64 > 0)
      goto <bb 3>; [89.00%]
    else
      goto <bb 8>; [11.00%]

    <bb 8> [local count: 1073741841]:
    # __first_51 =3D PHI <__first_63(7), _14(2)>
    if (_18 =3D=3D __first_51)
      goto <bb 12>; [14.90%]
    else
      goto <bb 9>; [85.10%]

    <bb 9> [local count: 913754310]:
    _2 =3D __first_51->index;
    _3 =3D (int) _2;
    if (_3 !=3D index_6(D))
      goto <bb 12>; [44.22%]
    else
      goto <bb 10>; [55.78%]

    <bb 10> [local count: 509692156]:
    _4 =3D __first_51->unit_offset;
    if (_4 !=3D unit_offset_9(D))
      goto <bb 11>; [44.22%]
    else
      goto <bb 12>; [55.78%]

    <bb 11> [local count: 225385870]:

    <bb 12> [local count: 1073741841]:
    # res_5 =3D PHI <__first_51(10), 0B(8), 0B(9), 0B(11)>
    return res_5;

  }
  ----------------------------------------------------------------------

And the following assembly:

  ----------------------------------------------------------------------

  	.p2align 4
  	.globl	_Z7testfunRK11array_sliceIK16ipa_argagg_valueEij
  	.type	_Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, @function
  _Z7testfunRK11array_sliceIK16ipa_argagg_valueEij:
  .LFB4007:
  	.cfi_startproc
  	movl	8(%rdi), %eax
  	movq	(%rdi), %r8
  	movl	%esi, %r10d
  	movl	%esi, %r9d
  	salq	$4, %rax
  	movq	%rax, %rcx
  	leaq	(%r8,%rax), %r11
  	sarq	$4, %rcx
  	testq	%rax, %rax
  	jne	.L1395
  	jmp	.L1392
  	.p2align 4,,10
  	.p2align 3
  .L1405:
  	cmpw	%di, %r9w
  	jb	.L1398
  	cmpl	%edx, 8(%rax)
  	jb	.L1393
  .L1398:
  	movq	%rsi, %rcx
  	testq	%rcx, %rcx
  	jle	.L1392
  .L1395:
  	movq	%rcx, %rsi
  	sarq	%rsi
  	movq	%rsi, %rax
  	salq	$4, %rax
  	addq	%r8, %rax
  	movzwl	12(%rax), %edi
  	cmpw	%r9w, %di
  	jnb	.L1405
  .L1393:
  	subq	%rsi, %rcx
  	leaq	16(%rax), %r8
  	subq	$1, %rcx
  	testq	%rcx, %rcx
  	jg	.L1395
  .L1392:
  	cmpq	%r8, %r11
  	je	.L1400
  	movzwl	12(%r8), %eax
  	cmpl	%r10d, %eax
  	jne	.L1400
  	cmpl	%edx, 8(%r8)
  	je	.L1391
  .L1400:
  	xorl	%r8d, %r8d
  .L1391:
  	movq	%r8, %rax
  	ret
  	.cfi_endproc
  .LFE4007:
  	.size	_Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, .-_Z7testfunRK11=
array_sliceIK16ipa_argagg_valueEij
  ----------------------------------------------------------------------

Martin