From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 11A153858D32 for ; Mon, 29 Aug 2022 16:18:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 11A153858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.cz Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 4AB6121CAF; Mon, 29 Aug 2022 16:18:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1661789927; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LNW9EEGjkUq747VrpR+We/k/qHCiQp24Fu2JgCKyYPc=; b=x0jQDzihzLYuQehOt3n/QerdWya405PuMO6+8nyhQbtB1JbqqZ2Xu+fwFHwXUq9rC8EUNp GEX43cH4p33cuZjgLw8JikSocuvRJ/pM8eEq4ig4EQYHR+/cPbaxYhFjv4XFsHH0y6wKVL t+H92Ji/Nh7lwff8RgHfgd5GnUWUExg= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1661789927; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LNW9EEGjkUq747VrpR+We/k/qHCiQp24Fu2JgCKyYPc=; b=+ctlnkjeW4flE4luZzMlsuUBhhmJ1suMmFz1kZXEtGUSQ1ykn+F1YhCtWvWS2Z1B2PnGY4 2oGo1DCC2hdwehBw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 3B41B1352A; Mon, 29 Aug 2022 16:18:47 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id LN1yDufmDGM4HAAAMHmgww (envelope-from ); Mon, 29 Aug 2022 16:18:47 +0000 From: Martin Jambor To: Richard Biener Cc: Richard Sandiford , Richard Biener , GCC Patches Subject: Re: [PATCH 2/2] vec: Add array_slice::bsearch In-Reply-To: References: User-Agent: Notmuch/0.35 (https://notmuchmail.org) Emacs/28.1 (x86_64-suse-linux-gnu) Date: Mon, 29 Aug 2022 18:18:46 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_ASCII_DIVIDERS,SCC_5_SHORT_WORD_LINES,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello, On Sat, Aug 27 2022, Richard Biener wrote: >> Am 26.08.2022 um 23:45 schrieb Martin Jambor : >> >> =EF=BB=BFHi, >> >>> On Fri, Aug 26 2022, Richard Sandiford wrote: >>> Richard Biener writes: >>>>> Am 26.08.2022 um 18:40 schrieb Martin Jambor : >>>>> >>>>> =EF=BB=BFHi, >>>>> >>>>> This adds a method to binary search in a sorted array_slice. >>>>> >>>>> The implementation is direct copy of vec:bsearch. Moreover, to only >>>>> copy it once and not twice, I used const_cast in the non-const >>>>> variants to be able to use the const variants. I hope that is >>>>> acceptable abuse of const_cast but I'll be happy to change that if >>>>> not. >>>>> >>>>> Bootstrapped and tested along code that actually uses it on >>>>> x86_64-linux. OK for trunk? >>>> >>>> Can you avoid the copying by using array slice bsearch from the vec<> = bsearch? >>> >>> IMO it would be better to transition to using routines >>> for this kind of thing (for new code). In this case that would be >>> std::lower_bound. >>> >>> Using std::lower_bound is more convenient because it avoids the void * >>> thing (you can use lambdas to capture any number of variables instead) >>> and because it works on subranges, not just whole ranges. >>> >> >> OK, I can use std::lower_bound with simple lambdas too. The semantics >> of returning the first matching a criterion actually allows me to use it >> one more time. > > Can you try to compare generated code? I have had a look and the std::lower_bound is slightly less efficient, we get 40 instructions or so instead of 28 or so, but it is mostly because it lacks the early exit bsearch has when its comparator returns zero and because of the final checks whether the lower bound is what we were searching for. But the templates and lambdas themselves do not seem to lead to any egregious overhead. As I wrote earlier, in one of the three searches I want to do I actually want to look for a lower bound (in the example below the first with the given index) and so I'd be willing to accept the trade-off. Full story: The data structure being searched is essentially an array of these structures, sorted primarily by index and those with the same index by unit_offset: struct GTY(()) ipa_argagg_value { tree value; unsigned unit_offset; unsigned index : 16; unsigned by_ref : 1; }; The C-way implementation: ---------------------------------------------------------------------- /* Callback for bsearch and qsort to sort aggregate values. */ static int compare_agg_values (const void *a, const void *b) { const ipa_argagg_value *agg1 =3D (const ipa_argagg_value *) a; const ipa_argagg_value *agg2 =3D (const ipa_argagg_value *) b; if (agg1->index < agg2->index) return -1; if (agg1->index > agg2->index) return 1; if (agg1->unit_offset < agg2->unit_offset) return -1; if (agg1->unit_offset > agg2->unit_offset) return 1; return 0; } const ipa_argagg_value * __attribute__((noinline)) testfun (const array_slice &elts, int index, unsigned unit_offset) { ipa_argagg_value key; key.index =3D index; key.unit_offset =3D unit_offset; return elts.bsearch (&key, compare_agg_values); } ---------------------------------------------------------------------- The C++-ish one: ---------------------------------------------------------------------- const ipa_argagg_value * __attribute__((noinline)) testfun (const array_slice &elts, int index, unsigned unit_offset) { ipa_argagg_value key; key.index =3D index; key.unit_offset =3D unit_offset; const ipa_argagg_value *res =3D std::lower_bound (elts.begin (), elts.end (), key, [] (const ipa_argagg_value &elt, const ipa_argagg_value &val) { if (elt.index < val.index) return true; if (elt.index > val.index) return false; if (elt.unit_offset < val.unit_offset) return true; return false; }); if (res =3D=3D elts.end () || res->index !=3D index || res->unit_offset !=3D unit_offset) res =3D nullptr; return res; } ---------------------------------------------------------------------- Using GCC 12.1, the use of bsearch yields the following optimized dump: ---------------------------------------------------------------------- ;; Function testfun (_Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, fu= ncdef_no=3D3449, decl_uid=3D129622, cgraph_uid=3D2433, symbol_order=3D2603) __attribute__((noinline)) const struct ipa_argagg_value * testfun (const struct array_slice & elts,= int index, unsigned int unit_offset) { const void * p; size_t idx; size_t u; size_t l; short unsigned int _1; const struct ipa_argagg_value * _11; unsigned int _12; long unsigned int _15; long unsigned int _18; long unsigned int _20; const struct ipa_argagg_value * _24; short unsigned int _29; unsigned int _33; [local count: 1073741824]: _1 =3D (short unsigned int) index_2(D); _11 =3D MEM[(const struct ipa_argagg_value * *)elts_7(D)]; _12 =3D MEM[(unsigned int *)elts_7(D) + 8B]; _15 =3D (long unsigned int) _12; goto ; [100.00%] [local count: 11844779787]: # u_30 =3D PHI _18 =3D u_30 + l_34; idx_19 =3D _18 >> 1; _20 =3D idx_19 * 16; p_21 =3D _11 + _20; _29 =3D MEM[(const struct ipa_argagg_value *)p_21].index; if (_1 < _29) goto ; [34.00%] else goto ; [66.00%] [local count: 7817554615]: if (_1 > _29) goto ; [34.00%] else goto ; [66.00%] [local count: 5159586016]: _33 =3D MEM[(const struct ipa_argagg_value *)p_21].unit_offset; if (unit_offset_5(D) < _33) goto ; [34.00%] else goto ; [66.00%] [local count: 3405326750]: if (unit_offset_5(D) > _33) goto ; [94.50%] else goto ; [5.50%] [local count: 6604057032]: l_23 =3D idx_19 + 1; [local count: 7677798856]: # l_34 =3D PHI <0(2), l_23(7)> # u_27 =3D PHI <_15(2), u_30(7)> if (u_27 > l_34) goto ; [94.50%] else goto ; [5.50%] [local count: 5781484434]: if (idx_19 > l_34) goto ; [94.50%] else goto ; [5.50%] [local count: 1073741824]: # _24 =3D PHI return _24; } ---------------------------------------------------------------------- And the following assembly. ---------------------------------------------------------------------- .p2align 4 .globl _Z7testfunRK11array_sliceIK16ipa_argagg_valueEij .type _Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, @function _Z7testfunRK11array_sliceIK16ipa_argagg_valueEij: .LFB3449: .cfi_startproc movq (%rdi), %r10 movl 8(%rdi), %r9d xorl %edi, %edi jmp .L1392 .p2align 4,,10 .p2align 3 .L1400: cmpw %si, %r8w jb .L1394 movl 8(%rax), %r8d cmpl %r8d, %edx jb .L1393 cmpl %edx, %r8d jnb .L1391 .L1394: leaq 1(%rcx), %rdi .L1392: cmpq %r9, %rdi jnb .L1399 .L1396: leaq (%r9,%rdi), %rcx shrq %rcx movq %rcx, %rax salq $4, %rax addq %r10, %rax movzwl 12(%rax), %r8d cmpw %r8w, %si jnb .L1400 .L1393: cmpq %rcx, %rdi jnb .L1399 movq %rcx, %r9 jmp .L1396 .p2align 4,,10 .p2align 3 .L1399: xorl %eax, %eax .L1391: ret .cfi_endproc .LFE3449: .size _Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, .-_Z7testfunRK11= array_sliceIK16ipa_argagg_valueEij ---------------------------------------------------------------------- The C++ std::lower_bound leads to the following optimized dump: ---------------------------------------------------------------------- ;; Function testfun (_Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, fu= ncdef_no=3D4007, decl_uid=3D138795, cgraph_uid=3D2481, symbol_order=3D2656) __attribute__((noinline)) const struct ipa_argagg_value * testfun (const struct array_slice & elts,= int index, unsigned int unit_offset) { _DistanceType __half; _DistanceType __len; const struct ipa_argagg_value * __first; const struct ipa_argagg_value * res; short unsigned int _1; short unsigned int _2; int _3; unsigned int _4; const struct ipa_argagg_value * _14; unsigned int _15; long unsigned int _16; long unsigned int _17; const struct ipa_argagg_value * _18; long int _20; long unsigned int __n.58_39; signed long _53; long unsigned int _56; const struct ipa_argagg_value * _57; short unsigned int _58; long int _60; unsigned int _62; [local count: 1073741823]: _1 =3D (short unsigned int) index_6(D); _14 =3D elts_11(D)->m_base; _15 =3D elts_11(D)->m_size; _16 =3D (long unsigned int) _15; _17 =3D _16 * 16; _18 =3D _14 + _17; _53 =3D (signed long) _17; _20 =3D _53 /[ex] 16; if (_53 !=3D 0) goto ; [89.00%] else goto ; [11.00%] [local count: 8687547686]: # __len_32 =3D PHI <_20(2), __len_64(7)> # __first_35 =3D PHI <_14(2), __first_63(7)> __half_38 =3D __len_32 >> 1; __n.58_39 =3D (long unsigned int) __half_38; _56 =3D __n.58_39 * 16; _57 =3D __first_35 + _56; _58 =3D _57->index; if (_1 > _58) goto ; [34.00%] else goto ; [66.00%] [local count: 5733781450]: if (_1 < _58) goto ; [34.00%] else goto ; [66.00%] [local count: 3784295727]: _62 =3D _57->unit_offset; if (unit_offset_9(D) > _62) goto ; [34.00%] else goto ; [66.00%] [local count: 4240426809]: __first_59 =3D _57 + 16; _60 =3D __len_32 - __half_38; __len_61 =3D _60 + -1; [local count: 8687547695]: # __first_63 =3D PHI <__first_59(6), __first_35(5), __first_35(4)> # __len_64 =3D PHI <__len_61(6), __half_38(5), __half_38(4)> if (__len_64 > 0) goto ; [89.00%] else goto ; [11.00%] [local count: 1073741841]: # __first_51 =3D PHI <__first_63(7), _14(2)> if (_18 =3D=3D __first_51) goto ; [14.90%] else goto ; [85.10%] [local count: 913754310]: _2 =3D __first_51->index; _3 =3D (int) _2; if (_3 !=3D index_6(D)) goto ; [44.22%] else goto ; [55.78%] [local count: 509692156]: _4 =3D __first_51->unit_offset; if (_4 !=3D unit_offset_9(D)) goto ; [44.22%] else goto ; [55.78%] [local count: 225385870]: [local count: 1073741841]: # res_5 =3D PHI <__first_51(10), 0B(8), 0B(9), 0B(11)> return res_5; } ---------------------------------------------------------------------- And the following assembly: ---------------------------------------------------------------------- .p2align 4 .globl _Z7testfunRK11array_sliceIK16ipa_argagg_valueEij .type _Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, @function _Z7testfunRK11array_sliceIK16ipa_argagg_valueEij: .LFB4007: .cfi_startproc movl 8(%rdi), %eax movq (%rdi), %r8 movl %esi, %r10d movl %esi, %r9d salq $4, %rax movq %rax, %rcx leaq (%r8,%rax), %r11 sarq $4, %rcx testq %rax, %rax jne .L1395 jmp .L1392 .p2align 4,,10 .p2align 3 .L1405: cmpw %di, %r9w jb .L1398 cmpl %edx, 8(%rax) jb .L1393 .L1398: movq %rsi, %rcx testq %rcx, %rcx jle .L1392 .L1395: movq %rcx, %rsi sarq %rsi movq %rsi, %rax salq $4, %rax addq %r8, %rax movzwl 12(%rax), %edi cmpw %r9w, %di jnb .L1405 .L1393: subq %rsi, %rcx leaq 16(%rax), %r8 subq $1, %rcx testq %rcx, %rcx jg .L1395 .L1392: cmpq %r8, %r11 je .L1400 movzwl 12(%r8), %eax cmpl %r10d, %eax jne .L1400 cmpl %edx, 8(%r8) je .L1391 .L1400: xorl %r8d, %r8d .L1391: movq %r8, %rax ret .cfi_endproc .LFE4007: .size _Z7testfunRK11array_sliceIK16ipa_argagg_valueEij, .-_Z7testfunRK11= array_sliceIK16ipa_argagg_valueEij ---------------------------------------------------------------------- Martin