From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x1030.google.com (mail-pj1-x1030.google.com [IPv6:2607:f8b0:4864:20::1030]) by sourceware.org (Postfix) with ESMTPS id A0F0C38582AC for ; Tue, 18 Jul 2023 17:04:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A0F0C38582AC Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu Received: by mail-pj1-x1030.google.com with SMTP id 98e67ed59e1d1-2657d405ad5so4095354a91.1 for ; Tue, 18 Jul 2023 10:04:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vrull.eu; s=google; t=1689699855; x=1692291855; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=6O50IdpQ8JBryjSn5alAe6LqrApnMNzN2puTbETAgXM=; b=qV8PKRxqNVkD+P1crT0rpkBMSpPpxU8kdHYjoBXGCJq3Rx+kZ4UAPrsE7IbvCiFI2L isZ/uCdywOLnOZ2vnjOY0Uv6eoCtjBpAKD933ABmpF4BNACcUD1owDbeYlzdICXR6NcR HAeLHS46P0ohllRG0WHCCgHQ0F5H3MCmNB2M6wh9jA34JOFpXkC3G1G7cVp0q5i7sHZ2 nBmTZu4w18T6kGBjZTbzsQOfTK3qeBfMT1tX55+SqLqIjA8Emt51vCx7KIcMPgZbhdF/ ow3UIS+kpQ34V0qosdSvMRhblfGx+vP0iDFEkCjoVS9ywxvpe/PGOjapVU3ave6qPBjg yyEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689699855; x=1692291855; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6O50IdpQ8JBryjSn5alAe6LqrApnMNzN2puTbETAgXM=; b=LX+xi1VhLlzxgb63zNYKUd643sPMN/7MqoIEXFaQFS9ikGTmEp9LDKqzbEPXwRean6 7spyLTiosXkcHOfsObYXO7v+QkDxAfWc75+ae4jBQMMiIuEew6V4n9Y+iPQkRTFnTNPT 5FWe2oqTgOMikT/1BCtod1gGq+o1UscANYRiDmzcGqrBhKsmy8RqPzNol2PvIenKSLOS 97c1JM9VbEEvmaAb2Xys2M+CB8dQUoO1FFdP8YX6yNkwDOcFkytJHth1GpueTNeeJL+0 qn5+i6zfJsw7YWUXcomrNM+LhdvVk2ozCkZJI+GSsrV9AudJDZ+60CRp/v8+uas49eIm ewuw== X-Gm-Message-State: ABy/qLbm3HKwL2lCnBlB6jbKoLwByC99S9Eptq9LR94D6qg/XJd5GBAl W5rFAsD9RAbbF+8Th8HFPV/03/9c6O7nFjnVtV3COQ== X-Google-Smtp-Source: APBJJlH19x0zUhGdQM0VTOywAJ9YN9vTFcNFxZD7Yk4D6Kf3yUhuuYrXxe/j7yD947gO97GZMrLUGzjENnD/140JCiI= X-Received: by 2002:a17:90b:4f86:b0:263:6c50:750a with SMTP id qe6-20020a17090b4f8600b002636c50750amr13382817pjb.3.1689699854920; Tue, 18 Jul 2023 10:04:14 -0700 (PDT) MIME-Version: 1.0 References: <20230713140904.3274306-1-manolis.tsamis@vrull.eu> In-Reply-To: From: Manolis Tsamis Date: Tue, 18 Jul 2023 20:03:38 +0300 Message-ID: Subject: Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets To: Manolis Tsamis , gcc-patches@gcc.gnu.org, Philipp Tomsich , Jakub Jelinek , Andrew Pinski , Robin Dapp , richard.sandiford@arm.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,JMQ_SPF_NEUTRAL,KAM_MANYTO,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Richard, Thanks for your insightful reply. On Tue, Jul 18, 2023 at 1:12=E2=80=AFAM Richard Sandiford wrote: > > Manolis Tsamis writes: > > noce_convert_multiple_sets has been introduced and extended over time t= o handle > > if conversion for blocks with multiple sets. Currently this is focused = on > > register moves and rejects any sort of arithmetic operations. > > > > This series is an extension to allow more sequences to take part in if > > conversion. The first patch is a required change to emit correct code a= nd the > > second patch whitelists a larger number of operations through > > bb_ok_for_noce_convert_multiple_sets. > > > > For targets that have a rich selection of conditional instructions, > > like aarch64, I have seen an ~5x increase of profitable if conversions = for > > multiple set blocks in SPEC benchmarks. Also tested with a wide variety= of > > benchmarks and I have not seen performance regressions on either x64 / = aarch64. > > Interesting results. Are you free to say which target you used for aarch= 64? > > If I've understood the cost heuristics correctly, we'll allow a "predicta= ble" > branch to be replaced by up to 5 simple conditional instructions and an > "unpredictable" branch to be replaced by up to 10 simple conditional > instructions. That seems pretty high. And I'm not sure how well we > guess predictability in the absence of real profile information. > > So my gut instinct was that the limitations of the current code might > be saving us from overly generous limits. It sounds from your results > like that might not be the case though. > > Still, if it does turn out to be the case in future, I agree we should > fix the costs rather than hamstring the code. > My writing may have been confusing, but with "~5x increase of profitable if conversions" I just meant that ifcvt considers these profitable, not that they actually are when executed in particular hardware. But at the same time I haven't yet seen any obvious performance regressions in some benchmarks that I have ran. In any case it could be interesting to microbenchmark branches vs conditional instructions and see how sane these numbers are. > > Some samples that previously resulted in a branch but now better use th= ese > > instructions can be seen in the provided test case. > > > > Tested on aarch64 and x64; On x64 some tests that use __builtin_rint ar= e > > failing with an ICE but I believe that it's not an issue of this change= . > > force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ = x ])) > > is provided through emit_conditional_move. > > I guess that needs to be fixed first though. (Thanks for checking both > targets.) > I get a feeling this may be fixed if I properly take care of your points 1 & 2 below. I will report on that. > My main comments on the series are: > > (1) It isn't obvious which operations should be included in the list > in patch 2 and which shouldn't. Also, the patch only checks the > outermost operation, and so it allows the inner rtxes to be > arbitrarily complex. > > Because of that, it might be better to remove the condition > altogether and just rely on the other routines to do costing and > correctness checks. > That is true; I wanted to somehow only allow "normal arithmetic operations" and avoid generating sequences with stranger codes. I will try and see what happens if I remove the condition altogether. I also totally missed the fact that I was allowing arbitrarily complex inner rtxes so thanks for pointing that out. > (2) Don't you also need to update the "rewiring" mechanism, to cope > with cases where the then block has something like: > > if (a =3D=3D 0) { > a =3D b op c; -> a' =3D a =3D=3D 0 ? b op c : a; > d =3D a op b; -> d =3D a =3D=3D 0 ? a' op b : d; > } a =3D a' > > At the moment the code only handles regs and subregs, whereas but IIU= C > it should now iterate over all the regs in the SET_SRC. And I suppos= e > that creates the need for multiple possible rewirings in the same ins= n, > so that it isn't a simple insn -> index mapping any more. > Indeed, I believe this current patch cannot properly handle these. I will create testcases for this and see what changes need to be done in the next iteration so that correct code is generated. Thanks, Manolis > Thanks, > Richard > > > > > > > Changes in v2: > > - Change "conditional moves" to "conditional instructions" > > in bb_ok_for_noce_convert_multiple_sets's comment. > > > > Manolis Tsamis (2): > > ifcvt: handle sequences that clobber flags in > > noce_convert_multiple_sets > > ifcvt: Allow more operations in multiple set if conversion > > > > gcc/ifcvt.cc | 109 ++++++++++-------- > > .../aarch64/ifcvt_multiple_sets_arithm.c | 67 +++++++++++ > > 2 files changed, 127 insertions(+), 49 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_set= s_arithm.c