From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x234.google.com (mail-lj1-x234.google.com [IPv6:2a00:1450:4864:20::234]) by sourceware.org (Postfix) with ESMTPS id 2C4D83858D28 for ; Mon, 7 Aug 2023 07:49:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2C4D83858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lj1-x234.google.com with SMTP id 38308e7fff4ca-2b9cf2b1309so51330361fa.0 for ; Mon, 07 Aug 2023 00:49:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1691394572; x=1691999372; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=jwu1RAdJfA8HKRQldN73hsKAlT5lUnJh9XJ79iGO3/0=; b=P3wYZfowk4a4dF3l+szvRyXMzmpdO9+RL3aFSomCWxQ8GfM+y6qm8AUU5TI2weuV8A 0euneX3TuZKeRJGnY7oBtb5lfcISxkFX5nBiCDarCzVNWub44/qg04qJ19ErhIo1vpPe miodxSDptC/PvWNqP6wnRTO9gYNVyUCsMxilJzVYhXMhMZMre5tnY/LBtpvQyU3J4lU3 c82K8qpT3VkE8REE+qRYb+VpgVPNhvb2EF74l0xA1ZpJRfNAoib4mfWTsqr0vdabpqtD 5vRA7KlBpxFW8qpAnRYp5lSB+ocT4f3RWLMbgEcLUMDETwO4k+1xvWzwmtzWkqKUsJYC aBPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691394572; x=1691999372; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jwu1RAdJfA8HKRQldN73hsKAlT5lUnJh9XJ79iGO3/0=; b=QBzpOOdStFyz6IkbI5Uo7BPnBAMyarVcMFRjt4/7zmNlWIJCVd2h/jCPrm7zJuYnr4 SJ/N/xS/auyiO64DWiOUsN63pYxEHQJ1dbudAEG7A2aFpIVTw9Uuflrkuw6PItTMRr63 AEJwpIuL0g21m36mYvceObVsGBrmbbbe7N/Kc6ddHO4kSzI4iyKYikF8gvgFMC1RvzWm /IxWoGFXZodiSPLpTNu26Ag+B5tH6UJDhoRiuZ2DX/d4rXZTZM41S49vNCtdLO416yDM osfhOZAbbj6j+VenTH7WqlyP/NWlUCp7mG5NQOGtWxjpwFTaAX+EHEueQ90lr84HLJQY 90Vg== X-Gm-Message-State: AOJu0YyLjZwSvjpusMmpo6TkDDCZnuhJIvQbxsGfuU0x+w0eOxv+K826 Mjn1gJmZPOFUIN+0v9Iqmpq/CKEa6W/ShAU4E+M= X-Google-Smtp-Source: AGHT+IG8pFbzyBZEs46zlbvBPuwWv67n2ngzR57eTtiK0I2AGgmsFZEufs4JPPjh9VMmHnoTKlDuvHcAWzGUXs0a9DI= X-Received: by 2002:a05:651c:1255:b0:2b6:9e1d:cd0f with SMTP id h21-20020a05651c125500b002b69e1dcd0fmr2141458ljh.12.1691394572142; Mon, 07 Aug 2023 00:49:32 -0700 (PDT) MIME-Version: 1.0 References: <20230728070552.50C1413276@imap2.suse-dmz.suse.de> <320f94c3-8805-ec64-dd4b-0454c8ecce14@gmail.com> In-Reply-To: From: Richard Biener Date: Mon, 7 Aug 2023 09:48:28 +0200 Message-ID: Subject: Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization To: Prathamesh Kulkarni Cc: Richard Biener , Jeff Law , gcc-patches@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Aug 7, 2023 at 2:05=E2=80=AFAM Prathamesh Kulkarni via Gcc-patches wrote: > > On Thu, 3 Aug 2023 at 17:48, Richard Biener wrote: > > > > On Thu, 3 Aug 2023, Richard Biener wrote: > > > > > On Thu, 3 Aug 2023, Richard Biener wrote: > > > > > > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote: > > > > > > > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches > > > > > wrote: > > > > > > > > > > > > On Mon, 31 Jul 2023, Jeff Law wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote: > > > > > > > > The following delays sinking of loads within the same inner= most > > > > > > > > loop when it was unconditional before. That's a not uncomm= on > > > > > > > > issue preventing vectorization when masked loads are not av= ailable. > > > > > > > > > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > > > > > > > > > > > > > I have a followup patch improving sinking that without this= would > > > > > > > > cause more of the problematic sinking - now that we have a = second > > > > > > > > sink pass after loop opts this looks like a reasonable appr= oach? > > > > > > > > > > > > > > > > OK? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Richard. > > > > > > > > > > > > > > > > PR tree-optimization/92335 > > > > > > > > * tree-ssa-sink.cc (select_best_block): Before loop > > > > > > > > optimizations avoid sinking unconditional loads/stores > > > > > > > > in innermost loops to conditional executed places. > > > > > > > > > > > > > > > > * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing. > > > > > > > > * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c, > > > > > > > > expect predictive commoning to happen instead of sinking. > > > > > > > > * gcc.dg/vect/pr65947-3.c: Adjust. > > > > > > > I think it's reasonable -- there's probably going to be cases= where it's not > > > > > > > great, but more often than not I think it's going to be a rea= sonable > > > > > > > heuristic. > > > > > > > > > > > > > > If there is undesirable fallout, better to find it over the c= oming months than > > > > > > > next spring. So I'd suggest we go forward now to give more t= ime to find any > > > > > > > pathological cases (if they exist). > > > > > > > > > > > > Agreed, I've pushed this now. > > > > > Hi Richard, > > > > > After this patch (committed in 399c8dd44ff44f4b496223c7cc980651c4= d6f6a0), > > > > > pr65947-7.c "failed" for aarch64-linux-gnu: > > > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP VECTO= RIZED" > > > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects > > > > > scan-tree-dump-not vect "LOOP VECTORIZED" > > > > > > > > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { tar= get { > > > > > ! vect_fold_extract_last } } } } */ > > > > > > > > > > With your commit, condition_reduction in pr65947-7.c gets vectori= zed > > > > > regardless of vect_fold_extract_last, > > > > > which gates the above test (which is an improvement, because the > > > > > function didn't get vectorized before the commit). > > > > > > > > > > The attached patch thus removes the gating on vect_fold_extract_l= ast, > > > > > and the test passes again. > > > > > OK to commit ? > > > > > > > > OK. > > > > > > Or wait - the loop doesn't vectorize on x86_64, so I guess one > > > critical target condition is missing. Can you figure out which? > > > > I see > > > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > > note: vect_is_simple_use: operand last_19 =3D PHI , > > type of def: reduction > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > > note: vect_is_simple_use: vectype vector(4) int > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > > missed: multiple types in double reduction or condition reduction or > > fold-left reduction. > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1: > > missed: not vectorized: relevant phi not supported: last_19 =3D PHI > > > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > > missed: bad operation or unsupported loop bound. > Hi Richard, > Looking at the aarch64 vect dump, it seems the loop in > condition_reduction gets vectorized with V4HI mode > while fails for other modes in vectorizable_condition: > > if ((double_reduc || reduction_type !=3D TREE_CODE_REDUCTION) > && ncopies > 1) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "multiple types in double reduction or condition= " > "reduction or fold-left reduction.\n"); > return false; > } > > From the dump: > foo.c:9:21: note: =3D=3D=3D vect_analyze_loop_operations =3D=3D=3D > foo.c:9:21: note: examining phi: last_19 =3D PHI > foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of > def: internal > foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int > foo.c:9:21: note: vect_is_simple_use: operand last_19 =3D PHI > , type of def: reduction > foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int > > For V8HI, VF =3D 8, and vectype_in =3D vector(4) int. > Thus ncopies =3D VF / length(vectype_in) =3D 2, which is greater than 1, > and thus fails: > foo.c:9:21: missed: multiple types in double reduction or condition > reduction or fold-left reduction. > foo.c:4:1: missed: not vectorized: relevant phi not supported: > last_19 =3D PHI > While for V4HI, VF =3D 4 and thus ncopies =3D 1, so it succeeds. > > For x86_64, it seems the vectorizer doesn't seem to try V4HI mode. > If I "force" the vectorizer to use V4HI mode, we get the following dump: > foo.c:9:21: note: =3D=3D=3D vect_analyze_loop_operations =3D=3D=3D > foo.c:9:21: note: examining phi: last_19 =3D PHI > foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of > def: internal > foo.c:9:21: note: vect_is_simple_use: vectype vector(2) int > foo.c:9:21: note: vect_is_simple_use: operand last_19 =3D PHI > , type of def: reduction > foo.c:9:21: note: vect_is_simple_use: vectype vector(2) int > foo.c:9:21: missed: multiple types in double reduction or condition > reduction or fold-left reduction. > > Not sure tho if this is the only reason for the test to fail to > vectorize on the target. > Will investigate in more details next week. The odd thing is that you say for (int i =3D 0; i < N; i++) { aval =3D a[i]; if (b[i] < min_v) last =3D aval; } fails to vectorize but for (int i =3D 0; i < N; i++) { if (b[i] < min_v) last =3D a[i]; } succeeds? The IL difference should be irrelevant for the reduction vectorization: [local count: 1049367889]: # last_19 =3D PHI # i_21 =3D PHI # ivtmp_18 =3D PHI _1 =3D (long unsigned int) i_21; _2 =3D _1 * 2; _3 =3D a_12(D) + _2; aval_13 =3D *_3; _5 =3D _1 * 4; _6 =3D b_14(D) + _5; _7 =3D *_6; last_16 =3D (int) aval_13; _9 =3D _7 < min_v_15(D); last_8 =3D _9 ? last_16 : last_19; i_17 =3D i_21 + 1; ivtmp_10 =3D ivtmp_18 - 1; if (ivtmp_10 !=3D 0) goto ; [97.68%] vs [local count: 1049367889]: # last_19 =3D PHI # i_21 =3D PHI # ivtmp_11 =3D PHI _1 =3D (long unsigned int) i_21; _2 =3D _1 * 4; _3 =3D b_13(D) + _2; _4 =3D *_3; _5 =3D _4 < min_v_14(D); _6 =3D _1 * 2; _38 =3D _37 + _6; _7 =3D (short int *) _38; _8 =3D .MASK_LOAD (_7, 16B, _5); last_16 =3D (int) _8; last_9 =3D _5 ? last_16 : last_19; i_17 =3D i_21 + 1; ivtmp_10 =3D ivtmp_11 - 1; if (ivtmp_10 !=3D 0) goto ; [97.68%] maybe since the "mask" is used twice with the .MASK_LOAD we are not actually looking at the def (the comparison) and it's the comparison which would introduce the "multiple types"? That is, I wonder why not sinking the load, avoiding a conditional load, makes a difference to vectorizing the condition/extract last reductio= n. It doesn't seem to make a difference for x86. That said, the "fix" is probably sticking the correct target on the dump-check, it seems that vect_fold_extract_last is no longer correct here. Richard. > Thanks, > Prathamesh > > > > Richard.