From: Andrew Pinski
Date: Fri, 4 Feb 2022 19:34:59 -0800
Subject: Re: [PATCH] testsuite: Robustify aarch64/simd tests against more aggressive DCE
To: Richard Sandiford, Marc Poulhies, Eric Botcazou, GCC Patches
References: <87a6h4jnry.fsf@adacore.com> <12914311.uLZWGnKmhe@fomalhaut> <87k0f2z5no.fsf@adacore.com>
List-Id: Gcc-patches mailing list

On Fri, Feb 4, 2022 at 3:21 AM Richard Sandiford via Gcc-patches wrote:
>
> Sorry, just realised I'd never replied to this.
>
> Marc Poulhies writes:
> > Eric Botcazou writes:
> >>> The new variables seem to be unused, so I think slightly stronger
> >>> DCE could remove the calls even after the patch. Perhaps the containing
> >>> functions should take an int32x4_t *ptr or something, with the calls
> >>> assigning to different ptr[] indices.
> >>
> >> We run a minimal DCE pass at -O0 in our compiler to eliminate all the garbage
> >> generated by the gimplifier for variable-sized types (people care about code
> >> size at -O0 in specific contexts) but it does not touch anything written by
> >> the user (and debugging is unaffected of course). Given that the builtins are
> >> pure functions and the arguments have no side effects, it eliminates the
> >> calls, but adding a LHS blocks that because this minimal DCE pass preserves
> >> anything user-related, in particular assignments to user variables.
> >>
> >>> I think it would be better to do that using new calls though,
> >>> and xfail the existing ones when they no longer work. For example:
> >>>
> >>>   /* { dg-error "lane -1 out of range 0 - 7" "" {target *-*-*} 0 } */
> >>>   vqdmlal_high_laneq_s16 (int32x4_a, int16x8_b, int16x8_c, -1);
> >>>   /* { dg-error "lane -1 out of range 0 - 7" "" {target *-*-*} 0 } */
> >>>   ptr[0] = vqdmlal_high_laneq_s16 (int32x4_a, int16x8_b, int16x8_c, -1);
> >>>
> >>> That way we don't lose the existing tests.
> >>
> >> Frankly I'm not quite sure of what we can lose by adding a LHS here, can you
> >> elaborate a bit? We would need a solution that works out of the box with our
> >> compiler in the future, i.e. without having to tweak 50 testcases again.
> >
> > Hi Richard,
> >
> > Thanks for your reply!
> >
> > Like Éric, I'm also wondering why having LHS in the existing tests would
> > make us lose them. I guess I'm not familiar enough with this part of
> > the testsuite and I'm missing something.
>
> The problem is that we only enforce lane bounds via calls to
> __builtin_aarch64_im_lane_boundsi. In previous releases, the check
> only happened at RTL expansion time, so the check would be skipped if
> any gimple pass removed the call. Now we do the checking during
> folding, but that still misses cases. E.g., compare the -O0 and -O1
> behaviour for:

Actually, I looked into the testcase below, and
__builtin_aarch64_im_lane_boundsi is not part of the intrinsic.
Basically, some intrinsics have their own bounds checking as part of the
builtin rather than using __builtin_aarch64_im_lane_boundsi.
That is, the problem shows up in GCC 11, where the folding of
__builtin_aarch64_im_lane_boundsi at the gimple level didn't happen.
I will file a bug report on this regression later tonight or tomorrow.

Here are the uses of aarch64_simd_lane_bounds which emit the error
(besides the __builtin_aarch64_im_lane_boundsi builtin itself):

function:
aarch64_expand_fcmla_builtin

builtin_simd_arg args:
SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX
SIMD_ARG_LANE_INDEX
SIMD_ARG_LANE_PAIR_INDEX
SIMD_ARG_LANE_QUADTUP_INDEX

rtl named patterns:
aarch64_ld_lane
aarch64_st_lane

Thanks,
Andrew Pinski

>
>   #include <arm_neon.h>
>
>   void f(int32x4_t *p0, int16x8_t *p1) {
>     vqdmlal_high_laneq_s16(p0[0], p1[0], p1[1], -1);
>     //p0[0] = vqdmlal_high_laneq_s16(p0[0], p1[0], p1[1], -1);
>   }
>
> -O0 gives the error but -O1 doesn't [https://godbolt.org/z/1KosTY43T].
> The -O1 behaviour here is wrong: badly-formed calls should be rejected
> with a diagnostic even if the calls are unused. Clang gets this right
> in both cases [https://godbolt.org/z/EGxs8jq97].
>
> I think keeping the lhs-free calls is important for making sure that
> the -O0 behaviour doesn't regress without the DCE.
>
> Your DCE will regress it, but that's the fault of the arm_neon.h
> implementation rather than the fault of your pass. Having the
> tests but XFAILing them seems like the best way of dealing with that.
> Hopefully we'll then see some progression if the arm_neon.h implementation
> is improved in future.
>
> Thanks,
> Richard
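
For readers without an AArch64 toolchain, here is a minimal, portable sketch of the two call shapes discussed in this thread. It assumes a hypothetical pure helper, fake_intrinsic, in place of a real arm_neon.h intrinsic; the names fake_intrinsic, lhs_free and with_lhs are illustrative only and appear nowhere in the thread or in GCC. The sketch only shows why storing a result through a pointer keeps a call live while a discarded result of a pure call is a candidate for DCE. It does not reproduce the lane-bounds diagnostic itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for a pure intrinsic: no side effects, and the
   result depends only on the arguments.  Not a real arm_neon.h function. */
static inline int32_t fake_intrinsic(int32_t acc, int16_t b, int16_t c)
{
    return acc + 2 * (int32_t)b * (int32_t)c;
}

/* LHS-free shape: the result is discarded, so a DCE pass is free to
   delete the call (and any check that lives only inside it). */
void lhs_free(int32_t acc, int16_t b, int16_t c)
{
    (void)fake_intrinsic(acc, b, c);
}

/* Pointer shape: each result is stored to a distinct ptr[] index, so the
   stores are observable by the caller and the calls cannot be dropped. */
void with_lhs(int32_t *ptr, int32_t acc, int16_t b, int16_t c)
{
    ptr[0] = fake_intrinsic(acc, b, c);
    ptr[1] = fake_intrinsic(acc, b, b);
}
```

Calling with_lhs(out, 1, 2, 3) on a two-element array leaves observable results in out[0] and out[1], whereas lhs_free(1, 2, 3) has no effect a caller can see, which is why only the first shape survives more aggressive dead-code elimination.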