From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4eb1a40e-0f54-4e27-90f8-00f4bba90907@baylibre.com>
Date: Fri, 16 Feb 2024 11:22:30 +0000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: GCN RDNA2+ vs.
GCC SLP vectorizer
Content-Language: en-GB
To: Richard Biener, Thomas Schwinge
Cc: gcc-patches@gcc.gnu.org
References: <87ttm8ka6h.fsf@euler.schwinge.ddns.net> <55q4729r-1014-5541-7p75-6rq6r97845r7@fhfr.qr>
From: Andrew Stubbs
In-Reply-To: <55q4729r-1014-5541-7p75-6rq6r97845r7@fhfr.qr>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=0.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,RCVD_IN_BARRACUDACENTRAL,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

On 16/02/2024 10:17, Richard Biener wrote:
> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>
>> Hi!
>>
>> On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote:
>>> I've committed this patch
>>
>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
>> support builds on top of, and that's what I'm currently working on
>> getting proper GCC/GCN target (not offloading) results for.
>>
>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
>> and hopefully representative for other SLP execution test FAILs
>> (regressions compared to my earlier non-gfx1100 testing).
>>
>>     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c --sysroot=install/amdgcn-amdhsa -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem source-gcc/newlib/libc/include -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/ -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all -fdump-rtl-all-all -save-temps -march=gfx1100
>>
>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
>> suppose will also exhibit the same failure mode, once again?
>>
>> Compared to '-march=gfx90a', the differences begin in
>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>>
>> Changed like:
>>
>>     @@ -38,10 +38,10 @@ int main ()
>>      #pragma GCC novector
>>        for (i = 1; i < N; i++)
>>          if (a[i] != i%4 + 1)
>>     -      abort ();
>>     +      __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>>
>>        if (a[0] != 5)
>>     -    abort ();
>>     +    __builtin_printf("%d %d != %d\n", 0, a[0], 5);
>>
>> ..., we see:
>>
>>     $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>>     40 5 != 1
>>     41 6 != 2
>>     42 7 != 3
>>     43 8 != 4
>>     44 5 != 1
>>     45 6 != 2
>>     46 7 != 3
>>     47 8 != 4
>>
>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
>> scribbled zero values over these (vector lane masking issue, perhaps?),
>> or some other code generation issue?
>
> So we're indeed BB vectorizing this to
>
>   _54 = MEM [(int *)_14];
>   vect_iftmp.12_56 = .VCOND (_54, { 0, 0, 0, 0 }, { 1, 2, 3, 4 }, { 5, 6, 7, 8 }, 115);
>   MEM [(int *)_14] = vect_iftmp.12_56;
>
> I don't understand the assembly very well but it might be that
> the mask computation for the .VCOND scribbles the mask used
> to constrain operation to 4 lanes?
>
> .L3:
>         s_mov_b64       exec, 15
>         v_add_co_u32    v4, s[22:23], s32, v3
>         v_mov_b32       v5, s33
>         v_add_co_ci_u32 v5, s[22:23], 0, v5, s[22:23]
>         flat_load_dword v7, v[4:5] offset:0
>         s_waitcnt       0
>         flat_load_dword v0, v[10:11] offset:0
>         s_waitcnt       0
>         flat_load_dword v6, v[8:9] offset:0
>         s_waitcnt       0
>         v_cmp_ne_u32    s[18:19], v7, 0
>         v_cndmask_b32   v0, v6, v0, s[18:19]
>         flat_store_dword        v[4:5], v0 offset:0
>         s_add_i32       s12, s12, 1
>         s_add_u32       s32, s32, s28
>         s_addc_u32      s33, s33, s29
>         s_cmp_lg_u32    s12, s13
>         s_cbranch_scc1  .L3

This basic block has EXEC set to 15 (4 lanes) throughout. The mask for
the VCOND, a.k.a. v_cndmask_b32, is in s[18:19]. Those things seem OK.

I see the testcase avoids vec_extract V64SI to V4SI for gfx1100, even
though it would be a no-op conversion, because the general case requires
a permute instruction and named pattern insns can't have non-constant
conditions. Is vec_extract allowed to FAIL? That might give a better
result in this case.

However, I must be doing something different, because
vect/bb-slp-cond-1.c passes for me on gfx1100.

Andrew
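
For readers less familiar with GCN lane masking, the loop body discussed above can be modelled in plain C. This is only a sketch under stated assumptions: the helper names (cmp_ne_zero, cndmask, vcond_store) are hypothetical, the per-lane behaviour is a simplified reading of v_cmp_ne_u32 and v_cndmask_b32, and the constants for lanes beyond the first four are made-up padding. It is not GCC or hardware code and says nothing about what actually goes wrong on gfx1100.

```c
#include <stdint.h>

#define LANES 64 /* one bit of EXEC per lane; the model uses a 64-lane vector */

/* Hypothetical model of "v_cmp_ne_u32 mask, v, 0": set a mask bit for every
   active lane whose value is non-zero.  Inactive lanes contribute no bits. */
static uint64_t cmp_ne_zero(const int32_t v[LANES], uint64_t exec)
{
    uint64_t mask = 0;
    for (int lane = 0; lane < LANES; lane++)
        if (((exec >> lane) & 1) && v[lane] != 0)
            mask |= UINT64_C(1) << lane;
    return mask;
}

/* Hypothetical model of "v_cndmask_b32 dst, if_clear, if_set, cond": each
   active lane picks if_set when its cond bit is 1, otherwise if_clear.
   Lanes that are inactive in exec leave dst untouched. */
static void cndmask(int32_t dst[LANES], const int32_t if_clear[LANES],
                    const int32_t if_set[LANES], uint64_t cond, uint64_t exec)
{
    for (int lane = 0; lane < LANES; lane++) {
        if (!((exec >> lane) & 1))
            continue;
        dst[lane] = ((cond >> lane) & 1) ? if_set[lane] : if_clear[lane];
    }
}

/* One iteration of the vectorized block: load, compare against zero, select
   between { 1, 2, 3, 4 } and { 5, 6, 7, 8 }, then store, all under exec.
   Repeating the constants past lane 3 is an assumption made only so that
   the model is defined for wider masks. */
static void vcond_store(int32_t a[LANES], uint64_t exec)
{
    int32_t if_set[LANES], if_clear[LANES], result[LANES];

    for (int lane = 0; lane < LANES; lane++) {
        if_set[lane] = lane % 4 + 1;   /* { 1, 2, 3, 4 } */
        if_clear[lane] = lane % 4 + 5; /* { 5, 6, 7, 8 } */
        result[lane] = a[lane];
    }

    uint64_t cond = cmp_ne_zero(a, exec);
    cndmask(result, if_clear, if_set, cond, exec);

    /* Masked store: only active lanes write back to memory. */
    for (int lane = 0; lane < LANES; lane++)
        if ((exec >> lane) & 1)
            a[lane] = result[lane];
}
```

Calling vcond_store (a, 15) touches only a[0..3], which is the behaviour the assembly appears to ask for with "s_mov_b64 exec, 15". Passing a wider mask such as 0xff would also rewrite a[4..7], which is the kind of scribbling asked about in the quoted text; the sketch only shows what that would look like in the model.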