From: Andrew Stubbs
To: Richard Biener
Cc: Thomas Schwinge, gcc-patches@gcc.gnu.org
Subject: Re: GCN RDNA2+ vs. GCC SLP vectorizer
Date: Fri, 16 Feb 2024 12:41:06 +0000
Message-ID: <9714f90d-a581-4ebe-a031-d5d8c6db9cf6@baylibre.com>
In-Reply-To: <53s543rq-36qn-ns26-o0qo-97o168o707pn@fhfr.qr>
References: <87ttm8ka6h.fsf@euler.schwinge.ddns.net> <55q4729r-1014-5541-7p75-6rq6r97845r7@fhfr.qr> <4eb1a40e-0f54-4e27-90f8-00f4bba90907@baylibre.com> <53s543rq-36qn-ns26-o0qo-97o168o707pn@fhfr.qr>

On 16/02/2024 12:26, Richard Biener wrote:
> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>
>> On 16/02/2024 10:17, Richard Biener wrote:
>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>>>
>>>> Hi!
>>>>
>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote:
>>>>> I've committed this patch
>>>>
>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
>>>> support builds on top of, and that's what I'm currently working on
>>>> getting proper GCC/GCN target (not offloading) results for.
>>>>
>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
>>>> and hopefully representative for other SLP execution test FAILs
>>>> (regressions compared to my earlier non-gfx1100 testing).
>>>>
>>>>     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>>>>     source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>>>>     --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>>>>     -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common
>>>>     -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
>>>>     build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>>>>     source-gcc/newlib/libc/include
>>>>     -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>>>>     -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>>>>     setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all
>>>>     -fdump-rtl-all-all -save-temps -march=gfx1100
>>>>
>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
>>>> suppose will also exhibit the same failure mode, once again?
>>>>
>>>> Compared to '-march=gfx90a', the differences begin in
>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>>>>
>>>> Changed like:
>>>>
>>>>     @@ -38,10 +38,10 @@ int main ()
>>>>      #pragma GCC novector
>>>>        for (i = 1; i < N; i++)
>>>>          if (a[i] != i%4 + 1)
>>>>     -      abort ();
>>>>     +      __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>>>>
>>>>        if (a[0] != 5)
>>>>     -    abort ();
>>>>     +    __builtin_printf("%d %d != %d\n", 0, a[0], 5);
>>>>
>>>> ..., we see:
>>>>
>>>>     $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>>>>     40 5 != 1
>>>>     41 6 != 2
>>>>     42 7 != 3
>>>>     43 8 != 4
>>>>     44 5 != 1
>>>>     45 6 != 2
>>>>     46 7 != 3
>>>>     47 8 != 4
>>>>
>>>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>>>> 'a[i * stride + 0..3] != 0'. So, either some earlier iteration has
>>>> scribbled zero values over these (vector lane masking issue, perhaps?),
>>>> or some other code generation issue?
>>>
>>> So we're indeed BB vectorizing this to
>>>
>>>   _54 = MEM [(int *)_14];
>>>   vect_iftmp.12_56 = .VCOND (_54, { 0, 0, 0, 0 }, { 1, 2, 3, 4 }, { 5, 6,
>>> 7, 8 }, 115);
>>>   MEM [(int *)_14] = vect_iftmp.12_56;
>>>
>>> I don't understand the assembly very well but it might be that
>>> the mask computation for the .VCOND scribbles the mask used
>>> to constrain operation to 4 lanes?
>>>
>>> .L3:
>>>         s_mov_b64 exec, 15
>>>         v_add_co_u32 v4, s[22:23], s32, v3
>>>         v_mov_b32 v5, s33
>>>         v_add_co_ci_u32 v5, s[22:23], 0, v5, s[22:23]
>>>         flat_load_dword v7, v[4:5] offset:0
>>>         s_waitcnt 0
>>>         flat_load_dword v0, v[10:11] offset:0
>>>         s_waitcnt 0
>>>         flat_load_dword v6, v[8:9] offset:0
>>>         s_waitcnt 0
>>>         v_cmp_ne_u32 s[18:19], v7, 0
>>>         v_cndmask_b32 v0, v6, v0, s[18:19]
>>>         flat_store_dword v[4:5], v0 offset:0
>>>         s_add_i32 s12, s12, 1
>>>         s_add_u32 s32, s32, s28
>>>         s_addc_u32 s33, s33, s29
>>>         s_cmp_lg_u32 s12, s13
>>>         s_cbranch_scc1 .L3
>>
>> This basic block has EXEC set to 15 (4 lanes) throughout. The mask for the
>> VCOND a.k.a. v_cndmask_b32 is in s[18:19]. Those things seem OK.
>>
>> I see the testcase avoids vec_extract V64SI to V4SI for gfx1100, even though
>> it would be a no-op conversion, because the general case requires a permute
>> instruction and named pattern insns can't have non-constant conditions. Is
>> vec_extract allowed to FAIL? That might give a better result in this case.

I found that vec_extract is not allowed to FAIL. I guess the only way to
allow the no-op conversions is to implement manual fall-back code-gen
for the broken cases.

>>
>> However, I must be doing something different because vect/bb-slp-cond-1.c
>> passes for me, on gfx1100.
>
> I didn't try to run it - when doing make check-gcc fails to use
> gcn-run for test invocation, what's the trick to make it do that?

There's a config file for nvptx here:

https://github.com/SourceryTools/nvptx-tools/blob/master/nvptx-none-run.exp

You can probably make the obvious adjustments.
I think Thomas has a GCN version with a few more features.

I usually use the CodeSourcery magic stack of scripts for testing
installed toolchains on remote devices, so I'm not too familiar with
using DejaGnu directly.

Andrew