From: Richard Biener
Date: Tue, 17 Oct 2023 13:57:49 +0200
Subject: Re: the elimination of if blocks in GCC during if-conversion and vectorization
To: Hanke Zhang
Cc: gcc@gcc.gnu.org

On Tue, Oct 17, 2023 at 1:54 PM Hanke Zhang wrote:
>
> Richard Biener wrote on Tue, Oct 17, 2023 at 17:26:
> >
> > On Thu, Oct 12, 2023 at 2:18 PM Hanke Zhang via Gcc wrote:
> > >
> > > Hi, I've recently been working on vectorization in GCC. I'm stuck on a
> > > small problem and would like to ask for advice.
> > >
> > > For example, for the following code:
> > >
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > >
> > > int main() {
> > >   int size = 1000;
> > >   int *foo = malloc(sizeof(int) * size);
> > >   int c1 = rand(), t1 = rand();
> > >
> > >   for (int i = 0; i < size; i++) {
> > >     if (foo[i] & c1) {
> > >       foo[i] = t1;
> > >     }
> > >   }
> > >
> > >   // prevents the loop above from being optimized
> > >   for (int i = 0; i < size; i++) {
> > >     printf("%d", foo[i]);
> > >   }
> > > }
> > >
> > > First of all, the if block in the loop is converted to a MASK_STORE
> > > by the if-conversion pass. But after the tree vectorizer runs, it
> > > still ends up in a branched form. Part of the final disassembly
> > > (produced with IDA) looks roughly like the code below, and you can
> > > see that there is still a branch 'if ( !_ZF )' in it, which leads
> > > to low efficiency.
> > >
> > > do
> > > {
> > >   while ( 1 )
> > >   {
> > >     __asm
> > >     {
> > >       vpand ymm0, ymm2, ymmword ptr [rax]
> > >       vpcmpeqd ymm0, ymm0, ymm1
> > >       vpcmpeqd ymm0, ymm0, ymm1
> > >       vptest ymm0, ymm0
> > >     }
> > >     if ( !_ZF )
> > >       break;
> > >     _RAX += 8;
> > >     if ( _RAX == v9 )
> > >       goto LABEL_5;
> > >   }
> > >   __asm { vpmaskmovd ymmword ptr [rax], ymm0, ymm3 }
> > >   _RAX += 8;
> > > }
> > > while ( _RAX != v9 );
> > >
> > > Why can't we just replace the vptest and the if statement with some
> > > other instruction like vpblendvb so that it can be faster? Or is
> > > there a good way to do that?
> >
> > The branch is added by optimize_mask_stores after vectorization because
> > fully masked (disabled) masked stores can incur quite a heavy penalty on
> > some architectures when they run into fault assists (read-only pages,
> > but also COW pages). All the microcode handling possibly needs to be
> > carried out multiple times, once for each such access to the same page.
> > That can cause a 1000x slowdown when you hit this case. Thus every
> > masked store is replaced by
> >
> >   if (mask != 0)
> >     masked_store ();
> >
> > and this is an optimization (which itself has a small cost).
> >
> > Richard.
>
> Yeah, I know that, and I have seen the code of optimize_mask_stores().
> The main problem here is that when multiple MASK_STOREs appear in
> the same loop, many branches appear, resulting in a decrease in
> overall efficiency.
>
> My original idea was: why can't we replace MASK_STORE with more
> effective SIMD instructions, since icc can do much better in this
> case?

ICC probably doesn't care about the case where foo[] isn't writable. In fact,
for the case at hand we see it comes from malloc(), which we can assume
returns writable memory, I guess. That means if-conversion can treat
the unconditional read as a way to also allow speculating the write
(with -fallow-store-data-races).

Note there's currently no pointer analysis that tracks writability.

> Then I gave up, because GCC's ability to analyze code for vectorization
> is not as good as icc's, and my own ability is not sufficient for
> modifying this part of the code.
>
> Thanks very much for your reply.

You're welcome.

Richard.

> >
> > >
> > > Thanks
> > > Hanke Zhang
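
The difference between the guarded store and the speculated store discussed in this thread can be illustrated at the source level. What follows is a minimal sketch, not GCC's actual transformation: the function names update_guarded, update_speculated and count_mismatches are hypothetical and exist only for this illustration. The second form is valid only under the assumptions named in the thread, namely that the array is known to be writable and that introducing a store data race is acceptable.

```c
#include <assert.h>
#include <stdlib.h>

/* Form 1 (as in the original loop): the store happens only when the
   condition holds, so a vectorized version needs a masked store. */
void update_guarded(int *foo, int n, int c1, int t1)
{
    for (int i = 0; i < n; i++) {
        if (foo[i] & c1)
            foo[i] = t1;
    }
}

/* Form 2 (hypothetical hand-written equivalent): every element is
   written back, either with t1 or with its old value.  This assumes
   foo[] is writable and that no other thread touches it concurrently;
   a compiler may then be able to use a blend plus a plain store. */
void update_speculated(int *foo, int n, int c1, int t1)
{
    for (int i = 0; i < n; i++) {
        int old = foo[i];
        foo[i] = (old & c1) ? t1 : old;
    }
}

/* Runs both forms on identical input and returns the number of elements
   that differ afterwards, or -1 if allocation fails. */
int count_mismatches(int n, int c1, int t1)
{
    assert(n >= 0);
    int *a = malloc(sizeof(int) * (size_t)n);
    int *b = malloc(sizeof(int) * (size_t)n);
    if (!a || !b) {
        free(a);
        free(b);
        return -1;
    }
    for (int i = 0; i < n; i++)
        a[i] = b[i] = i;            /* deterministic test data */

    update_guarded(a, n, c1, t1);
    update_speculated(b, n, c1, t1);

    int mismatches = 0;
    for (int i = 0; i < n; i++)
        if (a[i] != b[i])
            mismatches++;

    free(a);
    free(b);
    return mismatches;
}
```

Both forms leave the array in the same state on single-threaded, writable memory; they differ only in whether untouched elements are rewritten, which is exactly why the second form is unsafe for read-only or concurrently modified data.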