From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x12c.google.com (mail-lf1-x12c.google.com [IPv6:2a00:1450:4864:20::12c]) by sourceware.org (Postfix) with ESMTPS id 4D2F43858D20 for ; Thu, 26 Oct 2023 09:19:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4D2F43858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4D2F43858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::12c ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698311953; cv=none; b=dBg0Pc/TMGcwsFe11xcm8JirimrTq/zHJKlQGlJuSdHmlP2EJYKM9zQx2KRXxA/RoK4OsOr+SPRVbP5/ihrgxSdbLaZHIW95hOxkjp24Qy4VQJzfHKHYO/Wbs/6QG7K/GEvji6yQkPhvj8BUFqLnjOXbKr0S8XehcrIQ1GarYts= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698311953; c=relaxed/simple; bh=zfk8GVxsLOEPoA1kvGb85wCmSnlGlqNmO406qm999v4=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=i1TcD3ApXrGz6RZ8GuVnVi4dq97GOWqCN50F6yIOB8Y45oyVDnVrI6j9wIKXy5HE0syiSkQM63nRf5mrv+htEGAUSgJ756TuIq19akjzVCWZ7AIH5YhKW6/POH0V+6+8NLxMv54WPe5SAWaz9ypN7qbwp3bcm2mNorSH/cZ4FYI= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x12c.google.com with SMTP id 2adb3069b0e04-507bd64814fso853421e87.1 for ; Thu, 26 Oct 2023 02:19:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1698311950; x=1698916750; darn=gcc.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ZdW2OtOObxAbzKK4GvjAcqnhK2JNIwfSi/WPFP+ldG0=; b=Vx9XnuWrxDOixAjGmud8vBCR+LXYVh6GR0LRoYAbNB19ulZYbYOZMmsQ9NYR+I2r+S 5DKBkagG6CfMyD9S5HS3NoGF8g6ZM4ngzbYu+xH9pjRQb82HxdJeXyTYzU3umN+A0tna 6XlqSI+Fba2ExUl3r54DiP0M4+GEEgnpwW+cba1oA76Mtmyazu8VOYC1Cs8VB6WvB9Jl WdpVaAGLynKfv7amVIxRarnUZ0B9DS06I0wtC7fg48gAftIu+bMfQdoXRfDbU68hInid LgJfcHZlYiOu9lLUga7JiWpR25ec8B43ph24M1ptME26t57ht2ct7I1JYOKoZ6tTkIMc sWOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698311950; x=1698916750; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ZdW2OtOObxAbzKK4GvjAcqnhK2JNIwfSi/WPFP+ldG0=; b=Vcy1Tp+rikxiLdwbS8Dt2AQa/isQDEDVrSEziAeNzFoYuMR59KIorjbvLzyIIddOju jjR01bCPaNrdAWVMhFyIFgNPTMtYkDSVJoY6wbhhkGqIEq6m5FcYv30xAACtOgYuoFd6 tHbrAPgFHRHqTSqm/xihlCvVOLcMiOuyU6I3uNK//8Z2SnwRx4GAHILHnxbw+ZeQFxeY 4s8a7rWR2skSqF24g8PWgxg/mX0pcKrsnsAKAGmCMyQ/Xz1zVSSj8rPpCLTpLXy8ONyw 9cBXip2J57EYkF8wVJVTKs5XkTitF2B3uJiL4LZveKDm92oNID1fa3En0h8bm+H1LqiN j0yg== X-Gm-Message-State: AOJu0YxVv39fX3fsCXxpu8F21vvni/MuAsgBo/dpj+yBG5cVmhJLbbp+ f5QwyDK8hQEJG4jnA7bRLzO1mKVSqYai5A8Isxg= X-Google-Smtp-Source: AGHT+IHbRZlxRyyss7L0IG6o92IFRMjwLLeHJIwVLOq9B3fUsiCd8ee7k8aO9RQUp7KWtwYPwV1a5jmBAqxMzGLDxms= X-Received: by 2002:ac2:44a5:0:b0:503:7fc:8afe with SMTP id c5-20020ac244a5000000b0050307fc8afemr12459080lfm.1.1698311949300; Thu, 26 Oct 2023 02:19:09 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Thu, 26 Oct 2023 11:18:57 +0200 Message-ID: Subject: Re: the elimination of if blocks in GCC during if-conversion and vectorization To: Hanke Zhang Cc: gcc@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=0.1 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Oct 23, 2023 at 2:56=E2=80=AFPM Hanke Zhang = wrote: > > Richard Biener =E4=BA=8E2023=E5=B9=B410=E6= =9C=8823=E6=97=A5=E5=91=A8=E4=B8=80 20:32=E5=86=99=E9=81=93=EF=BC=9A > > > > On Mon, Oct 23, 2023 at 12:50=E2=80=AFPM Hanke Zhang wrote: > > > > > > Hi Richard: > > > > > > Thanks for your advice. But when I try a simpler example like the one > > > below before looking at the code, GCC still does nothing. > > > > > > int main() { > > > int width; > > > scanf("%d", &width); > > > int sum =3D 0; > > > for (int i =3D 0; i < width; i++) sum +=3D i; > > > printf("%d\n", sum); > > > } > > > > > > I tried O3 and LTO, but still the same. So I'd like to ask why, or am > > > I doing something wrong? > > > > -fdump-tree-sccp-details-scev reveals > > > > (set_scalar_evolution > > instantiated_below =3D 5 > > (scalar =3D sum_9) > > (scalar_evolution =3D {0, +, {1, +, 1}_1}_1)) > > ) > > (chrec_apply > > (varying_loop =3D 1) > > (chrec =3D {0, +, {1, +, 1}_1}_1) > > (x =3D (unsigned int) width.0_12 + 4294967295) > > (res =3D scev_not_known)) > > > > so we don't know how to apply a variable number of iterations to > > the affine expression {0, +, {1, +, 1}_1}_1, that is, we do not > > know how to compute the final value of the reduction. > > > > For a constant, say width =3D=3D 100 we do: > > > > (set_scalar_evolution > > instantiated_below =3D 2 > > (scalar =3D sum_6) > > (scalar_evolution =3D {0, +, {1, +, 1}_1}_1)) > > ) > > (chrec_apply > > (varying_loop =3D 1) > > (chrec =3D {0, +, {1, +, 1}_1}_1) > > (x =3D 99) > > (res =3D 4950)) > > Yeah, I also found this result in previous experiments. But what > confuses me is that if the 'width' can't be inferred to INTEGER_CST, > there's nothing we can do, right? > > Because in my case, the variables corresponding to 'width' are almost > all undetermined values, such as 'width =3D rand()'. That said, I can > hardly get any optimizations in my cases, right? I think the result would be (width * (width - 1)) / 2, chrec_apply is where "pattern matching" of known series is done. Richard. > > Thanks > Hanke Zhang > > > > > > Richard. > > > > > > > > Thanks > > > Hanke Zhang > > > > > > Richard Biener =E4=BA=8E2023=E5=B9=B410= =E6=9C=8819=E6=97=A5=E5=91=A8=E5=9B=9B 20:00=E5=86=99=E9=81=93=EF=BC=9A > > > > > > > > On Tue, Oct 17, 2023 at 2:39=E2=80=AFPM Hanke Zhang wrote: > > > > > > > > > > Hi Richard > > > > > I get it, thank you again. > > > > > > > > > > And I got another problem, so I'd like ask it by the way. Can the= left > > > > > shift of the induction variable in a loop be optimized as a const= ant? > > > > > Like the code below: > > > > > > > > > > int ans =3D 0; > > > > > int width =3D rand() % 16; > > > > > for (int j =3D 0; j < width; j++) > > > > > ans +=3D 1 << (j + width) > > > > > > > > > > into: > > > > > > > > > > int width =3D rand() % 16; > > > > > ans =3D (1 << (2 * width) - (1 << width)); > > > > > > > > > > I came across a more complex version of that and found that gcc > > > > > doesn't seem to handle it, so wanted to write a pass myself to > > > > > optimize it. > > > > > > > > > > I got two questions here. Does GCC have such optimizations? If I = want > > > > > to do my own optimization, where should I put it? Put it behind t= he > > > > > pass_iv_optimize? > > > > > > > > GCC has the final value replacement pass (pass_scev_cprop) doing th= ese > > > > kind of transforms. Since 'ans' does not have an affine evolution = this > > > > case would need to be pattern matched (there are some existing patt= ern > > > > matchings in the pass). > > > > > > > > > Thanks > > > > > Hanke Zhang > > > > > > > > > > Richard Biener =E4=BA=8E2023=E5=B9= =B410=E6=9C=8817=E6=97=A5=E5=91=A8=E4=BA=8C 20:00=E5=86=99=E9=81=93=EF=BC= =9A > > > > > > > > > > > > On Tue, Oct 17, 2023 at 1:54=E2=80=AFPM Hanke Zhang wrote: > > > > > > > > > > > > > > Richard Biener =E4=BA=8E2023=E5= =B9=B410=E6=9C=8817=E6=97=A5=E5=91=A8=E4=BA=8C 17:26=E5=86=99=E9=81=93=EF= =BC=9A > > > > > > > > > > > > > > > > On Thu, Oct 12, 2023 at 2:18=E2=80=AFPM Hanke Zhang via Gcc= wrote: > > > > > > > > > > > > > > > > > > Hi, I'm recently working on vectorization of GCC. I'm stu= ck in a small > > > > > > > > > problem and would like to ask for advice. > > > > > > > > > > > > > > > > > > For example, for the following code: > > > > > > > > > > > > > > > > > > int main() { > > > > > > > > > int size =3D 1000; > > > > > > > > > int *foo =3D malloc(sizeof(int) * size); > > > > > > > > > int c1 =3D rand(), t1 =3D rand(); > > > > > > > > > > > > > > > > > > for (int i =3D 0; i < size; i++) { > > > > > > > > > if (foo[i] & c1) { > > > > > > > > > foo[i] =3D t1; > > > > > > > > > } > > > > > > > > > } > > > > > > > > > > > > > > > > > > // prevents the loop above from being optimized > > > > > > > > > for (int i =3D 0; i < size; i++) { > > > > > > > > > printf("%d", foo[i]); > > > > > > > > > } > > > > > > > > > } > > > > > > > > > > > > > > > > > > First of all, the if statement block in the loop will be = converted to > > > > > > > > > a MASK_STORE through if-conversion optimization. But afte= r > > > > > > > > > tree-vector, it will still become a branched form. The pa= rt of the > > > > > > > > > final disassembly structure probably looks like below(Usi= ng IDA to do > > > > > > > > > this), and you can see that there is still such a branch = 'if ( !_ZF )' > > > > > > > > > in it, which will lead to low efficiency. > > > > > > > > > > > > > > > > > > do > > > > > > > > > { > > > > > > > > > while ( 1 ) > > > > > > > > > { > > > > > > > > > __asm > > > > > > > > > { > > > > > > > > > vpand ymm0, ymm2, ymmword ptr [rax] > > > > > > > > > vpcmpeqd ymm0, ymm0, ymm1 > > > > > > > > > vpcmpeqd ymm0, ymm0, ymm1 > > > > > > > > > vptest ymm0, ymm0 > > > > > > > > > } > > > > > > > > > if ( !_ZF ) > > > > > > > > > break; > > > > > > > > > _RAX +=3D 8; > > > > > > > > > if ( _RAX =3D=3D v9 ) > > > > > > > > > goto LABEL_5; > > > > > > > > > } > > > > > > > > > __asm { vpmaskmovd ymmword ptr [rax], ymm0, ymm3 } > > > > > > > > > _RAX +=3D 8; > > > > > > > > > } > > > > > > > > > while ( _RAX !=3D v9 ); > > > > > > > > > > > > > > > > > > Why can't we just replace the vptest and if statement wit= h some other > > > > > > > > > instructions like vpblendvb so that it can be faster? Or = is there a > > > > > > > > > good way to do that? > > > > > > > > > > > > > > > > The branch is added by optimize_mask_stores after vectoriza= tion because > > > > > > > > fully masked (disabled) masked stores can incur a quite hea= vy penalty on > > > > > > > > some architectures when fault assists (read-only pages, but= also COW pages) > > > > > > > > are ran into. All the microcode handling needs to possibly= be carried out > > > > > > > > multiple times, for each such access to the same page. Tha= t can cause > > > > > > > > a 1000x slowdown when you hit this case. Thus every masked= store > > > > > > > > is replaced by > > > > > > > > > > > > > > > > if (mask !=3D 0) > > > > > > > > masked_store (); > > > > > > > > > > > > > > > > and this is an optimization (which itself has a small cost)= . > > > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > > Yeah, I know that and I have seen the code of optimize_mask_s= tore(). > > > > > > > And the main problem here is that when multiple MASK_STORE ap= pear in > > > > > > > the same loop, many branches will appear, resulting in a decr= ease in > > > > > > > overall efficiency. > > > > > > > > > > > > > > And my original idea is that why can't we replace MASK_STORE = with more > > > > > > > effective SIMD instructions because icc can do much better in= this > > > > > > > case. > > > > > > > > > > > > ICC probably doesn't care for the case where foo[] isn't writab= le. In > > > > > > fact for the case at hand we see it comes from malloc() which w= e > > > > > > can assume to return writable memory I guess. That means if-co= nversion > > > > > > can treat the unconditional read as a way to also allow to spec= ulate > > > > > > the write (with -fallow-store-data-races). > > > > > > > > > > > > Note there's currently no pointer analysis that tracks writabil= ity. > > > > > > > > > > > > > Then I give it up, because the ability to analyze vectorizati= on > > > > > > > of gcc is not as good as icc and my ability does not support = me > > > > > > > modifying this part of the code. > > > > > > > > > > > > > > Thanks very much for your reply. > > > > > > > > > > > > You're welcome. > > > > > > > > > > > > Richard. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > Hanke Zhang