Hi! On 2022-10-21T00:44:30+0200, Aldy Hernandez wrote: > On Thu, Oct 20, 2022 at 9:22 PM Thomas Schwinge wrote: >> "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]" attached? > > I see 7 different tests in this patch. Did the 6 that pass, fail > before my patch for PR107195 and are now working? Cause unless > that's the case, they shouldn't be in a test named pr107195-3.c, but > somewhere else. That's correct; I should've mentioned that I had verified this. With the code changes of commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 "[PR107195] Set range to zero when nonzero mask is 0" reverted, we get: PASS: gcc.dg/tree-ssa/pr107195-3.c (test for excess errors) FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call I see there's one XFAILed test in your patch ... XFAILed test case removed, see the attached "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]"; OK now to push that version? > and this certainly > doesn't look like something that has anything to do with the patch I > submitted. Perhaps you could open a PR with an enhancement request > for this one? > > That being said... > > /* { dg-additional-options -O1 } */ > extern int > __attribute__((const)) > foo4b (int); > > int f4b (unsigned int r) > { > if (foo4b (r)) > r *= 8U; > > if ((r / 2U) & 2U) > r += foo4b (r); > > return r; > } > /* { dg-final { scan-tree-dump-times {gimple_call xfail *-*-* } } } */ > > At -O2, this is something PRE is doing, so GCC already handles this. > However, you are suggesting this isn't handled at -O1 and should be?? My thinking was that this optimization does work for 'r >> 1', but it doesn't work for 'r / 2'. > None of the VRPs run at -O1 so ranger-vrp won't even get a chance. > However, DOM runs at -O1 and it uses ranger to do simple copy > propagation and some jump threading...so technically we could do > something... > > DOM should be able to thread from the r *= 8U to the return because > the nonzero mask (known zeros) after the multiplication is 0xfffffff8, > which it could use to solve the second conditional as false. This > would leave us with: > > if (foo4b (r)) > { > r *= 8U; > return r; > } > else > { > if ((r / 2U) & 2U) > r += foo4b (r); > } > > ...which exposes the fact that the second call to foo4b() has the same > "r" as the first one, so it could be folded. I don't know whose job > it is to notice that two const calls have the same arguments, but ISTM > that if we thread the above correctly, someone should be able to clean > this up. No clue whether this happens at -O1. > > However... we're not threading this. It looks like we're not keeping > track of nonzero bits (known zeros) through the division. The > multiplication gives us 0xfffffff8 and we should be able to divide > that by 2 and get 0x7ffffffc which solves the second conditional to 0. > > So...maybe DOM+ranger could set things up for another pass to clean this up? > > Either way, you could open an enhancement request, if anything to keep > the nonzero mask up to date through the division. I've thus filed "Optimization opportunity where integer '/' corresponds to '>>'" for continuing that investigation. Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955