* [PATCH] [tree-optimization] Fix for PR97223 @ 2020-10-24 0:19 Eugene Rozenfeld 2020-10-27 9:23 ` Richard Biener 0 siblings, 1 reply; 5+ messages in thread From: Eugene Rozenfeld @ 2020-10-24 0:19 UTC (permalink / raw) To: gcc-patches [-- Attachment #1: Type: text/plain, Size: 533 bytes --] This patch adds a pattern for folding x < (short) ((unsigned short)x + const) to x <= SHORT_MAX - const (and similarly for other integral types) if const is not 0. as described in PR97223. For example, without this patch the x86_64-pc-linux code generated for this function bool f(char x) { return x < (char)(x + 12); } is lea eax,[rdi+0xc] cmp al,dil setg al ret With the patch the code is cmp dil,0x73 setle al ret Tested on x86_64-pc-linux. Eugene [-- Attachment #2: 0001-Add-a-tree-optimization-described-in-PR97223.patch --] [-- Type: application/octet-stream, Size: 1923 bytes --] From bc5fca4cbafae6b6bbf55787af1d2e5d1538649b Mon Sep 17 00:00:00 2001 From: Eugene Rozenfeld <erozen@microsoft.com> Date: Fri, 23 Oct 2020 16:47:01 -0700 Subject: [PATCH] Add a tree optimization described in PR97223. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Convert x < (short) ((unsigned short)x + const) to x <= SHORT_MAX – const (and similarly for other integral types) if const is not 0. For example, without this patch the x86_64-pc-linux code generated for this function bool f(char x) { return x < (char)(x + 12); } is lea eax,[rdi+0xc] cmp al,dil setg al ret With the patch the code is cmp dil,0x73 setle al ret --- gcc/match.pd | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/gcc/match.pd b/gcc/match.pd index 17ba04100c7..bc5bed626ec 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -4954,6 +4954,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) wi::max_value (prec, UNSIGNED) - wi::to_wide (@1)); }))))) +/* Similar to the previous pattern but with additional casts. */ +(for cmp (lt le ge gt) + out (gt gt le le) + (simplify + (cmp:c (convert@3 (plus@2 (convert@4 @0) INTEGER_CST@1)) @0) + (if (!TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (TREE_TYPE (@0), TREE_TYPE (@3)) + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE (@0))) + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@4)) + && wi::to_wide (@1) != 0 + && single_use (@2)) + (with { unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); } + (out @0 { wide_int_to_tree (TREE_TYPE (@0), + wi::max_value (prec, SIGNED) + - wi::to_wide (@1)); }))))) + /* To detect overflow in unsigned A - B, A < B is simpler than A - B > A. However, the detection logic for SUB_OVERFLOW in tree-ssa-math-opts.c expects the long form, so we restrict the transformation for now. */ -- 2.17.1 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] [tree-optimization] Fix for PR97223 2020-10-24 0:19 [PATCH] [tree-optimization] Fix for PR97223 Eugene Rozenfeld @ 2020-10-27 9:23 ` Richard Biener 2020-10-29 19:45 ` [EXTERNAL] " Eugene Rozenfeld 0 siblings, 1 reply; 5+ messages in thread From: Richard Biener @ 2020-10-27 9:23 UTC (permalink / raw) To: Eugene Rozenfeld; +Cc: gcc-patches On Sat, Oct 24, 2020 at 2:20 AM Eugene Rozenfeld via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > This patch adds a pattern for folding > x < (short) ((unsigned short)x + const) > to > x <= SHORT_MAX - const > (and similarly for other integral types) if const is not 0. > as described in PR97223. > > For example, without this patch the x86_64-pc-linux code generated for this function > > bool f(char x) > { > return x < (char)(x + 12); > } > > is > > lea eax,[rdi+0xc] > cmp al,dil > setg al > ret > > With the patch the code is > > cmp dil,0x73 > setle al > ret > > Tested on x86_64-pc-linux. +/* Similar to the previous pattern but with additional casts. */ +(for cmp (lt le ge gt) + out (gt gt le le) + (simplify + (cmp:c (convert@3 (plus@2 (convert@4 @0) INTEGER_CST@1)) @0) + (if (!TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (TREE_TYPE (@0), TREE_TYPE (@3)) + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE (@0))) + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@4)) + && wi::to_wide (@1) != 0 + && single_use (@2)) + (with { unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); } + (out @0 { wide_int_to_tree (TREE_TYPE (@0), + wi::max_value (prec, SIGNED) + - wi::to_wide (@1)); }))))) I think it's reasonable but the comment can be made more precise. In particular I wonder why we require a signed comparison here while the previous pattern requires an unsigned comparison. It might be an artifact and the restriction instead only applies to the plus? Note that + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE (@0))) unsigned_type_for should be avoided since it's quite expensive. May I suggest && TYPE_UNSIGNED (TREE_TYPE (@4)) && tree_nop_conversion_p (TREE_TYPE (@4), TREE_TYPE (@0)) instead? I originally wondered if "but with additional casts" could be done in a single pattern via (convert? ...) uses but then I noticed the strange difference in the comparison signedness requirement ... Richard. > Eugene > ^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223 2020-10-27 9:23 ` Richard Biener @ 2020-10-29 19:45 ` Eugene Rozenfeld 2020-10-30 8:24 ` Richard Biener 2020-11-06 3:46 ` Jeff Law 0 siblings, 2 replies; 5+ messages in thread From: Eugene Rozenfeld @ 2020-10-29 19:45 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches [-- Attachment #1: Type: text/plain, Size: 3284 bytes --] Thank you for the review Richard! I re-worked the patch based on your suggestions. I combined the two patterns. Neither one requires a signedness check as long as the type of the 'add' has overflow wrap semantics. I had to modify the regular expression in no-strict-overflow-4.c test. In that test the following function is compiled with -fno-strict-overflow : int foo (int i) { return i + 1 > i; } We now optimize this function so that the tree-optimized dump has ;; Function foo (foo, funcdef_no=0, decl_uid=1931, cgraph_uid=1, symbol_order=0) foo (int i) { _Bool _1; int _3; <bb 2> [local count: 1073741824]: _1 = i_2(D) != 2147483647; _3 = (int) _1; return _3; } This is a correct optimization since -fno-strict-overflow implies -fwrapv. Eugene -----Original Message----- From: Richard Biener <richard.guenther@gmail.com> Sent: Tuesday, October 27, 2020 2:23 AM To: Eugene Rozenfeld <Eugene.Rozenfeld@microsoft.com> Cc: gcc-patches@gcc.gnu.org Subject: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223 On Sat, Oct 24, 2020 at 2:20 AM Eugene Rozenfeld via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > This patch adds a pattern for folding > x < (short) ((unsigned short)x + const) to > x <= SHORT_MAX - const > (and similarly for other integral types) if const is not 0. > as described in PR97223. > > For example, without this patch the x86_64-pc-linux code generated for > this function > > bool f(char x) > { > return x < (char)(x + 12); > } > > is > > lea eax,[rdi+0xc] > cmp al,dil > setg al > ret > > With the patch the code is > > cmp dil,0x73 > setle al > ret > > Tested on x86_64-pc-linux. +/* Similar to the previous pattern but with additional casts. */ (for +cmp (lt le ge gt) + out (gt gt le le) + (simplify + (cmp:c (convert@3 (plus@2 (convert@4 @0) INTEGER_CST@1)) @0) + (if (!TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (TREE_TYPE (@0), TREE_TYPE (@3)) + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE (@0))) + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@4)) + && wi::to_wide (@1) != 0 + && single_use (@2)) + (with { unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); } + (out @0 { wide_int_to_tree (TREE_TYPE (@0), + wi::max_value (prec, SIGNED) + - wi::to_wide (@1)); }))))) I think it's reasonable but the comment can be made more precise. In particular I wonder why we require a signed comparison here while the previous pattern requires an unsigned comparison. It might be an artifact and the restriction instead only applies to the plus? Note that + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE + (@0))) unsigned_type_for should be avoided since it's quite expensive. May I suggest && TYPE_UNSIGNED (TREE_TYPE (@4)) && tree_nop_conversion_p (TREE_TYPE (@4), TREE_TYPE (@0)) instead? I originally wondered if "but with additional casts" could be done in a single pattern via (convert? ...) uses but then I noticed the strange difference in the comparison signedness requirement ... Richard. > Eugene > [-- Attachment #2: 0001-Add-a-tree-optimization-described-in-PR97223.patch --] [-- Type: application/octet-stream, Size: 3049 bytes --] From 973942122522bbf2e9de54cff17de59de5955547 Mon Sep 17 00:00:00 2001 From: Eugene Rozenfeld <erozen@microsoft.com> Date: Fri, 23 Oct 2020 16:47:01 -0700 Subject: [PATCH] Add a tree optimization described in PR97223. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Convert x < (short) ((unsigned short)x + const) to x <= SHORT_MAX – const (and similarly for other integral types) if const is not 0. For example, without this patch the x86_64-pc-linux code generated for this function bool f(char x) { return x < (char)(x + 12); } is lea eax,[rdi+0xc] cmp al,dil setg al ret With the patch the code is cmp dil,0x73 setle al ret --- gcc/match.pd | 16 ++++++++++------ gcc/testsuite/gcc.dg/no-strict-overflow-4.c | 5 +++-- 2 files changed, 13 insertions(+), 8 deletions(-) diff --git a/gcc/match.pd b/gcc/match.pd index 17ba04100c7..412e21faf86 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -4940,18 +4940,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* When one argument is a constant, overflow detection can be simplified. Currently restricted to single use so as not to interfere too much with ADD_OVERFLOW detection in tree-ssa-math-opts.c. - A + CST CMP A -> A CMP' CST' */ + CONVERT?(CONVERT?(A) + CST) CMP A -> A CMP' CST' */ (for cmp (lt le ge gt) out (gt gt le le) (simplify - (cmp:c (plus@2 @0 INTEGER_CST@1) @0) - (if (TYPE_UNSIGNED (TREE_TYPE (@0)) - && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)) + (cmp:c (convert?@3 (plus@2 (convert?@4 @0) INTEGER_CST@1)) @0) + (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@2)) + && types_match (TREE_TYPE (@0), TREE_TYPE (@3)) + && tree_nop_conversion_p (TREE_TYPE (@4), TREE_TYPE (@0)) && wi::to_wide (@1) != 0 && single_use (@2)) - (with { unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); } + (with { + unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); + signop sign = TYPE_SIGN (TREE_TYPE (@0)); + } (out @0 { wide_int_to_tree (TREE_TYPE (@0), - wi::max_value (prec, UNSIGNED) + wi::max_value (prec, sign) - wi::to_wide (@1)); }))))) /* To detect overflow in unsigned A - B, A < B is simpler than A - B > A. diff --git a/gcc/testsuite/gcc.dg/no-strict-overflow-4.c b/gcc/testsuite/gcc.dg/no-strict-overflow-4.c index b6d3da3f831..90145ff9422 100644 --- a/gcc/testsuite/gcc.dg/no-strict-overflow-4.c +++ b/gcc/testsuite/gcc.dg/no-strict-overflow-4.c @@ -4,7 +4,8 @@ /* Source: Ian Lance Taylor. Dual of strict-overflow-4.c. */ /* We can only simplify the conditional when using strict overflow - semantics. */ + semantics or when using wrap overflow semantics. -fno-strict-overflow is + equivalent to -fwrapv. */ int foo (int i) @@ -12,4 +13,4 @@ foo (int i) return i + 1 > i; } -/* { dg-final { scan-tree-dump "\[^ \]*_.(\\\(D\\\))? (>|<) \[^ \]*_." "optimized" } } */ +/* { dg-final { scan-tree-dump "\[^ \]*_.(\\\(D\\\))? != \[0-9]+" "optimized" } } */ -- 2.17.1 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223 2020-10-29 19:45 ` [EXTERNAL] " Eugene Rozenfeld @ 2020-10-30 8:24 ` Richard Biener 2020-11-06 3:46 ` Jeff Law 1 sibling, 0 replies; 5+ messages in thread From: Richard Biener @ 2020-10-30 8:24 UTC (permalink / raw) To: Eugene Rozenfeld; +Cc: gcc-patches On Thu, Oct 29, 2020 at 8:45 PM Eugene Rozenfeld <Eugene.Rozenfeld@microsoft.com> wrote: > > Thank you for the review Richard! > > I re-worked the patch based on your suggestions. I combined the two patterns. Neither one requires a signedness check as long as the type of the 'add' has overflow wrap semantics. > > I had to modify the regular expression in no-strict-overflow-4.c test. In that test the following function is compiled with -fno-strict-overflow : > > int > foo (int i) > { > return i + 1 > i; > } > > We now optimize this function so that the tree-optimized dump has > > ;; Function foo (foo, funcdef_no=0, decl_uid=1931, cgraph_uid=1, symbol_order=0) > > foo (int i) > { > _Bool _1; > int _3; > > <bb 2> [local count: 1073741824]: > _1 = i_2(D) != 2147483647; > _3 = (int) _1; > return _3; > } > > This is a correct optimization since -fno-strict-overflow implies -fwrapv. OK. Thanks, Richard. > Eugene > > -----Original Message----- > From: Richard Biener <richard.guenther@gmail.com> > Sent: Tuesday, October 27, 2020 2:23 AM > To: Eugene Rozenfeld <Eugene.Rozenfeld@microsoft.com> > Cc: gcc-patches@gcc.gnu.org > Subject: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223 > > On Sat, Oct 24, 2020 at 2:20 AM Eugene Rozenfeld via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > > This patch adds a pattern for folding > > x < (short) ((unsigned short)x + const) to > > x <= SHORT_MAX - const > > (and similarly for other integral types) if const is not 0. > > as described in PR97223. > > > > For example, without this patch the x86_64-pc-linux code generated for > > this function > > > > bool f(char x) > > { > > return x < (char)(x + 12); > > } > > > > is > > > > lea eax,[rdi+0xc] > > cmp al,dil > > setg al > > ret > > > > With the patch the code is > > > > cmp dil,0x73 > > setle al > > ret > > > > Tested on x86_64-pc-linux. > > +/* Similar to the previous pattern but with additional casts. */ (for > +cmp (lt le ge gt) > + out (gt gt le le) > + (simplify > + (cmp:c (convert@3 (plus@2 (convert@4 @0) INTEGER_CST@1)) @0) > + (if (!TYPE_UNSIGNED (TREE_TYPE (@0)) > + && types_match (TREE_TYPE (@0), TREE_TYPE (@3)) > + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE (@0))) > + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@4)) > + && wi::to_wide (@1) != 0 > + && single_use (@2)) > + (with { unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); } > + (out @0 { wide_int_to_tree (TREE_TYPE (@0), > + wi::max_value (prec, SIGNED) > + - wi::to_wide (@1)); }))))) > > I think it's reasonable but the comment can be made more precise. > In particular I wonder why we require a signed comparison here while the previous pattern requires an unsigned comparison. It might be an artifact and the restriction instead only applies to the plus? > > Note that > > + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE > + (@0))) > > unsigned_type_for should be avoided since it's quite expensive. May I suggest > > && TYPE_UNSIGNED (TREE_TYPE (@4)) > && tree_nop_conversion_p (TREE_TYPE (@4), TREE_TYPE (@0)) > > instead? > > I originally wondered if "but with additional casts" could be done in a single pattern via (convert? ...) uses but then I noticed the strange difference in the comparison signedness requirement ... > > Richard. > > > Eugene > > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223 2020-10-29 19:45 ` [EXTERNAL] " Eugene Rozenfeld 2020-10-30 8:24 ` Richard Biener @ 2020-11-06 3:46 ` Jeff Law 1 sibling, 0 replies; 5+ messages in thread From: Jeff Law @ 2020-11-06 3:46 UTC (permalink / raw) To: Eugene Rozenfeld, Richard Biener; +Cc: gcc-patches On 10/29/20 1:45 PM, Eugene Rozenfeld via Gcc-patches wrote: > Thank you for the review Richard! > > I re-worked the patch based on your suggestions. I combined the two patterns. Neither one requires a signedness check as long as the type of the 'add' has overflow wrap semantics. > > I had to modify the regular expression in no-strict-overflow-4.c test. In that test the following function is compiled with -fno-strict-overflow : > > int > foo (int i) > { > return i + 1 > i; > } > > We now optimize this function so that the tree-optimized dump has > > ;; Function foo (foo, funcdef_no=0, decl_uid=1931, cgraph_uid=1, symbol_order=0) > > foo (int i) > { > _Bool _1; > int _3; > > <bb 2> [local count: 1073741824]: > _1 = i_2(D) != 2147483647; > _3 = (int) _1; > return _3; > } > > This is a correct optimization since -fno-strict-overflow implies -fwrapv. > > Eugene > > -----Original Message----- > From: Richard Biener <richard.guenther@gmail.com> > Sent: Tuesday, October 27, 2020 2:23 AM > To: Eugene Rozenfeld <Eugene.Rozenfeld@microsoft.com> > Cc: gcc-patches@gcc.gnu.org > Subject: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223 > > On Sat, Oct 24, 2020 at 2:20 AM Eugene Rozenfeld via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: >> This patch adds a pattern for folding >> x < (short) ((unsigned short)x + const) to >> x <= SHORT_MAX - const >> (and similarly for other integral types) if const is not 0. >> as described in PR97223. >> >> For example, without this patch the x86_64-pc-linux code generated for >> this function >> >> bool f(char x) >> { >> return x < (char)(x + 12); >> } >> >> is >> >> lea eax,[rdi+0xc] >> cmp al,dil >> setg al >> ret >> >> With the patch the code is >> >> cmp dil,0x73 >> setle al >> ret >> >> Tested on x86_64-pc-linux. > +/* Similar to the previous pattern but with additional casts. */ (for > +cmp (lt le ge gt) > + out (gt gt le le) > + (simplify > + (cmp:c (convert@3 (plus@2 (convert@4 @0) INTEGER_CST@1)) @0) > + (if (!TYPE_UNSIGNED (TREE_TYPE (@0)) > + && types_match (TREE_TYPE (@0), TREE_TYPE (@3)) > + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE (@0))) > + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@4)) > + && wi::to_wide (@1) != 0 > + && single_use (@2)) > + (with { unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); } > + (out @0 { wide_int_to_tree (TREE_TYPE (@0), > + wi::max_value (prec, SIGNED) > + - wi::to_wide (@1)); }))))) > > I think it's reasonable but the comment can be made more precise. > In particular I wonder why we require a signed comparison here while the previous pattern requires an unsigned comparison. It might be an artifact and the restriction instead only applies to the plus? > > Note that > > + && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE > + (@0))) > > unsigned_type_for should be avoided since it's quite expensive. May I suggest > > && TYPE_UNSIGNED (TREE_TYPE (@4)) > && tree_nop_conversion_p (TREE_TYPE (@4), TREE_TYPE (@0)) > > instead? > > I originally wondered if "but with additional casts" could be done in a single pattern via (convert? ...) uses but then I noticed the strange difference in the comparison signedness requirement ... > > Richard. > >> Eugene >> >> >> 0001-Add-a-tree-optimization-described-in-PR97223.patch >> >> From 973942122522bbf2e9de54cff17de59de5955547 Mon Sep 17 00:00:00 2001 >> From: Eugene Rozenfeld <erozen@microsoft.com> >> Date: Fri, 23 Oct 2020 16:47:01 -0700 >> Subject: [PATCH] Add a tree optimization described in PR97223. >> MIME-Version: 1.0 >> Content-Type: text/plain; charset=UTF-8 >> Content-Transfer-Encoding: 8bit >> >> Convert >> x < (short) ((unsigned short)x + const) >> to >> x <= SHORT_MAX – const >> (and similarly for other integral types) if const is not 0. >> >> For example, without this patch the x86_64-pc-linux code generated for this function >> >> bool f(char x) >> { >> return x < (char)(x + 12); >> } >> >> is >> >> lea eax,[rdi+0xc] >> cmp al,dil >> setg al >> ret >> >> With the patch the code is >> >> cmp dil,0x73 >> setle al >> ret >> --- >> gcc/match.pd | 16 ++++++++++------ >> gcc/testsuite/gcc.dg/no-strict-overflow-4.c | 5 +++-- >> 2 files changed, 13 insertions(+), 8 deletions(-) Committed to the trunk. Thanks. jeff ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-11-06 3:47 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-10-24 0:19 [PATCH] [tree-optimization] Fix for PR97223 Eugene Rozenfeld 2020-10-27 9:23 ` Richard Biener 2020-10-29 19:45 ` [EXTERNAL] " Eugene Rozenfeld 2020-10-30 8:24 ` Richard Biener 2020-11-06 3:46 ` Jeff Law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).