public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/97950] New: Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128
@ 2020-11-23 13:49 denis.campredon at gmail dot com
  2020-11-23 16:35 ` [Bug tree-optimization/97950] " jakub at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: denis.campredon at gmail dot com @ 2020-11-23 13:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97950

            Bug ID: 97950
           Summary: Unoptimal code generation with
                    __builtin_*_overflow{,_p} for short and __int128
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: denis.campredon at gmail dot com
  Target Milestone: ---

For the following code, the generation is unoptimal on x86-64.

For most of the functions with `short` a jump is generated.
For the functions with `__int128` all but `mul_overflow_{,un}signed___int128`
seems to have extra `mov` produced.

The same problems apply to all the `__builtin_*_overflow_p`
-------------
#define TEST_OVERFLOW(type) \
bool mul_overflow_signed_##type(signed type a, signed type b, signed type c)
{return __builtin_mul_overflow(a,b,&c);} \
bool add_overflow_signed_##type(signed type a, signed type b, signed type c)
{return __builtin_add_overflow(a,b,&c);} \
bool sub_overflow_signed_##type(signed type a, signed type b, signed type c)
{return __builtin_sub_overflow(a,b,&c);} \
bool mul_overflow_unsigned_##type(unsigned type a, unsigned type b, unsigned
type c) {return __builtin_mul_overflow(a,b,&c);} \
bool add_overflow_unsigned_##type(unsigned type a, unsigned type b, unsigned
type c) {return __builtin_add_overflow(a,b,&c);} \
bool sub_overflow_unsigned_##type(unsigned type a, unsigned type b, unsigned
type c) {return __builtin_sub_overflow(a,b,&c);} \

TEST_OVERFLOW(short)
TEST_OVERFLOW(__int128)
-------------

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/97950] Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128
  2020-11-23 13:49 [Bug tree-optimization/97950] New: Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128 denis.campredon at gmail dot com
@ 2020-11-23 16:35 ` jakub at gcc dot gnu.org
  2020-11-23 18:01 ` denis.campredon at gmail dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-23 16:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97950

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 49613
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49613&action=edit
gcc11-pr97950.patch

Untested fix for the short cases.
As for __int128, I think it would be better if you filed a separate issue,
because it has nothing to do with the short one, and mentioned which moves you
find redundant.  Generally, some redundant moves will be there with double-word
arithmetics, it is a matter of when exactly it is the right time to lower
double-word operations (and which) into word ones, doing it too early prevents
e.g. STV from doing its job and e.g. doing for __int128 some operations in SSE
registers, while doing it too late may result in the already assigning pairs of
GP registers for the 128-bit pseudos and while some useless moves can be
recovered afterwards, certainly not all of them.  x86 doesn't really have
instructions that would allow implementing the 128-bit multiplications with
overflow efficiently.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/97950] Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128
  2020-11-23 13:49 [Bug tree-optimization/97950] New: Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128 denis.campredon at gmail dot com
  2020-11-23 16:35 ` [Bug tree-optimization/97950] " jakub at gcc dot gnu.org
@ 2020-11-23 18:01 ` denis.campredon at gmail dot com
  2020-11-23 18:25 ` [Bug target/97950] " pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: denis.campredon at gmail dot com @ 2020-11-23 18:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97950

--- Comment #2 from denis.campredon at gmail dot com ---
Thanks for your fast patch. I've opened PR97961 for the __int128 problem

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/97950] Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128
  2020-11-23 13:49 [Bug tree-optimization/97950] New: Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128 denis.campredon at gmail dot com
  2020-11-23 16:35 ` [Bug tree-optimization/97950] " jakub at gcc dot gnu.org
  2020-11-23 18:01 ` denis.campredon at gmail dot com
@ 2020-11-23 18:25 ` pinskia at gcc dot gnu.org
  2020-11-23 20:26 ` ubizjak at gmail dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2020-11-23 18:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97950

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/97950] Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128
  2020-11-23 13:49 [Bug tree-optimization/97950] New: Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128 denis.campredon at gmail dot com
                   ` (2 preceding siblings ...)
  2020-11-23 18:25 ` [Bug target/97950] " pinskia at gcc dot gnu.org
@ 2020-11-23 20:26 ` ubizjak at gmail dot com
  2020-11-23 20:36 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: ubizjak at gmail dot com @ 2020-11-23 20:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97950

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
Comment on attachment 49613
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49613
gcc11-pr97950.patch

>+(define_insn_and_split "*setcc_hi_1"
>+  [(set (match_operand:HI 0 "register_operand" "=q")
>+	(match_operator:HI 1 "ix86_comparison_operator"
>+	  [(reg FLAGS_REG) (const_int 0)]))]
>+  "!TARGET_PARTIAL_REG_STALL"
>+  "#"
>+  "&& reload_completed"
>+  [(set (match_dup 2) (match_dup 1))
>+   (set (match_dup 0) (zero_extend:DI (match_dup 2)))]

zero_extend:HI

Perhaps you could apply SWI24 mode iterator to existing *setcc_si_1_and and
*setcc_si_1_movzbl?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/97950] Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128
  2020-11-23 13:49 [Bug tree-optimization/97950] New: Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128 denis.campredon at gmail dot com
                   ` (3 preceding siblings ...)
  2020-11-23 20:26 ` ubizjak at gmail dot com
@ 2020-11-23 20:36 ` jakub at gcc dot gnu.org
  2020-11-24  9:46 ` cvs-commit at gcc dot gnu.org
  2020-11-24  9:51 ` [Bug target/97950] Unoptimal code generation with __builtin_*_overflow{,_p} for short jakub at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-23 20:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97950

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49613|0                           |1
        is obsolete|                            |
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-11-23

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 49615
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49615&action=edit
gcc11-pr97950.patch

Updated patch.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/97950] Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128
  2020-11-23 13:49 [Bug tree-optimization/97950] New: Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128 denis.campredon at gmail dot com
                   ` (4 preceding siblings ...)
  2020-11-23 20:36 ` jakub at gcc dot gnu.org
@ 2020-11-24  9:46 ` cvs-commit at gcc dot gnu.org
  2020-11-24  9:51 ` [Bug target/97950] Unoptimal code generation with __builtin_*_overflow{,_p} for short jakub at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-11-24  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97950

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a1dd66b108cba086f58448ccbe9bf57b0a342f9a

commit r11-5279-ga1dd66b108cba086f58448ccbe9bf57b0a342f9a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Nov 24 10:45:40 2020 +0100

    i386: Add *setcc_hi_1* define_insn_and_split [PR97950]

    As the following testcase shows, unlike char, int or long long sized
    __builtin_*_overflow{,_p}, for short sized one in most cases the ce1 pass
    doesn't optimize the jo/jno or jc/jnc jumps with setting of a pseudo to 0/1
    into seto/setc.  The reason is missing *setcc_hi_1* pattern.  The following
    patch implements it using mode iterators so that on i486 and pentium?
    one can get the zero extension through and instead of movzbw.

    2020-11-24  Jakub Jelinek  <jakub@redhat.com>

            PR target/97950
            * config/i386/i386.md (*setcc_si_1_and): Macroize into...
            (*setcc_<mode>_1_and): New define_insn_and_split with SWI24
iterator.
            (*setcc_si_1_movzbl): Macroize into...
            (*setcc_<mode>_1_movzbl): New define_insn_and_split with SWI24
            iterator.

            * gcc.target/i386/pr97950.c: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/97950] Unoptimal code generation with __builtin_*_overflow{,_p} for short
  2020-11-23 13:49 [Bug tree-optimization/97950] New: Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128 denis.campredon at gmail dot com
                   ` (5 preceding siblings ...)
  2020-11-24  9:46 ` cvs-commit at gcc dot gnu.org
@ 2020-11-24  9:51 ` jakub at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-24  9:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97950

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Unoptimal code generation   |Unoptimal code generation
                   |with                        |with
                   |__builtin_*_overflow{,_p}   |__builtin_*_overflow{,_p}
                   |for short and __int128      |for short
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for 11.1, too risky for backports.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-11-24  9:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-23 13:49 [Bug tree-optimization/97950] New: Unoptimal code generation with __builtin_*_overflow{,_p} for short and __int128 denis.campredon at gmail dot com
2020-11-23 16:35 ` [Bug tree-optimization/97950] " jakub at gcc dot gnu.org
2020-11-23 18:01 ` denis.campredon at gmail dot com
2020-11-23 18:25 ` [Bug target/97950] " pinskia at gcc dot gnu.org
2020-11-23 20:26 ` ubizjak at gmail dot com
2020-11-23 20:36 ` jakub at gcc dot gnu.org
2020-11-24  9:46 ` cvs-commit at gcc dot gnu.org
2020-11-24  9:51 ` [Bug target/97950] Unoptimal code generation with __builtin_*_overflow{,_p} for short jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).