public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/50856] New: ARM: suboptimal code for absolute difference calculation
@ 2011-10-24 14:53 siarhei.siamashka at gmail dot com
  2011-10-24 17:13 ` [Bug target/50856] " rearnsha at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: siarhei.siamashka at gmail dot com @ 2011-10-24 14:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50856

             Bug #: 50856
           Summary: ARM: suboptimal code for absolute difference
                    calculation
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: siarhei.siamashka@gmail.com


gcc generates suboptimal code on ARM for "abs(a - b)" type of operation, which
is used for example in paeth png filter: http://www.w3.org/TR/PNG-Filters.html

Given the following test code:


int absolute_difference1(unsigned char a, unsigned char b)
{
    return a > b ? a - b : b - a;
}

int absolute_difference2(unsigned char a, unsigned char b)
{
    int tmp = a;
    if ((tmp -= b) < 0)
        tmp = -tmp;
    return tmp;
}


The current gcc svn trunk (r180383) generates the following code for -O2 and
-Os optimizations:

        .cpu arm10tdmi
        .eabi_attribute 27, 3
        .eabi_attribute 28, 1
        .fpu vfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 4
        .eabi_attribute 34, 0
        .eabi_attribute 18, 4
        .file   "test.c"
        .text
        .align  2
        .global absolute_difference1
        .type   absolute_difference1, %function
absolute_difference1:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        cmp     r0, r1
        rsbhi   r0, r1, r0
        rsbls   r0, r0, r1
        bx      lr
        .size   absolute_difference1, .-absolute_difference1
        .align  2
        .global absolute_difference2
        .type   absolute_difference2, %function
absolute_difference2:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        rsb     r0, r1, r0
        cmp     r0, #0
        rsblt   r0, r0, #0
        bx      lr
        .size   absolute_difference2, .-absolute_difference2
        .ident  "GCC: (GNU) 4.7.0 20111024 (experimental)"
        .section        .note.GNU-stack,"",%progbits

Even in the quite explicit second code variant ('absolute_difference2'
function), gcc does not generate the expected SUBS + NEGLT pair of
instructions. Also for ARMv6 capable processors even a single USAD8 instruction
could be used here if both operands are known to have values in [0-255] range
and if high latency of this instruction can be hidden.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/50856] ARM: suboptimal code for absolute difference calculation
  2011-10-24 14:53 [Bug target/50856] New: ARM: suboptimal code for absolute difference calculation siarhei.siamashka at gmail dot com
@ 2011-10-24 17:13 ` rearnsha at gcc dot gnu.org
  2022-02-03  4:44 ` [Bug tree-optimization/50856] " pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2011-10-24 17:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50856

Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |arm
           Priority|P3                          |P4
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-10-24
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Earnshaw <rearnsha at gcc dot gnu.org> 2011-10-24 17:13:14 UTC ---
Confirmed.  With the second case we get quite close to getting it right (having
detected that there is an ABS expression, but we fail to merge it with the
original subtract.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/50856] ARM: suboptimal code for absolute difference calculation
  2011-10-24 14:53 [Bug target/50856] New: ARM: suboptimal code for absolute difference calculation siarhei.siamashka at gmail dot com
  2011-10-24 17:13 ` [Bug target/50856] " rearnsha at gcc dot gnu.org
@ 2022-02-03  4:44 ` pinskia at gcc dot gnu.org
  2023-10-19 23:42 ` pinskia at gcc dot gnu.org
  2023-10-20  5:15 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-03  4:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50856

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is a generic issue and not really arm related.
In phiopt4 for absolute_difference1 we have:
  _14 = (int) b_7(D);
  _12 = (int) a_6(D);
  if (a_6(D) > b_7(D))
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  iftmp.0_9 = _12 - _14;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 536870913]:
  iftmp.0_8 = _14 - _12;

  <bb 5> [local count: 1073741824]:
  # iftmp.0_5 = PHI <iftmp.0_9(3), iftmp.0_8(4)>



While for absolute_difference2 we have in phiopt1
  tmp_4 = (int) a_3(D);
  _1 = (int) b_5(D);
  tmp_6 = tmp_4 - _1;
  if (tmp_6 < 0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  tmp_8 = _1 - tmp_4;

  <bb 4> :
  # tmp_2 = PHI <tmp_6(2), tmp_8(3)>

absolute_difference2 is easier to handle, let me try to handle both cases/
The second case we have:

(a - b) < 0 ? (b - a) : (a - b)
Or:
(cond (lt (minus@0 @1 @2) zero_p) (minus @2 @1) @0)


The first case we have is more complex.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/50856] ARM: suboptimal code for absolute difference calculation
  2011-10-24 14:53 [Bug target/50856] New: ARM: suboptimal code for absolute difference calculation siarhei.siamashka at gmail dot com
  2011-10-24 17:13 ` [Bug target/50856] " rearnsha at gcc dot gnu.org
  2022-02-03  4:44 ` [Bug tree-optimization/50856] " pinskia at gcc dot gnu.org
@ 2023-10-19 23:42 ` pinskia at gcc dot gnu.org
  2023-10-20  5:15 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-19 23:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50856

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The second case will be solved by updating the patch at:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574892.html

For the review at
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574948.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/50856] ARM: suboptimal code for absolute difference calculation
  2011-10-24 14:53 [Bug target/50856] New: ARM: suboptimal code for absolute difference calculation siarhei.siamashka at gmail dot com
                   ` (2 preceding siblings ...)
  2023-10-19 23:42 ` pinskia at gcc dot gnu.org
@ 2023-10-20  5:15 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-20  5:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50856

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Here is a full testcase (f3 is caught via fold_cond_expr_with_comparison):
```
int f(int a, int b)
{
  int t = a - b;
  if (t > 0) return t;
  return b - a;
}
int f1(int a, int b)
{
  if (a > b)  return a - b;
  return b - a;
}

int f2(int a, int b)
{
 return (a > b) ? a - b : b - a;
}

int f3(int a, int b)
{
 return (a - b) > 0 ? a - b : b - a;
}
```

I should note we currently have the following note in match.pd about not
folding `(a - b) > 0` because of catching f3:
/* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
   ??? The transformation is valid for the other operators if overflow
   is undefined for the type, but performing it here badly interacts
   with the transformation in fold_cond_expr_with_comparison which
   attempts to synthetize ABS_EXPR.  */

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-10-20  5:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-24 14:53 [Bug target/50856] New: ARM: suboptimal code for absolute difference calculation siarhei.siamashka at gmail dot com
2011-10-24 17:13 ` [Bug target/50856] " rearnsha at gcc dot gnu.org
2022-02-03  4:44 ` [Bug tree-optimization/50856] " pinskia at gcc dot gnu.org
2023-10-19 23:42 ` pinskia at gcc dot gnu.org
2023-10-20  5:15 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).