Widening multiplication, but no narrowing division [i386/AMD64]

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Widening multiplication, but no narrowing division [i386/AMD64]
@ 2023-01-09 12:20 Stefan Kanthak
  2023-01-09 13:10 ` LIU Hao
  2023-01-09 14:23 ` Paul Koning
  0 siblings, 2 replies; 8+ messages in thread
From: Stefan Kanthak @ 2023-01-09 12:20 UTC (permalink / raw)
  To: gcc

Hi,

GCC (and other C compilers too) support the widening multiplication
of i386/AMD64 processors, but DON'T support their narrowing division:

--- demo.c ---
unsigned long long product(unsigned long multiplicand,
                           unsigned long multiplier)
{
    return (unsigned long long) multiplicand * multiplier;
}

unsigned long long quotient(unsigned long long dividend,
                            unsigned long divisor,
                            unsigned long *remainder)
{
    *remainder = dividend % divisor;
    return dividend / divisor;
}
--- EOF ---

GCC 12.2: gcc -m32 -O2 demo.c

# https://godbolt.org/z/1M9dohMcE
product(unsigned long, unsigned long):
        mov     eax, DWORD PTR [esp+8]
        mul     DWORD PTR [esp+4]
        ret
quotient(unsigned long long, unsigned long, unsigned long*):
        push    ebx
        xor     edx, edx
        sub     esp, 24
        mov     eax, DWORD PTR [esp+40]
        lea     ecx, [esp+8]
        sub     esp, 12
        push    ecx
        push    edx
        push    eax
        push    DWORD PTR [esp+60]
        push    DWORD PTR [esp+60]
        call    __udivmoddi4
        mov     ebx, DWORD PTR [esp+40]
        mov     ecx, DWORD PTR [esp+76]
        mov     DWORD PTR [ecx], ebx
        add     esp, 56
        pop     ebx
        ret

### Diversion ###

Even worse and completely BRAINDEAD, another compiler calls __udivdi3()
and wastes a multiplication to compute the remainder, ignoring the fact
that __udivdi3() calls __udivmoddi4() which already returns quotient
and remainder:

clang 15.0.0: clang -m32 -O2 demo.c

# https://godbolt.org/z/rv1sTe7xv
product(unsigned long, unsigned long):
        mov     eax, dword ptr [esp + 8]
        mul     dword ptr [esp + 4]
        ret
quotient(unsigned long long, unsigned long, unsigned long*):
        push    ebp
        push    ebx
        push    edi
        push    esi
        sub     esp, 12
        call    .L1$pb
.L1$pb:
        pop     ebx
.Ltmp2:
        add     ebx, offset _GLOBAL_OFFSET_TABLE_+(.Ltmp2-.L1$pb)
        mov     esi, dword ptr [esp + 44]
        mov     edi, dword ptr [esp + 32]
        mov     ebp, dword ptr [esp + 40]
        push    0
        push    ebp
        push    dword ptr [esp + 44]
        push    edi
        call    __udivdi3@PLT
        add     esp, 16
        imul    ebp, eax
        sub     edi, ebp
        mov     dword ptr [esi], edi
        add     esp, 12
        pop     esi
        pop     edi
        pop     ebx
        pop     ebp
        ret

### end of diversion ###

Both compilers miss the fact that the i386 processor has a narrowing
integer division and can therefore divide 64-bit / 32-bit numbers,
for example with the well-known "long" alias "schoolbook" division,
returning 64-bit quotient and 32-bit remainder:

.arch   generic 32
.code32
.intel_syntax noprefix
.text

quotient:
        mov     ecx, [esp+12]   # ecx = divisor
.if 0
        xor     edx, edx
        mov     eax, [esp+8]    # edx:eax = high dword of dividend
        cmp     eax, edx
        je      0f              # high dword of dividend = 0?
.else
        xor     eax, eax
        mov     edx, [esp+8]    # eax:edx = high dword of dividend
        cmp     edx, ecx
        jb      0f              # high dword of dividend < divisor?
                                # quotient < 2**32?
        xchg    eax, edx
.endif
        div     ecx             # eax = high dword of quotient,
0:                              # edx = high dword of dividend'
        push    eax
        mov     eax, [esp+8]    # edx:eax = dividend'
        div     ecx             # eax = low dword of quotient,
                                # edx = remainder
        mov     ecx, remainder  # ecx = address of remainder
        mov     [ecx], edx
        pop     edx             # edx:eax = quotient
        ret
.end

JFTR: dependent on the magnitude of the numbers and the processor
      it MIGHT be better to omit comparison and branch: there's a
      trade-öff between the latency of the (un-pipelined) division
      instruction and the latency of the conditional branch due to
      misprediction.

Stefan Kanthak


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Widening multiplication, but no narrowing division [i386/AMD64]
  2023-01-09 12:20 Widening multiplication, but no narrowing division [i386/AMD64] Stefan Kanthak
@ 2023-01-09 13:10 ` LIU Hao
  2023-01-09 13:19   ` Stefan Kanthak
  2023-01-09 14:23 ` Paul Koning
  1 sibling, 1 reply; 8+ messages in thread
From: LIU Hao @ 2023-01-09 13:10 UTC (permalink / raw)
  To: Stefan Kanthak, gcc


[-- Attachment #1.1: Type: text/plain, Size: 874 bytes --]

在 2023/1/9 20:20, Stefan Kanthak 写道:
> Hi,
> 
> GCC (and other C compilers too) support the widening multiplication
> of i386/AMD64 processors, but DON'T support their narrowing division:
> 
> 

QWORD-DWORD division would change the behavior of your program.


Given:

    ```
    uint32_t xdiv(uint64_t x, uint32_t y) { return x / y;  }
    ```

then `xdiv(0x200000002, 2)` should first convert both operands to `uint64_t`, perform the division 
which yields `0x100000001`, then truncate the quotient to 32-bit which gives `1`. The result is exact.


If DIV was used, it would effect an exception:

    ```
    mov edx, 2
    mov eax, edx   # edx:eax = 0x200000002

    mov ecx, edx
    div ecx        # division overflows because the quotient
                   # can't stored into EAX
    ```





-- 
Best regards,
LIU Hao


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Widening multiplication, but no narrowing division [i386/AMD64]
  2023-01-09 13:10 ` LIU Hao
@ 2023-01-09 13:19   ` Stefan Kanthak
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Kanthak @ 2023-01-09 13:19 UTC (permalink / raw)
  To: gcc, LIU Hao

LIU Hao wrote:

>在 2023/1/9 20:20, Stefan Kanthak 写道:
>> Hi,
>>
>> GCC (and other C compilers too) support the widening multiplication
>> of i386/AMD64 processors, but DON'T support their narrowing division:
>>
>>
>
> QWORD-DWORD division would change the behavior of your program.
[...]
> If DIV was used, it would effect an exception:

Guess why I use "schoolbook" division?
Please read the end of my post until you understand the code.

regards
Stefan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Widening multiplication, but no narrowing division [i386/AMD64]
  2023-01-09 12:20 Widening multiplication, but no narrowing division [i386/AMD64] Stefan Kanthak
  2023-01-09 13:10 ` LIU Hao
@ 2023-01-09 14:23 ` Paul Koning
  2023-01-09 15:20   ` Stefan Kanthak
  1 sibling, 1 reply; 8+ messages in thread
From: Paul Koning @ 2023-01-09 14:23 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: gcc



> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
> 
> Hi,
> 
> GCC (and other C compilers too) support the widening multiplication
> of i386/AMD64 processors, but DON'T support their narrowing division:

I wonder if this changed in the recent past.  I have a pattern for this type of thing in pdp11.md:

(define_expand "divmodhi4"
  [(parallel
    [(set (subreg:HI (match_dup 1) 0)
	(div:HI (match_operand:SI 1 "register_operand" "0")
		(match_operand:HI 2 "general_operand" "g")))
     (set (subreg:HI (match_dup 1) 2)
	(mod:HI (match_dup 1) (match_dup 2)))])
   (set (match_operand:HI 0 "register_operand" "=r")
        (subreg:HI (match_dup 1) 0))
   (set (match_operand:HI 3 "register_operand" "=r")
        (subreg:HI (match_dup 1) 2))]
  "TARGET_40_PLUS"
  "")

and I'm pretty sure this worked at some point in the past.  

	paul


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Widening multiplication, but no narrowing division [i386/AMD64]
  2023-01-09 14:23 ` Paul Koning
@ 2023-01-09 15:20   ` Stefan Kanthak
  2023-01-09 15:30     ` Paul Koning
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Kanthak @ 2023-01-09 15:20 UTC (permalink / raw)
  To: Paul Koning; +Cc: gcc

"Paul Koning" <paulkoning@comcast.net> wrote:

>> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>> 
>> Hi,
>> 
>> GCC (and other C compilers too) support the widening multiplication
>> of i386/AMD64 processors, but DON'T support their narrowing division:
>
> I wonder if this changed in the recent past.
> I have a pattern for this type of thing in pdp11.md:
[...]
> and I'm pretty sure this worked at some point in the past.  

Unfortunately the C standard defines that the smaller operand (of lesser
conversion rank), here divisor, has to undergo a conversion to the "real
common type", i.e. the broader operand (of higher conversion rank), here
dividend. Unless the information about promotion/conversion is handed over
to the code generator it can't apply such patterns -- as demonstrated by
the demo code.

regards
Stefan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Widening multiplication, but no narrowing division [i386/AMD64]
  2023-01-09 15:20   ` Stefan Kanthak
@ 2023-01-09 15:30     ` Paul Koning
  2023-01-09 16:27       ` Stefan Kanthak
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Koning @ 2023-01-09 15:30 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: gcc



> On Jan 9, 2023, at 10:20 AM, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
> 
> "Paul Koning" <paulkoning@comcast.net> wrote:
> 
>>> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>>> 
>>> Hi,
>>> 
>>> GCC (and other C compilers too) support the widening multiplication
>>> of i386/AMD64 processors, but DON'T support their narrowing division:
>> 
>> I wonder if this changed in the recent past.
>> I have a pattern for this type of thing in pdp11.md:
> [...]
>> and I'm pretty sure this worked at some point in the past.  
> 
> Unfortunately the C standard defines that the smaller operand (of lesser
> conversion rank), here divisor, has to undergo a conversion to the "real
> common type", i.e. the broader operand (of higher conversion rank), here
> dividend. Unless the information about promotion/conversion is handed over
> to the code generator it can't apply such patterns -- as demonstrated by
> the demo code.
> 
> regards
> Stefan

Yes, I was thinking the same.  But I spent a while on that pattern -- I wanted to support div/mod as a single operation because the machine has that primitive.  And I'm pretty sure I saw it work before I committed that change.  That's why I'm wondering if something changed.

	paul


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Widening multiplication, but no narrowing division [i386/AMD64]
  2023-01-09 15:30     ` Paul Koning
@ 2023-01-09 16:27       ` Stefan Kanthak
  2023-01-10 16:49         ` Paul Koning
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Kanthak @ 2023-01-09 16:27 UTC (permalink / raw)
  To: Paul Koning; +Cc: gcc

"Paul Koning" <paulkoning@comcast.net> wrote:

>> On Jan 9, 2023, at 10:20 AM, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>> 
>> "Paul Koning" <paulkoning@comcast.net> wrote:
>> 
>>>> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> GCC (and other C compilers too) support the widening multiplication
>>>> of i386/AMD64 processors, but DON'T support their narrowing division:
>>> 
>>> I wonder if this changed in the recent past.
>>> I have a pattern for this type of thing in pdp11.md:
>> [...]
>>> and I'm pretty sure this worked at some point in the past.  
>> 
>> Unfortunately the C standard defines that the smaller operand (of lesser
>> conversion rank), here divisor, has to undergo a conversion to the "real
>> common type", i.e. the broader operand (of higher conversion rank), here
>> dividend. Unless the information about promotion/conversion is handed over
>> to the code generator it can't apply such patterns -- as demonstrated by
>> the demo code.

> Yes, I was thinking the same.  But I spent a while on that pattern -- I
> wanted to support div/mod as a single operation because the machine has
> that primitive.  And I'm pretty sure I saw it work before I committed
> that change.  That's why I'm wondering if something changed.

I can't tell from the past how GCC once worked, but today it can't
(or doesn't) use such patterns, at least not on i386/AMD64 processors.
To give another example where the necessary information is most
obviously NOT propagated from front end to back end:

--- clmul.c ---
// widening carry-less multiplication

unsigned long long clmul(unsigned long p, unsigned long q)
{
    unsigned long long r = 0;
    unsigned long      s = 1UL << 31;

    do {
        r <<= 1;
        if (q & s)
#ifdef _MSC_VER
            (unsigned long) r ^= p;
#else
            r ^= p; // no need to promote/convert p here!
#endif
    } while (s >>= 1);

    return r;
}
--- EOF ---

# https://gcc.godbolt.org/z/E99v7fEP3
clmul(unsigned long, unsigned long):
        push    ebp
        mov     ecx, -2147483648
        xor     eax, eax
        xor     edx, edx
        push    edi        # OOPS: superfluous
        xor     edi, edi   # OOPS: superfluous
        push    esi
        push    ebx        # OUCH: WTF?
        mov     ebp, DWORD PTR [esp+24]
        mov     ebx, 32    # OUCH: WTF?
        mov     esi, DWORD PTR [esp+20]
.L3:
        shld    edx, eax, 1
        add     eax, eax
        test    ebp, ecx
        je      .L2
        xor     eax, esi
        xor     edx, edi   # OOPS: superfluous
.L2:
        shr     ecx, 1
        sub     ebx, 1     # OUCH: WTF?
        jne     .L3
        pop     ebx        # OUCH: WTF?
        pop     esi
        pop     edi        # OOPS: superfluous
        pop     ebp
        ret

8 superfluous instructions out of the total 25 instructions!

NOT AMUSED
Stefan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Widening multiplication, but no narrowing division [i386/AMD64]
  2023-01-09 16:27       ` Stefan Kanthak
@ 2023-01-10 16:49         ` Paul Koning
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Koning @ 2023-01-10 16:49 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: gcc

> On Jan 9, 2023, at 11:27 AM, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
> 
> "Paul Koning" <paulkoning@comcast.net> wrote:
> 
>>> ...
> 
>> Yes, I was thinking the same.  But I spent a while on that pattern -- I
>> wanted to support div/mod as a single operation because the machine has
>> that primitive.  And I'm pretty sure I saw it work before I committed
>> that change.  That's why I'm wondering if something changed.
> 
> I can't tell from the past how GCC once worked, but today it can't
> (or doesn't) use such patterns, at least not on i386/AMD64 processors.

It turns out I was confused by the RTL generated by my pattern.  That pattern is for divmodhi, so it works as desired given same-size inputs.  

I'm wondering if the case of longer dividend -- which is a common thing for several machines -- could be handled by a define_peephole2 that matches the sign-extend of the divisor followed by the (longer) divide.  I made a stab at that but what I wrote wasn't valid.

So, question to the list:  suppose I want to write RTL that matches what Stefan is talking about, with a div or mod or divmod that has si results and a di dividend (or hi results and an si dividend), how would you do that?  Can a define_peephole2 do it, and if so, what would it look like?

	paul

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-01-10 16:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-09 12:20 Widening multiplication, but no narrowing division [i386/AMD64] Stefan Kanthak
2023-01-09 13:10 ` LIU Hao
2023-01-09 13:19   ` Stefan Kanthak
2023-01-09 14:23 ` Paul Koning
2023-01-09 15:20   ` Stefan Kanthak
2023-01-09 15:30     ` Paul Koning
2023-01-09 16:27       ` Stefan Kanthak
2023-01-10 16:49         ` Paul Koning

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).