From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-509162-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 129611 invoked by alias); 18 Sep 2019 07:40:53 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 129156 invoked by uid 89); 18 Sep 2019 07:40:53 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-8.7 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,GIT_PATCH_3,KAM_ASCII_DIVIDERS,KAM_NUMSUBJECT,SPF_PASS autolearn=ham version=3.3.1 spammy=H*f:sk:CAFULd4, H*i:sk:CAFULd4, H*i:sk:tmLfYiV, H*f:sk:tmLfYiV
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.110.172) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 18 Sep 2019 07:40:51 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 457B128;	Wed, 18 Sep 2019 00:40:50 -0700 (PDT)
Received: from localhost (unknown [10.32.98.126])	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A87873F59C;	Wed, 18 Sep 2019 00:40:49 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Uros Bizjak <ubizjak@gmail.com>
Mail-Followup-To: Uros Bizjak <ubizjak@gmail.com>,"gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,  Jan Hubicka <hubicka@ucw.cz>, richard.sandiford@arm.com
Cc: "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,  Jan Hubicka <hubicka@ucw.cz>
Subject: Re: [x86] Tweak testcases for PR82361
References: <mpto8zi3mqs.fsf@arm.com>	<CAFULd4Z4Gg5mWc7uQuSapcsWx5xVMr0XhfNY=tmLfYiVHLy_AA@mail.gmail.com>
Date: Wed, 18 Sep 2019 07:40:00 -0000
In-Reply-To: <CAFULd4Z4Gg5mWc7uQuSapcsWx5xVMr0XhfNY=tmLfYiVHLy_AA@mail.gmail.com>	(Uros Bizjak's message of "Wed, 18 Sep 2019 08:43:49 +0200")
Message-ID: <mptimpq127z.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-IsSubscribed: yes
X-SW-Source: 2019-09/txt/msg01053.txt.bz2

Uros Bizjak <ubizjak@gmail.com> writes:
> On Tue, Sep 17, 2019 at 6:34 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> gcc/testsuite/gcc.target/i386/pr82361-[12].c check whether we
>> can optimise away a 32-to-64-bit zero extension of a 32-bit
>> division or modulus result.  Currently this fails for the modulus
>> part of f1 and f2 in pr82361-1.c:
>>
>> /* FIXME: We are still not able to optimize the modulo in f1/f2, only manage
>>    one.  */
>> /* { dg-final { scan-assembler-times "movl\t%edx" 2 } } */
>>
>> pr82361-2.c instead expects no failures:
>>
>> /* Ditto %edx to %rdx zero extensions.  */
>> /* { dg-final { scan-assembler-not "movl\t%edx, %edx" } } */
>>
>> But we actually get the same zero-extensions for f1 and f2 in pr82361-2.c.
>> The reason they don't trigger a failure is that the register allocator
>> assigns the asm input for "d" to %rdi rather than %rdx, so we have:
>>
>>         movl    %edi, %edx
>>
>> instead of:
>>
>>         movl    %edx, %edx
>>
>> For the tests to work as expected, I think they have to force "c" and
>> "d" to be %rax and %rdx respectively.  We then see the same failure in
>> pr82361-2.c as for pr82361-1.c (but doubled, due to the 8-bit division
>> path).
>>
>> Tested on x86_64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> 2019-09-17  Richard Sandiford  <richard.sandiford@arm.com>
>>
>> gcc/testsuite/
>>         * gcc.target/i386/pr82361-1.c (f1, f2, f3, f4, f5, f6): Force
>>         "c" to be in %rax and "d" to be in %rdx.
>>         * gcc.target/i386/pr82361-2.c: Expect 4 instances of "movl\t%edx".
>
> OK, with a comment improvement below.
>
> Thanks,
> Uros.
>
>> Index: gcc/testsuite/gcc.target/i386/pr82361-1.c
>> ===================================================================
>> --- gcc/testsuite/gcc.target/i386/pr82361-1.c   2019-03-08 18:14:39.040959532 +0000
>> +++ gcc/testsuite/gcc.target/i386/pr82361-1.c   2019-09-17 17:32:00.930930762 +0100
>> @@ -11,43 +11,43 @@
>>  void
>>  f1 (unsigned int a, unsigned int b)
>>  {
>> -  unsigned long long c = a / b;
>> -  unsigned long long d = a % b;
>> +  register unsigned long long c asm ("rax") = a / b;
>> +  register unsigned long long d asm ("rdx") = a % b;
>>    asm volatile ("" : : "r" (c), "r" (d));
>>  }
>>
>>  void
>>  f2 (int a, int b)
>>  {
>> -  unsigned long long c = (unsigned int) (a / b);
>> -  unsigned long long d = (unsigned int) (a % b);
>> +  register unsigned long long c asm ("rax") = (unsigned int) (a / b);
>> +  register unsigned long long d asm ("rdx") = (unsigned int) (a % b);
>>    asm volatile ("" : : "r" (c), "r" (d));
>>  }
>>
>>  void
>>  f3 (unsigned int a, unsigned int b)
>>  {
>> -  unsigned long long c = a / b;
>> +  register unsigned long long c asm ("rax") = a / b;
>>    asm volatile ("" : : "r" (c));
>>  }
>>
>>  void
>>  f4 (int a, int b)
>>  {
>> -  unsigned long long c = (unsigned int) (a / b);
>> +  register unsigned long long c asm ("rax") = (unsigned int) (a / b);
>>    asm volatile ("" : : "r" (c));
>>  }
>>
>>  void
>>  f5 (unsigned int a, unsigned int b)
>>  {
>> -  unsigned long long d = a % b;
>> +  register unsigned long long d asm ("rdx") = a % b;
>>    asm volatile ("" : : "r" (d));
>>  }
>>
>>  void
>>  f6 (int a, int b)
>>  {
>> -  unsigned long long d = (unsigned int) (a % b);
>> +  register unsigned long long d asm ("rdx") = (unsigned int) (a % b);
>>    asm volatile ("" : : "r" (d));
>>  }
>> Index: gcc/testsuite/gcc.target/i386/pr82361-2.c
>> ===================================================================
>> --- gcc/testsuite/gcc.target/i386/pr82361-2.c   2019-09-17 16:34:52.280124553 +0100
>> +++ gcc/testsuite/gcc.target/i386/pr82361-2.c   2019-09-17 17:32:00.930930762 +0100
>> @@ -4,7 +4,8 @@
>>  /* We should be able to optimize all %eax to %rax zero extensions, because
>>     div and idiv instructions with 32-bit operands zero-extend both results.   */
>>  /* { dg-final { scan-assembler-not "movl\t%eax, %eax" } } */
>> -/* Ditto %edx to %rdx zero extensions.  */
>> -/* { dg-final { scan-assembler-not "movl\t%edx, %edx" } } */
>> +/* FIXME: We are still not able to optimize the modulo in f1/f2, only manage
>> +   one.  */
>
> Can we please change the comment here and in pr82361-1.c to something like:
>
> /* FIXME: The compiler does not merge zero-extension to the modulo part.  */

Thanks, here's what I applied.

Richard


2019-09-18  Richard Sandiford  <richard.sandiford@arm.com>

gcc/testsuite/
	* gcc.target/i386/pr82361-1.c (f1, f2, f3, f4, f5, f6): Force
	"c" to be in %rax and "d" to be in %rdx.
	* gcc.target/i386/pr82361-2.c: Expect 4 instances of "movl\t%edx".

Index: gcc/testsuite/gcc.target/i386/pr82361-1.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr82361-1.c	2019-09-17 18:00:14.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/pr82361-1.c	2019-09-18 08:37:39.030720198 +0100
@@ -4,50 +4,50 @@
 /* We should be able to optimize all %eax to %rax zero extensions, because
    div and idiv instructions with 32-bit operands zero-extend both results.   */
 /* { dg-final { scan-assembler-not "movl\t%eax, %eax" } } */
-/* FIXME: We are still not able to optimize the modulo in f1/f2, only manage
-   one.  */
+/* FIXME: The compiler does not merge zero-extension to the modulo part
+   of f1 and f2.  */
 /* { dg-final { scan-assembler-times "movl\t%edx" 2 } } */
 
 void
 f1 (unsigned int a, unsigned int b)
 {
-  unsigned long long c = a / b;
-  unsigned long long d = a % b;
+  register unsigned long long c asm ("rax") = a / b;
+  register unsigned long long d asm ("rdx") = a % b;
   asm volatile ("" : : "r" (c), "r" (d));
 }
 
 void
 f2 (int a, int b)
 {
-  unsigned long long c = (unsigned int) (a / b);
-  unsigned long long d = (unsigned int) (a % b);
+  register unsigned long long c asm ("rax") = (unsigned int) (a / b);
+  register unsigned long long d asm ("rdx") = (unsigned int) (a % b);
   asm volatile ("" : : "r" (c), "r" (d));
 }
 
 void
 f3 (unsigned int a, unsigned int b)
 {
-  unsigned long long c = a / b;
+  register unsigned long long c asm ("rax") = a / b;
   asm volatile ("" : : "r" (c));
 }
 
 void
 f4 (int a, int b)
 {
-  unsigned long long c = (unsigned int) (a / b);
+  register unsigned long long c asm ("rax") = (unsigned int) (a / b);
   asm volatile ("" : : "r" (c));
 }
 
 void
 f5 (unsigned int a, unsigned int b)
 {
-  unsigned long long d = a % b;
+  register unsigned long long d asm ("rdx") = a % b;
   asm volatile ("" : : "r" (d));
 }
 
 void
 f6 (int a, int b)
 {
-  unsigned long long d = (unsigned int) (a % b);
+  register unsigned long long d asm ("rdx") = (unsigned int) (a % b);
   asm volatile ("" : : "r" (d));
 }
Index: gcc/testsuite/gcc.target/i386/pr82361-2.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr82361-2.c	2019-09-17 18:00:14.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/pr82361-2.c	2019-09-18 08:37:39.034720166 +0100
@@ -4,7 +4,8 @@
 /* We should be able to optimize all %eax to %rax zero extensions, because
    div and idiv instructions with 32-bit operands zero-extend both results.   */
 /* { dg-final { scan-assembler-not "movl\t%eax, %eax" } } */
-/* Ditto %edx to %rdx zero extensions.  */
-/* { dg-final { scan-assembler-not "movl\t%edx, %edx" } } */
+/* FIXME: The compiler does not merge zero-extension to the modulo part
+   of f1 and f2.  */
+/* { dg-final { scan-assembler-times "movl\t%edx" 4 } } */
 
 #include "pr82361-1.c"