From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-429186-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 129825 invoked by alias); 7 Jun 2016 08:37:26 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 128155 invoked by uid 89); 7 Jun 2016 08:37:25 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=
X-HELO: mail-lf0-f66.google.com
Received: from mail-lf0-f66.google.com (HELO mail-lf0-f66.google.com) (209.85.215.66) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 07 Jun 2016 08:37:15 +0000
Received: by mail-lf0-f66.google.com with SMTP id u74so421731lff.0        for <gcc-patches@gcc.gnu.org>; Tue, 07 Jun 2016 01:37:14 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20130820;        h=x-gm-message-state:mime-version:in-reply-to:references:from:date         :message-id:subject:to:cc:content-transfer-encoding;        bh=iWor0FdHtpM7CLsxkUQdBkAREd46Y0kXCsI4SVmptec=;        b=lD6F5xNLhDL8JPmOTzwZV7pHNQy7Ga0epNTWhNEs4zvDrfGk4xLRNEqffCT9xbeXJn         osmonyo7vZmebdIQTu0gN9ZkOqWmfmty79eTPL9PVpsMb9a84YE9em4/gFJwCQQd0Pam         Z+XyLdNZHSwl7TXmG/Xhf1UnnvOG7DUfoNCzpkcw3+A54hKSrOSA/qDKHuCSvJM39a4Z         qGBEwMPgdsYgqzU3zN7KH9SY7DJn0OJKnY7tgn2HGnsMU4vJlMUhYfvTq2nEQWFdmHzo         XPWvTKqmMMpdV3idxbYIsl4eWQ0ufD868la/rqx5FSu7QMbypNBRsf4gzqumIqmjTjrG         ggeg==
X-Gm-Message-State: ALyK8tI1fYaYaLF/wSTWadrDdSqjq4GJAH876j0rH91fc2RQ2DfSFeI9cd67cGcZqtwQaokrCSJq8ldEKDFsLw==
X-Received: by 10.46.33.214 with SMTP id h83mr5582980lji.41.1465288631964; Tue, 07 Jun 2016 01:37:11 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.114.25.200 with HTTP; Tue, 7 Jun 2016 01:37:11 -0700 (PDT)
In-Reply-To: <CAAgBjMkh=0tyzaMjqpSQg_mV8GYH-Tx9WV5xjHs2k0H0=UeqtQ@mail.gmail.com>
References: <CAAgBjMk0Hdask2JU8xs4fj_Ai1e0ggxB+h3ayb=NOGQBYJ8ccQ@mail.gmail.com> <55BB4127.5050202@foss.arm.com> <CAAgBjMnteaOO=JFifLD1CG7s-JE5r=65wcQm+HBwSMtpwg3ReA@mail.gmail.com> <CAJA7tRaf8sZHmzHCt0n8rGJKt4bQycfSU+ABmb8rbcKb1VFQ=A@mail.gmail.com> <CAAgBjM=X8rXnDnNV4-sWC_0fCt9hcvuOx2-hJ=vwsAb1oBq22Q@mail.gmail.com> <CAAgBjMkh=0tyzaMjqpSQg_mV8GYH-Tx9WV5xjHs2k0H0=UeqtQ@mail.gmail.com>
From: Ramana Radhakrishnan <ramana.gcc@googlemail.com>
Date: Tue, 07 Jun 2016 08:37:00 -0000
Message-ID: <CAJA7tRZebyUSrU1i9ecH=6O5n9imHiQ4SaTAiur=bPvTgswR2A@mail.gmail.com>
Subject: Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Cc: Ramana Radhakrishnan <ramana.radhakrishnan@foss.arm.com>, 	gcc Patches <gcc-patches@gcc.gnu.org>, Charles Baylis <charles.baylis@linaro.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2016-06/txt/msg00459.txt.bz2

>> Please find the updated patch attached.
>> It passes testsuite for arm-none-linux-gnueabi, arm-none-linux-gnueabihf=
 and
>> arm-none-eabi.
>> However the test-case added in the patch (neon-vect-div-1.c) fails to
>> get vectorized at -O2
>> for armeb-none-linux-gnueabihf.
>> Charles suggested me to try with -O3, which worked.
>> It appears the test-case fails to get vectorized with
>> -fvect-cost-model=3Dcheap (which is default enabled at -O2)
>> and passes for -fno-vect-cost-model / -fvect-cost-model=3Ddynamic
>>
>> I can't figure out why it fails -fvect-cost-model=3Dcheap.
>> From the vect dump (attached):
>> neon-vect-div-1.c:12:3: note: Setting misalignment to -1.
>> neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load=
.*_9
> Hi,
> I think I have some idea why the test-case fails attached with patch
> fail to get vectorized on armeb with -O2.
>
> Issue with big endian vectorizer:
> The patch does not cause regressions on big endian vectorizer but
> fails to vectorize the test-cases attached with the patch, while they
> get vectorized on
> litttle-endian.
> Fails with armeb with the following message in dump:
> note: not vectorized: unsupported unaligned load.*_9
>
> The behavior of big and little endian vectorizer seems to be different
> in arm_builtin_support_vector_misalignment() which overrides the hook
> targetm.vectorize.support_vector_misalignment().
>
> targetm.vectorize.support_vector_misalignment is called by
> vect_supportable_dr_alignment () which in turn is called
> by verify_data_refs_alignment ().
>
> Execution upto following condition is common between arm and armeb
> in vect_supportable_dr_alignment():
>
> if ((TYPE_USER_ALIGN (type) && !is_packed)
>       || targetm.vectorize.support_vector_misalignment (mode, type,
>                                             DR_MISALIGNMENT (dr), is_pack=
ed))
>         /* Can't software pipeline the loads, but can at least do them.  =
*/
>         return dr_unaligned_supported;
>
> For little endian case:
> arm_builtin_support_vector_misalignment() is called with
> V2SF mode and misalignment =3D=3D -1, and the following condition
> becomes true:
> /* If the misalignment is unknown, we should be able to handle the access
>          so long as it is not to a member of a packed data structure.  */
>   if (misalignment =3D=3D -1)
>     return true;
>
> Since the hook returned true we enter the condition above in
> vect_supportable_dr_alignment() and return dr_unaligned_supported;
>
> For big-endian:
> arm_builtin_support_vector_misalignment() is called with V2SF mode.
> The following condition that gates the entire function body fails:
>  if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access)
> and the default hook gets called with V2SF mode and the default hook
> returns false because
> movmisalign_optab does not exist for V2SF mode.
>
> So the condition above in vect_supportable_dr_alignment() fails
> and we come here:
>  /* Unsupported.  */
> return dr_unaligned_unsupported;
>
> And hence we get the unaligned load not supported message in the dump
> for armeb in verify_data_ref_alignment ():
>
> static bool
> verify_data_ref_alignment (data_reference_p dr)
> {
>   enum dr_alignment_support supportable_dr_alignment
>     =3D vect_supportable_dr_alignment (dr, false);
>   if (!supportable_dr_alignment)
>     {
>       if (dump_enabled_p ())
>         {
>           if (DR_IS_READ (dr))
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "not vectorized: unsupported unaligned load.=
");
>
> With -O3, the test-cases vectorize for armeb, because loop peeling for al=
ignment
> is turned on.
> The above behavior is also reproducible with test-case which is
> irrelevant to the patch.
> for instance, we get the same unsupported unaligned load for following
> test-case (replaced / with +)
>
> void
> foo (int len, float * __restrict p, float *__restrict x)
> {
>   len =3D len & ~31;
>   for (int i =3D 0; i < len; i++)
>     p[i] =3D p[i] + x[i];
> }
> Is the patch OK to commit after bootstrap+test ?


Thanks for the analysis - all the test needs is an additional marker
to skip it on armeb (is there a helper for misaligned loads from the
vectorizer ? ) - Ah probably vect_hw_misalign is sufficient for your
usecase and you want to appropriately fix it for little endian arm
with neon support enabled.

=46rom the patch.

>>+   && flag_unsafe_math_optimizations && flag_reciprocal_math"

Why do we need flag_unsafe_math_optimizations && flag_reciprocal_math
? flag_unsafe_math_optimizations should be sufficient since it enables
flag_reciprocal_math - the reason for flag_unsafe_math_optimizations
is to prevent loss of precision and the fact that on neon denormalized
numbers are flushed to zero.

Ok with that change and a quick test with vect_hw_misalign added to
your testcase.

Sorry about the delay in reviewing.

Ramana


>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh
>>>
>>> Thanks,
>>> Ramana
>>>
>>>>
>>>> Thanks,
>>>> Prathamesh
>>>>>
>>>>>
>>>>> moving on to the patches.
>>>>>
>>>>>> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
>>>>>> index 654d9d5..28c2e2a 100644
>>>>>> --- a/gcc/config/arm/neon.md
>>>>>> +++ b/gcc/config/arm/neon.md
>>>>>> @@ -548,6 +548,32 @@
>>>>>>                      (const_string "neon_mul_<V_elem_ch><q>")))]
>>>>>>  )
>>>>>>
>>>>>
>>>>> Please add a comment here.
>>>>>
>>>>>> +(define_expand "div<mode>3"
>>>>>> +  [(set (match_operand:VCVTF 0 "s_register_operand" "=3Dw")
>>>>>> +        (div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w")
>>>>>> +               (match_operand:VCVTF 2 "s_register_operand" "w")))]
>>>>>
>>>>> I want to double check that this doesn't collide with Alan's patches =
for FP16 especially if he reuses the VCVTF iterator for all the vcvt f16 ca=
ses.
>>>>>
>>>>>> +  "TARGET_NEON && flag_unsafe_math_optimizations && flag_reciprocal=
_math"
>>>>>> +  {
>>>>>> +    rtx rec =3D gen_reg_rtx (<MODE>mode);
>>>>>> +    rtx vrecps_temp =3D gen_reg_rtx (<MODE>mode);
>>>>>> +
>>>>>> +    /* Reciprocal estimate */
>>>>>> +    emit_insn (gen_neon_vrecpe<mode> (rec, operands[2]));
>>>>>> +
>>>>>> +    /* Perform 2 iterations of Newton-Raphson method for better acc=
uracy */
>>>>>> +    for (int i =3D 0; i < 2; i++)
>>>>>> +      {
>>>>>> +     emit_insn (gen_neon_vrecps<mode> (vrecps_temp, rec, operands[2=
]));
>>>>>> +     emit_insn (gen_mul<mode>3 (rec, rec, vrecps_temp));
>>>>>> +      }
>>>>>> +
>>>>>> +    /* We now have reciprocal in rec, perform operands[0] =3D opera=
nds[1] * rec */
>>>>>> +    emit_insn (gen_mul<mode>3 (operands[0], operands[1], rec));
>>>>>> +    DONE;
>>>>>> +  }
>>>>>> +)
>>>>>> +
>>>>>> +
>>>>>>  (define_insn "mul<mode>3add<mode>_neon"
>>>>>>    [(set (match_operand:VDQW 0 "s_register_operand" "=3Dw")
>>>>>>          (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_ope=
rand" "w")
>>>>>> diff --git a/gcc/testsuite/gcc.target/arm/vect-div-1.c b/gcc/testsui=
te/gcc.target/arm/vect-div-1.c
>>>>>> new file mode 100644
>>>>>> index 0000000..e562ef3
>>>>>> --- /dev/null
>>>>>> +++ b/gcc/testsuite/gcc.target/arm/vect-div-1.c
>>>>>> @@ -0,0 +1,14 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-require-effective-target arm_v8_neon_ok } */
>>>>>> +/* { dg-options "-O2 -funsafe-math-optimizations -ftree-vectorize -=
fdump-tree-vect-all" } */
>>>>>> +/* { dg-add-options arm_v8_neon } */
>>>>>
>>>>> No this is wrong.
>>>>>
>>>>> What is armv8 specific about this test ? This is just like another te=
st that is for Neon. vrecpe / vrecps are not instructions that were introdu=
ced in the v8 version of the architecture. They've existed in the base Neon=
 instruction set. The code generation above in the patterns will be enabled=
 when TARGET_NEON is true which can happen when -mfpu=3Dneon -mfloat-abi=3D=
{softfp/hard} is true.
>>>>>
>>>>>> +
>>>>>> +void
>>>>>> +foo (int len, float * __restrict p, float *__restrict x)
>>>>>> +{
>>>>>> +  len =3D len & ~31;
>>>>>> +  for (int i =3D 0; i < len; i++)
>>>>>> +    p[i] =3D p[i] / x[i];
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" =
} } */
>>>>>> diff --git a/gcc/testsuite/gcc.target/arm/vect-div-2.c b/gcc/testsui=
te/gcc.target/arm/vect-div-2.c
>>>>>> new file mode 100644
>>>>>> index 0000000..8e15d0a
>>>>>> --- /dev/null
>>>>>> +++ b/gcc/testsuite/gcc.target/arm/vect-div-2.c
>>>>>> @@ -0,0 +1,14 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-require-effective-target arm_v8_neon_ok } */
>>>>>
>>>>> And likewise.
>>>>>
>>>>>> +/* { dg-options "-O2 -funsafe-math-optimizations -fno-reciprocal-ma=
th -ftree-vectorize -fdump-tree-vect-all" } */
>>>>>> +/* { dg-add-options arm_v8_neon } */
>>>>>> +
>>>>>> +void
>>>>>> +foo (int len, float * __restrict p, float *__restrict x)
>>>>>> +{
>>>>>> +  len =3D len & ~31;
>>>>>> +  for (int i =3D 0; i < len; i++)
>>>>>> +    p[i] =3D p[i] / x[i];
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" =
} } */
>>>>>
>>>>>
>>>>> regards
>>>>> Ramana