On Thu, May 19, 2016 at 05:29:16PM +0000, Joseph Myers wrote:
> On Thu, 19 May 2016, Jiong Wang wrote:
>
> > Then,
> >
> > * if we add scalar HF mode to standard patterns, vector HF modes
> >   operation will be turned into scalar HF operations instead of
> >   scalar SF operations.
> >
> > * if we add vector HF mode to standard patterns, vector HF modes
> >   operations will generate vector HF instructions directly.
> >
> > Will this still cause precision inconsistence with old gcc when
> > there are cascade vector float operations?
>
> I'm not sure inconsistency with old GCC is what's relevant here.
>
> Standard-named RTL patterns have particular semantics.  Those semantics
> do not depend on the target architecture (except where there are target
> macros / hooks to define such dependence).  If you have an instruction
> that matches those target-independent semantics, it should be available
> for the standard-named pattern.  I believe that is the case here, for
> both the scalar and the vector instructions - they have the standard
> semantics, so should be available for the standard patterns.
>
> It is the responsibility of the target-independent parts of the
> compiler to ensure that the RTL generated matches the source code
> semantics, so that providing a standard pattern for an instruction that
> matches the pattern's semantics does not cause any problems regarding
> source code semantics.
>
> That said: if the expander in old GCC is converting a vector HF
> operation into scalar SF operations, I'd expect it also to include a
> conversion from SFmode back to HFmode after those operations, since it
> will be producing a vector HF result.  And that would apply for each
> individual operation expanded.
> So I would not expect inconsistency to arise from making direct HFmode
> operations available (given that the semantics of scalar + - * / are
> the same whether you do them directly on HFmode or promote to SFmode,
> do the operation there and then convert the result back to HFmode
> before doing any further operations on it).

I think the confusion here is that these two functions:

  float16x8_t __attribute__ ((noinline))
  foo (float16x8_t a, float16x8_t b, float16x8_t c)
  {
    return a * b / c;
  }

  float16_t __attribute__ ((noinline))
  bar (float16_t a, float16_t b, float16_t c)
  {
    return a * b / c;
  }

have different behaviours in terms of when they extend and truncate
between floating-point precisions. A full testcase calling these
functions is attached. Compile with

  gcc -O3

for AArch64 ARMv8-A, or

  gcc -O3 -mfloat-abi=hard -mfpu=neon-fp16 -mfp16-format=ieee -march=armv7-a

for ARMv7-A. This prints:

  Fail: Scalar Input 256.000000 Scalar Output 256.000000 Vector input 256.000000 Vector output inf
  Fail: Scalar Input 3.300781 Scalar Output 3.300781 Vector input 3.300781 Vector output 3.302734
  Fail: Scalar Input 10000.000000 Scalar Output 10000.000000 Vector input 10000.000000 Vector output inf
  Fail: Scalar Input 0.000003 Scalar Output 0.000003 Vector input 0.000003 Vector output 0.000000
  Fail: Scalar Input 0.000400 Scalar Output 0.000400 Vector input 0.000400 Vector output 0.000447

foo, operating on vectors, remains in 16-bit precision throughout
gimple, is scalarised during veclower, and has float_extend and
float_truncate added around each operation during expand to preserve
the 16-bit rounding behaviour. For this testcase, that means two
truncates per vector element: one after the multiply, one after the
divide.

bar, operating on scalars, has promotions added early due to
TARGET_PROMOTED_TYPE. In gimple we stay in 32-bit precision for the two
operations, and we truncate only after both operations. That means one
truncate, taking place after the divide.
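(For anyone without an FP16-capable toolchain to hand, the two rounding
schemes can be simulated portably. The Python sketch below is mine, not
the attached testcase; it uses struct's IEEE-754 binary16/binary32
packing to round after every operation for the foo-style vector path,
and only once after computing in single precision for the bar-style
scalar path. The 256.0 input reproduces the inf discrepancy shown
above.)

```python
import math
import struct

def to_f16(x):
    """Round a double to the nearest IEEE binary16 value, widened back.

    struct's 'e' format is IEEE-754 half precision; it raises
    OverflowError for values beyond the binary16 range, where the
    hardware would instead produce infinity, so map that case by hand.
    """
    try:
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def to_f32(x):
    """Round a double to the nearest IEEE binary32 value, widened back."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def vector_style(a, b, c):
    # foo: round to binary16 after *every* operation, as the
    # float_extend/float_truncate pairs inserted at expand time do.
    return to_f16(to_f16(a * b) / c)

def scalar_style(a, b, c):
    # bar: promote to binary32 up front (TARGET_PROMOTED_TYPE), do both
    # operations there, truncate to binary16 once at the end.
    return to_f16(to_f32(to_f32(a * b) / c))

if __name__ == '__main__':
    for v in (256.0, 3.300781):
        print(v, scalar_style(v, v, v), vector_style(v, v, v))
```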
However, I find this surprising at a language level, though I see that
Clang 3.8 has the same behaviour. ACLE doesn't mention the GCC vector
extensions, so it doesn't specify the behaviour of the arithmetic
operators on vector-of-float16_t types.

GCC's vector extension documentation gives this definition for
arithmetic operations:

  The types defined in this manner can be used with a subset of normal
  C operations. Currently, GCC allows using the following operators on
  these types: +, -, *, /, unary minus, ^, |, &, ~, %.

  The operations behave like C++ valarrays. Addition is defined as the
  addition of the corresponding elements of the operands. For example,
  in the code below, each of the 4 elements in a is added to the
  corresponding 4 elements in b and the resulting vector is stored in
  c. Subtraction, multiplication, division, and the logical operations
  operate in a similar manner. Likewise, the result of using the unary
  minus or complement operators on a vector type is a vector whose
  elements are the negative or complemented values of the corresponding
  elements in the operand.

Without digging into the compiler code, I would have expected the
vector implementation to give equivalent results to the scalar one.

My question is whether you consider the different behaviour between
scalar float16_t and vector-of-float16_t types to be a bug. I can think
of some ways to fix the vector behaviour if it is buggy, but they would
of course be a change in behaviour from current releases (and from
Clang 3.8).

Clearly, this makes no difference to your comment that we should
implement these using standard pattern names. Either this is a bug, in
which case the front end will arrange for the promotion to
vector-of-float32_t types, and implementing the vector standard pattern
names would potentially allow for some optimisation back to
vector-of-float16_t type; or this is not a bug, in which case the
vector-of-float16_t standard pattern names match the expected semantics
perfectly.
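(To make one of those possible fixes concrete - this is my own sketch,
not an actual GCC patch: if the front end promoted vector-of-float16_t
operands to vector-of-float32_t, the way TARGET_PROMOTED_TYPE already
does for scalars, every lane would see exactly the scalar rounding.
Simulated per lane in Python, with `promoted_vector_op` a hypothetical
name; the binary16/binary32 rounding helpers again use struct's 'e' and
'f' formats, and assume inputs within binary16 range.)

```python
import struct

def to_f16(x):
    # Round to the nearest IEEE binary16 value (struct 'e' format),
    # widened back to double. Assumes x is within binary16 range.
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_f32(x):
    # Round to the nearest IEEE binary32 value, widened back to double.
    return struct.unpack('<f', struct.pack('<f', x))[0]

def promoted_vector_op(a, b, c):
    # Hypothetical "fixed" vector semantics: widen every lane to
    # binary32, do both operations there, and truncate to binary16 once
    # at the end - exactly what the scalar path does today.
    a32 = [to_f32(x) for x in a]
    b32 = [to_f32(x) for x in b]
    c32 = [to_f32(x) for x in c]
    return [to_f16(to_f32(to_f32(x * y) / z))
            for x, y, z in zip(a32, b32, c32)]

if __name__ == '__main__':
    lanes = [3.300781] * 8
    # Each lane computes v * v / v, which now rounds back to v itself,
    # matching the scalar result instead of drifting to 3.302734.
    print(promoted_vector_op(lanes, lanes, lanes))
```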
Thanks, James