From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id CE701385840E
 for <gcc-patches@gcc.gnu.org>; Sun,  6 Mar 2022 16:16:34 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CE701385840E
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 39210D6E;
 Sun,  6 Mar 2022 08:16:34 -0800 (PST)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.88])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 421C83F766;
 Sun,  6 Mar 2022 08:16:33 -0800 (PST)
From: Richard Sandiford <richard.sandiford@arm.com>
To: xuchenghua@loongson.cn
Mail-Followup-To: xuchenghua@loongson.cn, gcc-patches@gcc.gnu.org,
 chenglulu@loongson.cn, joseph@codesourcery.com, richard.sandiford@arm.com
Cc: gcc-patches@gcc.gnu.org,  chenglulu@loongson.cn,  joseph@codesourcery.com
Subject: Re: [PATCH v8 04/12] LoongArch Port: Machine description files.
References: <20220304071809.3082015-1-xuchenghua@loongson.cn>
 <20220304071809.3082015-5-xuchenghua@loongson.cn>
Date: Sun, 06 Mar 2022 16:16:31 +0000
In-Reply-To: <20220304071809.3082015-5-xuchenghua@loongson.cn>
 (xuchenghua@loongson.cn's message of "Fri, 4 Mar 2022 15:18:01 +0800")
Message-ID: <mptbkyjt5sw.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Sun, 06 Mar 2022 16:16:39 -0000

Hi,

Some comments below, but otherwise it looks good to me.

xuchenghua@loongson.cn writes:
> [=E2=80=A6]
> +(define_memory_constraint "k"
> +  "A memory operand whose address is formed by a base register and (opti=
onally scaled)
> +   index register."
> +  (and (match_code "mem")
> +       (not (match_test "loongarch_14bit_shifted_offset_address_p (XEXP =
(op, 0), mode)"))
> +       (not (match_test "loongarch_12bit_offset_address_p (XEXP (op, 0),=
 mode)"))))

It's not really safe to test MEM addresses using only negative conditions.
There needs to be a positive condition too, even if it's only:

       (match_test "memory_address_addr_space_p (GET_MODE (op), XEXP (op, 0=
),
						 MEM_ADDR_SPACE (op))")))

(from common.md).

> [=E2=80=A6]
> +(define_constraint "v"
> +  "A nsigned 64-bit constant and low 44-bit is zero (for logic instructi=
ons)."

typo

> [=E2=80=A6]
> +(define_memory_constraint "ZB"
> +  "@internal
> +  An address that is held in a general-purpose register.
> +  The offset is zero"
> +  (and (match_code "mem")
> +       (match_test "GET_CODE (XEXP (op,0)) =3D=3D REG")))

It'd be good to use things like REG_P in new code.

Formatting nit: should be a space after =E2=80=9Cop,=E2=80=9D.

> [=E2=80=A6]
> +;; Pipeline descriptions.
> +;;
> +;; generic.md provides a fallback for processors without a specific
> +;; pipeline description.  It is derived from the old define_function_unit
> +;; version and uses the "alu" and "imuldiv" units declared below.
> +;;
> +;; Some of the processor-specific files are also derived from old
> +;; define_function_unit descriptions and simply override the parts of
> +;; generic.md that don't apply.  The other processor-specific files
> +;; are self-contained.

I don't think these last two paragraphs apply to the new code.
The MIPS generic.md was converted from a much older pipeline
description format.  The conversion meant the older processor
descriptions weren't self-contained and relied on a mixture
of processor-specific things (in their own file) and generic
things (in this file).

New processor descriptions should be self-contained as far
as possible, in terms of not sharing cpu units with other
processor descriptions.

> +(define_automaton "alu,imuldiv")
> +
> +(define_cpu_unit "alu" "alu")
> +(define_cpu_unit "imuldiv" "imuldiv")
> +
> +;; Ghost instructions produce no real code.
> +;; They exist purely to express an effect on dataflow.
> +(define_insn_reservation "ghost" 0
> +  (eq_attr "type" "ghost")
> +  "nothing")
> +
> +;; This file is derived from the old define_function_unit description.
> +;; Each reservation can be overridden on a processor-by-processor basis.

Same for this last comment.

The "ghost" reservation is inherently sharable because it doesn't
use any CPU units.  But for a new port, I think the other reservations
in this file should be conditional on a particular -mtune.

> [=E2=80=A6]
> diff --git a/gcc/config/loongarch/la464.md b/gcc/config/loongarch/la464.md
> new file mode 100644
> index 00000000000..ae3808b51bb
> --- /dev/null
> +++ b/gcc/config/loongarch/la464.md
> @@ -0,0 +1,132 @@
> +;; Pipeline model for LoongArch LA464 cores.
> +
> +;; Copyright (C) 2021-2022 Free Software Foundation, Inc.
> +;; Contributed by Loongson Ltd.
> +
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published
> +;; by the Free Software Foundation; either version 3, or (at your
> +;; option) any later version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but WITHOUT
> +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +;; License for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +;; Uncomment the following line to output automata for debugging.
> +;; (automata_option "v")
> +
> +;; Automaton for integer instructions.
> +(define_automaton "la464_a_alu")
> +
> +;; Automaton for floating-point instructions.
> +(define_automaton "la464_a_falu")
> +
> +;; Automaton for memory operations.
> +(define_automaton "la464_a_mem")
> +
> +;; Describe the resources.
> +
> +(define_cpu_unit "la464_alu1" "la464_a_alu")
> +(define_cpu_unit "la464_alu2" "la464_a_alu")
> +(define_cpu_unit "la464_mem1" "la464_a_mem")
> +(define_cpu_unit "la464_mem2" "la464_a_mem")
> +(define_cpu_unit "la464_falu1" "la464_a_falu")
> +(define_cpu_unit "la464_falu2" "la464_a_falu")
> +
> +;; Describe instruction reservations.
> +
> +(define_insn_reservation "la464_arith" 1
> +  (and (match_test "TARGET_ARCH_LA464")

Normally scheduling should be determined by -mtune (with a default
-mtune chosen by -march when no explicit -mtune is given).  So I was
surprised to see this testing TARGET_ARCH_* instead of TARGET_TUNE_*.

> +       (eq_attr "type" "arith,clz,const,logical,
> +			move,nop,shift,signext,slt"))
> +  "la464_alu1 | la464_alu2")
> [=E2=80=A6]
> +;; Main data type used by the insn
> +(define_attr "mode" "unknown,none,QI,HI,SI,DI,TI,OI,SF,DF,TF,FCC"
> +  (const_string "unknown"))

Do any patterns use mode=3D=3DOI?  Reason for asking is that:

> [=E2=80=A6]
> +;; True if the main data type is four times of the size of a word.
> +(define_attr "qword_mode" "no,yes"
> +  (cond [(and (eq_attr "mode" "TI,TF")
> +	      (not (match_test "TARGET_64BIT")))
> +	 (const_string "yes")]
> +	(const_string "no")))
> +
> +;; True if the main data type is eight times of the size of a word.
> +(define_attr "oword_mode" "no,yes"
> +  (cond [(and (eq_attr "mode" "OI")
> +	      (not (match_test "TARGET_64BIT")))
> +	 (const_string "yes")]
> +	(const_string "no")))

=E2=80=A6it seemed inconsistent to treat OI is 8 words on 32-bit targets
but not as 4 words (qword_mode) on 64-bit targets.

It looks like oword_mode and OI might not be used, in which case
it's probably easiest to remove them for now.

> +(define_attr "compression" "none,all"
> +  (const_string "none"))

Does anything use this, or is it a holdover from MIPS?  I could see some
definitions of the attribute later in the file, but there didn't seem
to be any users.

> [=E2=80=A6]
> +;; This mode iterator allows the QI HI SI and DI extension patterns to be

incomplete sentence.

> +(define_mode_iterator QHWD [QI HI SI (DI "TARGET_64BIT")])
> +
> +;; Iterator for hardware-supported floating-point modes.
> +(define_mode_iterator ANYF [(SF "TARGET_HARD_FLOAT")
> +			    (DF "TARGET_DOUBLE_FLOAT")])
> +
> +;; A floating-point mode for which moves involving FPRs may need to be s=
plit.

Probably s/floating-point //, given that the iterator includes DI.

> +(define_mode_iterator SPLITF
> +  [(DF "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
> +   (DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
> +   (TF "TARGET_64BIT && TARGET_DOUBLE_FLOAT")])
> +
> +;; In GPR templates, a string like "mul.<d>" will expand to "mul" in the
> +;; 32-bit "mul.w" and "mul.d" in the 64-bit version.

Maybe:

;; In GPR templates, a string like "mul.<d>" will expand to "mul.w" in the
;; 32-bit version and "mul.d" in the 64-bit version.

> +(define_mode_attr d [(SI "w") (DI "d")])
> +
> +;; This attribute gives the length suffix for a load or store instructio=
n.
> +;; The same suffixes work for zero and sign extensions.
> +(define_mode_attr size [(QI "b") (HI "h") (SI "w") (DI "d")])
> +(define_mode_attr SIZE [(QI "B") (HI "H") (SI "W") (DI "D")])
> +
> +;; This attributes gives the mode mask of a SHORT.

s/attributes/attribute/

> +(define_mode_attr mask [(QI "0x00ff") (HI "0xffff")])
> +
> +;; This attributes gives the size (bits) of a SHORT.

Maybe:

;; This attribute gives the number of bits in a SHORT minus one.

since it's 7,15 rather than 8,16.

> +(define_mode_attr qi_hi [(QI "7") (HI "15")])

Just a suggestion, but it might be worth renaming this.  qi_hi only
describes the range of inputs rather than what the attribute is.

> [=E2=80=A6]
> +(define_insn "*addsi3_extended"
> +  [(set (match_operand:DI 0 "register_operand" "=3Dr,r")
> +	(sign_extend:DI
> +	     (plus:SI (match_operand:SI 1 "register_operand" "r,r")
> +		      (match_operand:SI 2 "arith_operand" "r,I"))))]
> +  "TARGET_64BIT"
> +  "add%i2.w\t%0,%1,%2"
> +  [(set_attr "alu_type" "add")
> +   (set_attr "mode" "SI")])
> +
> +(define_insn "*addsi3_extended2"
> +  [(set (match_operand:DI 0 "register_operand" "=3Dr,r")
> +	(sign_extend:DI
> +	  (subreg:SI (plus:DI (match_operand:DI 1 "register_operand" "r,r")
> +			      (match_operand:DI 2 "arith_operand"    "r,I"))
> +		     0)))]
> +  "TARGET_64BIT"
> +  "add%i2.w\t%0,%1,%2"
> +  [(set_attr "alu_type" "add")
> +   (set_attr "mode" "SI")])

This is really a question for part 5, but it affects this part too:

I notice the port defines TRULY_NOOP_TRUNCATION to false for DI->SI,
like MIPS does.  Are you sure that's the right trade-off?  It had to
be defined that way for MIPS because 32-bit MIPS instructions gave
undefined results if the input operands weren't in sign-extended form.
For example a 32-bit add with 64-bit register contents:

   0x(ffffffff_)fffffffe
 + 0x(00000000_)00000001

was guaranteed to give -1 (sign-extended) but:

   0x(00000000_)fffffffe
 + 0x(00000000_)00000001

gave an undefined result.

Does Loongson have the same restriction, or do the 32-bit instructions
simply ignore the upper 32 bits?  If Loongson ignores the upper bits
then it would probably be better not to define TRULY_NOOP_TRUNCATION.

If you do that, the subreg pattern above shouldn't be needed; the
subreg should get folded away by target-independent code.

> +;;
> +;;  ....................
> +;;
> +;;	MULTIPLICATION
> +;;
> +;;  ....................
> +;;
> +
> +(define_insn "mul<mode>3"
> +  [(set (match_operand:ANYF 0 "register_operand" "=3Df")
> +	(mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
> +		   (match_operand:ANYF 2 "register_operand" "f")))]
> +  "TARGET_HARD_FLOAT"

Not a big deal, just noticed that some ANYF patterns (like this one)
add an explicit TARGET_HARD_FLOAT on top of the ANYF conditions:

(define_mode_iterator ANYF [(SF "TARGET_HARD_FLOAT")
			    (DF "TARGET_DOUBLE_FLOAT")])

while other patterns don't.  Both are correct of course, but it might
be good to use one style throughout the file.  At first, when I saw
the pattern above, I thought the TARGET_HARD_FLOAT was missing from
the earlier patterns.

> [=E2=80=A6]
> +(define_insn "*div<mode>3"
> +  [(set (match_operand:ANYF 0 "register_operand" "=3Df")
> +	(div:ANYF (match_operand:ANYF 1 "register_operand" "f")
> +		  (match_operand:ANYF 2 "register_operand" "f")))]
> +  "TARGET_HARD_FLOAT"
> +  "fdiv.<fmt>\t%0,%1,%2"
> +  [(set_attr "type" "fdiv")
> +   (set_attr "mode" "<UNITMODE>")
> +   (set_attr "insn_count" "1")])
> +
> +;; In 3A5000, the reciprocal operation is the same as the division opera=
tion.
> +
> +(define_insn "*recip<mode>3"
> +  [(set (match_operand:ANYF 0 "register_operand" "=3Df")
> +	(div:ANYF (match_operand:ANYF 1 "const_1_operand" "")
> +		  (match_operand:ANYF 2 "register_operand" "f")))]
> +  "TARGET_HARD_FLOAT"
> +  "frecip.<fmt>\t%0,%2"
> +  [(set_attr "type" "frdiv")
> +   (set_attr "mode" "<UNITMODE>")
> +   (set_attr "insn_count" "1")])

Very minor, but these insn_counts seem redundant.  (MIPS needed them
due to the SB1 errata workaround.)

> [=E2=80=A6]
> +;; Floating point multiply accumulate instructions.
> +
> +;; a * b + c
> +(define_expand "fma<mode>4"
> +  [(set (match_operand:ANYF 0 "register_operand")
> +	(fma:ANYF (match_operand:ANYF 1 "register_operand")
> +		  (match_operand:ANYF 2 "register_operand")
> +		  (match_operand:ANYF 3 "register_operand")))]
> +  "TARGET_HARD_FLOAT")
> +
> +(define_insn "*fma<mode>4_madd4"
> +  [(set (match_operand:ANYF 0 "register_operand" "=3Df")
> +	(fma:ANYF (match_operand:ANYF 1 "register_operand" "f")
> +		  (match_operand:ANYF 2 "register_operand" "f")
> +		  (match_operand:ANYF 3 "register_operand" "f")))]
> +  "TARGET_HARD_FLOAT"
> +  "fmadd.<fmt>\t%0,%1,%2,%3"
> +  [(set_attr "type" "fmadd")
> +   (set_attr "mode" "<UNITMODE>")])

This pair of patterns could be a single define_insn, like for fms.

> [=E2=80=A6]
> +;; Integer truncation patterns.  Truncating SImode values to smaller
> +;; modes is a no-op, as it is for most other GCC ports.  Truncating
> +;; DImode values to SImode is not a no-op for TARGET_64BIT since we
> +;; need to make sure that the lower 32 bits are properly sign-extended
> +;; (see TARGET_TRULY_NOOP_TRUNCATION).  Truncating DImode values into mo=
des
> +;; smaller than SImode is equivalent to two separate truncations:
> +;;
> +;;			  A       B
> +;;    DI ---> HI  =3D=3D  DI ---> SI ---> HI
> +;;    DI ---> QI  =3D=3D  DI ---> SI ---> QI
> +;;
> +;; Step A needs a real instruction but step B does not.
> +
> +(define_insn "truncdi<mode>2"
> +  [(set (match_operand:SUBDI 0 "nonimmediate_operand" "=3Dr,m,k")
> +	(truncate:SUBDI (match_operand:DI 1 "register_operand" "r,r,r")))]
> +  "TARGET_64BIT"
> +  "@
> +    slli.w\t%0,%1,0
> +    st.<size>\t%1,%0
> +    stx.<size>\t%1,%0"
> +  [(set_attr "move_type" "sll0,store,store")
> +   (set_attr "mode" "SI")])

Following on from the above, this pattern wouldn't be needed
if TRULY_NOOP_TRUNCATION can be left undefined.

> [=E2=80=A6]
> +;; Combiner patterns to optimize shift/truncate combinations.
> +
> +(define_insn "*ashr_trunc<mode>"
> +  [(set (match_operand:SUBDI 0 "register_operand" "=3Dr")
> +	(truncate:SUBDI
> +	  (ashiftrt:DI (match_operand:DI 1 "register_operand" "r")
> +		       (match_operand:DI 2 "const_arith_operand" ""))))]
> +  "TARGET_64BIT && IN_RANGE (INTVAL (operands[2]), 32, 63)"
> +  "srai.d\t%0,%1,%2"
> +  [(set_attr "type" "shift")
> +   (set_attr "mode" "<MODE>")])
> +
> +(define_insn "*lshr32_trunc<mode>"
> +  [(set (match_operand:SUBDI 0 "register_operand" "=3Dr")
> +	(truncate:SUBDI
> +	  (lshiftrt:DI (match_operand:DI 1 "register_operand" "r")
> +		       (const_int 32))))]
> +  "TARGET_64BIT"
> +  "srai.d\t%0,%1,32"
> +  [(set_attr "type" "shift")
> +   (set_attr "mode" "<MODE>")])

Same for these.

> +
> +;;
> +;;  ....................
> +;;
> +;;	ZERO EXTENSION
> +;;
> +;;  ....................
> +
> +(define_insn "zero_extendsidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=3Dr,r,r,r")
> +	(zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,ZC,m,k"))=
)]
> +  "TARGET_64BIT"
> +  "@
> +   bstrpick.d\t%0,%1,31,0
> +   ldptr.w\t%0,%1\n\tlu32i.d\t%0,0

FWIW, splitting this after reload would give more scheduling freedom,
but it's not necessary to do that for the submission.

> +   ld.wu\t%0,%1
> +   ldx.wu\t%0,%1"
> +  [(set_attr "move_type" "arith,load,load,load")
> +   (set_attr "mode" "DI")
> +   (set_attr "insn_count" "1,2,1,1")])
> +
> [=E2=80=A6]
> +;; Combiner patterns to optimize truncate/zero_extend combinations.
> +
> +(define_insn "*zero_extend<GPR:mode>_trunc<SHORT:mode>"
> +  [(set (match_operand:GPR 0 "register_operand" "=3Dr")
> +	(zero_extend:GPR
> +	    (truncate:SHORT (match_operand:DI 1 "register_operand" "r"))))]
> +  "TARGET_64BIT"
> +  "bstrpick.w\t%0,%1,<SHORT:qi_hi>,0"
> +  [(set_attr "move_type" "pick_ins")
> +   (set_attr "mode" "<GPR:MODE>")])
> +
> +(define_insn "*zero_extendhi_truncqi"
> +  [(set (match_operand:HI 0 "register_operand" "=3Dr")
> +	(zero_extend:HI
> +	    (truncate:QI (match_operand:DI 1 "register_operand" "r"))))]
> +  "TARGET_64BIT"
> +  "andi\t%0,%1,0xff"
> +  [(set_attr "alu_type" "and")
> +   (set_attr "mode" "HI")])
> +
> +;;
> +;;  ....................
> +;;
> +;;	SIGN EXTENSION
> +;;
> +;;  ....................
> +
> +;; Extension insns.
> +;; Those for integer source operand are ordered widest source type first.
> +
> +;; When TARGET_64BIT, all SImode integer should already be in sign-exten=
ded
> +;; form (see TARGET_TRULY_NOOP_TRUNCATION and truncdisi2).  We can there=
fore
> +;; get rid of register->register instructions if we constrain the source=
 to
> +;; be in the same register as the destination.
> +;;
> +;; Only the pre-reload scheduler sees the type of the register alternati=
ves;
> +;; we split them into nothing before the post-reload scheduler runs.
> +;; These alternatives therefore have type "move" in order to reflect
> +;; what happens if the two pre-reload operands cannot be tied, and are
> +;; instead allocated two separate GPRs.
> +(define_insn_and_split "extendsidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=3Dr,r,r,r")
> +	(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "0,ZC,m,k"))=
)]
> +  "TARGET_64BIT"
> +  "@
> +   #
> +   ldptr.w\t%0,%1
> +   ld.w\t%0,%1
> +   ldx.w\t%0,%1"
> +  "&& reload_completed && register_operand (operands[1], VOIDmode)"
> +  [(const_int 0)]
> +{
> +  emit_note (NOTE_INSN_DELETED);
> +  DONE;
> +}
> +  [(set_attr "move_type" "move,load,load,load")
> +   (set_attr "mode" "DI")])

The three patterns above would also look different without
the definition of TRULY_NOOP_TRUNCATION.

> [=E2=80=A6]
> +(define_insn "*extenddi_truncate<mode>"
> +  [(set (match_operand:DI 0 "register_operand" "=3Dr")
> +	(sign_extend:DI
> +	    (truncate:SHORT (match_operand:DI 1 "register_operand" "r"))))]
> +  "TARGET_64BIT"
> +  "ext.w.<size>\t%0,%1"
> +  [(set_attr "move_type" "signext")
> +   (set_attr "mode" "DI")])
> +
> +(define_insn "*extendsi_truncate<mode>"
> +  [(set (match_operand:SI 0 "register_operand" "=3Dr")
> +	(sign_extend:SI
> +	    (truncate:SHORT (match_operand:DI 1 "register_operand" "r"))))]
> +  "TARGET_64BIT"
> +  "ext.w.<size>\t%0,%1"
> +  [(set_attr "move_type" "signext")
> +   (set_attr "mode" "SI")])
> +
> +(define_insn "*extendhi_truncateqi"
> +  [(set (match_operand:HI 0 "register_operand" "=3Dr")
> +	(sign_extend:HI
> +	    (truncate:QI (match_operand:DI 1 "register_operand" "r"))))]
> +  "TARGET_64BIT"
> +  "ext.w.b\t%0,%1"
> +  [(set_attr "move_type" "signext")
> +   (set_attr "mode" "SI")])

Same for these.

> +
> +(define_insn "extendsfdf2"
> +  [(set (match_operand:DF 0 "register_operand" "=3Df")
> +	(float_extend:DF (match_operand:SF 1 "register_operand" "f")))]
> +  "TARGET_DOUBLE_FLOAT"
> +  "fcvt.d.s\t%0,%1"
> +  [(set_attr "type" "fcvt")
> +   (set_attr "cnv_mode"	"S2D")
> +   (set_attr "mode" "DF")])

Very minor, but it's probably better to use a space rather than a tab
after "cnv_mode", for consistency with the other attributes.  Same in
the rest of the file.

> +;; floating point value by converting to value to an unsigned integer

typo, maybe:

;; Convert a floating-point value to an unsigned integer.

> +
> +(define_expand "fixuns_truncdfsi2"
> +  [(set (match_operand:SI 0 "register_operand")
> +	(unsigned_fix:SI (match_operand:DF 1 "register_operand")))]
> +  "TARGET_DOUBLE_FLOAT"
> +{
> +  rtx reg1 =3D gen_reg_rtx (DFmode);
> +  rtx reg2 =3D gen_reg_rtx (DFmode);
> +  rtx reg3 =3D gen_reg_rtx (SImode);
> +  rtx_code_label *label1 =3D gen_label_rtx ();
> +  rtx_code_label *label2 =3D gen_label_rtx ();
> +  rtx test;
> +  REAL_VALUE_TYPE offset;
> +
> +  real_2expN (&offset, 31, DFmode);
> +
> +  if (reg1)		      /* Turn off complaints about unreached code.  */
> +    {

I think we can drop this workaround.  IIRC it was needed for the
MIPS IRIX compilers, or maybe it was some ancient version of GCC.

> +      loongarch_emit_move (reg1,
> +			   const_double_from_real_value (offset, DFmode));
> +      do_pending_stack_adjust ();
> +
> +      test =3D gen_rtx_GE (VOIDmode, operands[1], reg1);
> +      emit_jump_insn (gen_cbranchdf4 (test, operands[1], reg1, label1));
> +
> +      emit_insn (gen_fix_truncdfsi2 (operands[0], operands[1]));
> +      emit_jump_insn (gen_rtx_SET (pc_rtx,
> +				   gen_rtx_LABEL_REF (VOIDmode, label2)));
> +      emit_barrier ();
> +
> +      emit_label (label1);
> +      loongarch_emit_move (reg2, gen_rtx_MINUS (DFmode, operands[1], reg=
1));
> +      loongarch_emit_move (reg3, GEN_INT (trunc_int_for_mode
> +				     (BITMASK_HIGH, SImode)));
> +
> +      emit_insn (gen_fix_truncdfsi2 (operands[0], reg2));
> +      emit_insn (gen_iorsi3 (operands[0], operands[0], reg3));
> +
> +      emit_label (label2);
> +
> +      /* Allow REG_NOTES to be set on last insn (labels don't have enough
> +	 fields, and can't be used for REG_NOTES anyway).  */
> +      emit_use (stack_pointer_rtx);
> +      DONE;
> +    }
> +})
> [=E2=80=A6]
> +;; Allow combine to split complex const_int load sequences, using operan=
d 2
> +;; to store the intermediate results.  See move_operand for details.
> +(define_split
> +  [(set (match_operand:GPR 0 "register_operand")
> +	(match_operand:GPR 1 "splittable_const_int_operand"))
> +   (clobber (match_operand:GPR 2 "register_operand"))]
> +  ""
> +  [(const_int 0)]
> +{
> +  loongarch_move_integer (operands[2], operands[0], INTVAL (operands[1])=
);
> +  DONE;
> +})

Does this define_split trigger?  I couldn't see an associated
define_insn that would provide the clobber.

> +(define_insn "*movsi_internal"
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=3Dr,r,r,w,*f,*f,*r,=
*m,*r,*z")
> +	(match_operand:SI 1 "move_operand" "r,Yd,w,rJ,*r*J,*m,*f,*f,*z,*r"))]
> +  "(register_operand (operands[0], SImode)
> +       || reg_or_0_operand (operands[1], SImode))"

Minor formatting nit: too much indentantion on the line above.  Same for
the other moves.

> [=E2=80=A6]
> +;; Conditional move instructions.
> +
> +(define_insn "*sel<code><GPR:mode>_using_<GPR2:mode>"
> +  [(set (match_operand:GPR 0 "register_operand" "=3Dr,r")
> +	(if_then_else:GPR
> +	 (equality_op:GPR2 (match_operand:GPR2 1 "register_operand" "r,r")
> +			   (const_int 0))
> +	 (match_operand:GPR 2 "reg_or_0_operand" "r,J")
> +	 (match_operand:GPR 3 "reg_or_0_operand" "J,r")))]
> +  "register_operand (operands[2], <GPR:MODE>mode)
> +       !=3D register_operand (operands[3], <GPR:MODE>mode)"

Same here.

> +  "@
> +   <sel>\t%0,%2,%1
> +   <selinv>\t%0,%3,%1"
> +  [(set_attr "type" "condmove")
> +   (set_attr "mode" "<GPR:MODE>")])
> +
> +;; sel.fmt copies the 3rd argument when the 1st is non-zero and the 2nd

s/sel.fmt/fsel/

> +;; argument if the 1st is zero.  This means operand 2 and 3 are
> +;; inverted in the instruction.
> +
> +(define_insn "*sel<mode>"
> +  [(set (match_operand:ANYF 0 "register_operand" "=3Df")
> +	(if_then_else:ANYF
> +	 (ne:FCC (match_operand:FCC 1 "register_operand" "z")
> +		 (const_int 0))
> +	 (match_operand:ANYF 2 "reg_or_0_operand" "f")
> +	 (match_operand:ANYF 3 "reg_or_0_operand" "f")))]
> +  "TARGET_HARD_FLOAT"
> +  "fsel\t%0,%3,%2,%1"
> +  [(set_attr "type" "condmove")
> +   (set_attr "mode" "<ANYF:MODE>")])
> [=E2=80=A6]
> +(define_insn "lu52i_d"
> +  [(set (match_operand:DI 0 "register_operand" "=3Dr")
> +	(ior:DI
> +	  (and:DI (match_operand:DI 1 "register_operand" "r")
> +		  (match_operand 2 "lu52i_mask_operand"))
> +	  (match_operand 3 "const_lu52i_operand" "v")))]
> +    "TARGET_64BIT"
> +    "lu52i.d\t%0,%1,%X3>>52"
> +    [(set_attr "type" "arith")
> +     (set_attr "mode" "DI")])

Formatting nit: too much indentation from "TARGET_64BIT" onwards.

> +
> +;; Convert floating-point numbers to integers
> +(define_insn "frint_<fmt>"
> +  [(set (match_operand:ANYF 0 "register_operand" "=3Df")
> +	(unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
> +		      UNSPEC_FRINT))]
> +  "TARGET_HARD_FLOAT"
> +  "frint.<fmt>\t%0,%1"
> +  [(set_attr "type" "fcvt")
> +   (set_attr "mode" "<MODE>")])
> +
> +;; LoongArch supports loading and storing a floating point register from
> +;; the sum of two general-purpose registers.  We use two versions for ea=
ch of
> +;; these four instructions: one where the two general-purpose registers =
are
> +;; SImode, and one where they are DImode.  This is because general-purpo=
se
> +;; registers will be in SImode when they hold 32-bit values, but,
> +;; since the 32-bit values are always sign extended, the f{ld/st}x.{s/d}
> +;; instructions will still work correctly.
> +
> +;; ??? Perhaps it would be better to support these instructions by
> +;; modifying TARGET_LEGITIMATE_ADDRESS_P and friends.  However, since
> +;; these instructions can only be used to load and store floating
> +;; point registers, that would probably cause trouble in reload.

Does this comment apply to Loongson?  It looks from:

  +;; Similarly for LoongArch indexed GPR loads and stores.
  +(define_mode_attr loadx [(QI "ldx.b")
  +			 (HI "ldx.h")
  +			 (SI "ldx.w")
  +			 (DI "ldx.d")])
  +(define_mode_attr storex [(QI "stx.b")
  +			  (HI "stx.h")
  +			  (SI "stx.w")
  +			  (DI "stx.d")])

like Loongson has a full set of indexed loads and stores, so supporting
indexed addresses in TARGET_LEGITIMATE_ADDRESS_P should work.

Also, the MIPS comment predates LRA, which is better than old reload
at handling irregular memory address requirements.  Accepting indexed
addresses in TARGET_LEGITIMATE_ADDRESS_P would allow ivopts to optimise
the code better.

> [=E2=80=A6]
> +;; Expand in-line code to clear the instruction cache between operand[0]=
 and
> +;; operand[1].
> +(define_expand "clear_cache"
> +  [(match_operand 0 "pmode_register_operand")
> +   (match_operand 1 "pmode_register_operand")]
> +  ""
> +  "
> +{
> +  emit_insn (gen_ibar (const0_rtx));
> +  DONE;
> +}")

Minor nit: the quotes before { and after } are unnecessary.

> [=E2=80=A6]
> +(define_insn "asrtle_d"
> +	[(unspec_volatile:DI [(match_operand:DI 0 "register_operand" "r")
> +			      (match_operand:DI 1 "register_operand" "r")]
> +			      UNSPECV_ASRTLE_D)]
> +  "TARGET_64BIT"
> +  "asrtle.d\t%0,%1"
> +  [(set_attr "type" "load")
> +   (set_attr "mode" "DI")])
> +
> +(define_insn "asrtgt_d"
> +	[(unspec_volatile:DI [(match_operand:DI 0 "register_operand" "r")
> +			      (match_operand:DI 1 "register_operand" "r")]
> +			      UNSPECV_ASRTGT_D)]
> +  "TARGET_64BIT"
> +  "asrtgt.d\t%0,%1"
> +  [(set_attr "type" "load")
> +   (set_attr "mode" "DI")])

Formatting: the unspec_volatile pattern is indented too far in both cases.

> [=E2=80=A6]
> +;; The following templates were added to generate "bstrpick.d + alsl.d"
> +;; instruction pairs.
> +;; It is required that the values of const_immalsl_operand and
> +;; immediate_operand must have the following correspondence:
> +;;
> +;; (immediate_operand >> const_immalsl_operand) =3D=3D 0xffffffff
> +
> +(define_insn "zero_extend_ashift1"
> +  [(set (match_operand:DI 0 "register_operand" "=3Dr")
> +	(and:DI (ashift:DI (subreg:DI (match_operand:SI 1 "register_operand" "r=
") 0)
> +			   (match_operand 2 "const_immalsl_operand" ""))
> +		(match_operand 3 "immediate_operand" "")))]
> +  "TARGET_64BIT
> +   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) =3D=3D 0xffffffff)"
> +  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,$r0,%2"
> +  [(set_attr "type" "arith")
> +   (set_attr "mode" "DI")
> +   (set_attr "insn_count" "2")])

Without the TRULY_NOOP_TRUNCATION definition, this pattern would be
redundant with=E2=80=A6

> +(define_insn "zero_extend_ashift2"
> +  [(set (match_operand:DI 0 "register_operand" "=3Dr")
> +	(and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> +			   (match_operand 2 "const_immalsl_operand" ""))
> +		(match_operand 3 "immediate_operand" "")))]
> +  "TARGET_64BIT
> +   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) =3D=3D 0xffffffff)"
> +  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,$r0,%2"
> +  [(set_attr "type" "arith")
> +   (set_attr "mode" "DI")
> +   (set_attr "insn_count" "2")])

=E2=80=A6this one.  I'm surprised it isn't already TBH.  Doesn't the subreg=
 match:

  (match_operand:DI 1 "register_operand" "r")

?

> +
> +(define_insn "alsl_paired1"
> +  [(set (match_operand:DI 0 "register_operand" "=3D&r")
> +	(plus:DI (and:DI (ashift:DI (subreg:DI (match_operand:SI 1 "register_op=
erand" "r") 0)
> +				    (match_operand 2 "const_immalsl_operand" ""))
> +			 (match_operand 3 "immediate_operand" ""))
> +		 (match_operand:DI 4 "register_operand" "r")))]
> +  "TARGET_64BIT
> +   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) =3D=3D 0xffffffff)"
> +  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,%4,%2"
> +  [(set_attr "type" "arith")
> +  (set_attr "mode" "DI")
> +  (set_attr "insn_count" "2")])
> +
> +(define_insn "alsl_paired2"
> +  [(set (match_operand:DI 0 "register_operand" "=3D&r")
> +	(plus:DI (match_operand:DI 1 "register_operand" "r")
> +		 (and:DI (ashift:DI (match_operand:DI 2 "register_operand" "r")
> +				    (match_operand 3 "const_immalsl_operand" ""))
> +			 (match_operand 4 "immediate_operand" ""))))]
> +  "TARGET_64BIT
> +   && ((INTVAL (operands[4]) >> INTVAL (operands[3])) =3D=3D 0xffffffff)"
> +  "bstrpick.d\t%0,%2,31,0\n\talsl.d\t%0,%0,%1,%3"
> +  [(set_attr "type" "arith")
> +   (set_attr "mode" "DI")
> +   (set_attr "insn_count" "2")])

Same for this pair.

> [=E2=80=A6]
> +(define_expand "tablejump"
> +  [(set (pc)
> +	(match_operand 0 "register_operand"))
> +   (use (label_ref (match_operand 1 "")))]
> +  ""
> +{
> +  if (flag_pic)
> +      operands[0] =3D expand_simple_binop (Pmode, PLUS, operands[0],
> +					 gen_rtx_LABEL_REF (Pmode,
> +							    operands[1]),
> +					 NULL_RTX, 0, OPTAB_DIRECT);

Formatting nit: the last four lines should be indented by two spaces fewer.

> +  emit_jump_insn (PMODE_INSN (gen_tablejump, (operands[0], operands[1]))=
);
> +  DONE;
> +})
> +(define_insn "sibcall_internal"
> +  [(call (mem:SI (match_operand 0 "call_insn_operand" "j,c,a,t,h"))
> +	 (match_operand 1 "" ""))]
> +  "SIBLING_CALL_P (insn)"
> +{
> +  switch (which_alternative)
> +    {
> +    case 0:
> +      return "jr\t%0";
> +    case 1:
> +      if (TARGET_CMODEL_LARGE)
> +	return "pcaddu18i\t$r12,(%%pcrel(%0+0x20000))>>18\n\t"
> +	       "jirl\t$r0,$r12,%%pcrel(%0+4)-(%%pcrel(%0+4+0x20000)>>18<<18)";
> +      else if (TARGET_CMODEL_EXTREME)
> +	return "la.local\t$r12,$r13,%0\n\tjr\t$r12";
> +      else
> +	return "b\t%0";
> +    case 2:
> +      if (TARGET_CMODEL_TINY_STATIC)
> +	return "b\t%0";
> +      else if (TARGET_CMODEL_EXTREME)
> +	return "la.global\t$r12,$r13,%0\n\tjr\t$r12";
> +      else
> +	return "la.global\t$r12,%0\n\tjr\t$r12";
> +    case 3:
> +      if (TARGET_CMODEL_EXTREME)
> +	return "la.global\t$r12,$r13,%0\n\tjr\t$r12";
> +      else
> +	return "la.global\t$r12,%0\n\tjr\t$r12";
> +    case 4:
> +      if (TARGET_CMODEL_NORMAL || TARGET_CMODEL_TINY)
> +	return "b\t%%plt(%0)";
> +      else if (TARGET_CMODEL_LARGE)
> +	return "pcaddu18i\t$r12,(%%plt(%0)+0x20000)>>18\n\t"
> +	       "jirl\t$r0,$r12,%%plt(%0)+4-((%%plt(%0)+(4+0x20000))>>18<<18)";
> +      else
> +	{
> +	  sorry ("cmodel extreme and tiny static not support plt");

=E2=80=9Cdo not support PLTs=E2=80=9D.

Can this be triggered by a certain combination of source code
and command-line options, or is it really an internal error?

> +	  return "";  /* GCC complains about may fall through.  */

sorry() isn't a fatal error, so the compiler will continue.
Returning "" is the right thing to do, but the comment makes it
sound unreachable.

Same comments for later sorry() + return pairs.

> +	}
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +  [(set_attr "jirl" "indirect,direct,direct,direct,direct")])
> +
> +(define_expand "sibcall_value"
> +  [(parallel [(set (match_operand 0 "")
> +		   (call (match_operand 1 "")
> +			 (match_operand 2 "")))
> +	      (use (match_operand 3 ""))])]		;; next_arg_reg
> +  ""
> +{
> +  rtx target =3D loongarch_legitimize_call_address (XEXP (operands[1], 0=
));
> +
> + /*  Handle return values created by loongarch_pass_fpr_pair.  */

Formatting nit: should be two spaces before =E2=80=9C/*=E2=80=9D and only o=
ne after it.

> +  if (GET_CODE (operands[0]) =3D=3D PARALLEL && XVECLEN (operands[0], 0)=
 =3D=3D 2)
> +    {
> +      rtx arg1 =3D XEXP (XVECEXP (operands[0],0, 0), 0);
> +      rtx arg2 =3D XEXP (XVECEXP (operands[0],0, 1), 0);
> +
> +      emit_call_insn (gen_sibcall_value_multiple_internal (arg1, target,
> +							   operands[2],
> +							   arg2));
> +    }
> +   else
> +    {
> +      /*  Handle return values created by loongarch_return_fpr_single.  =
*/

Should only be one space after =E2=80=9C/*=E2=80=9D here too.

> +      if (GET_CODE (operands[0]) =3D=3D PARALLEL && XVECLEN (operands[0]=
, 0) =3D=3D 1)
> +      operands[0] =3D XEXP (XVECEXP (operands[0], 0, 0), 0);

The last line should be indented by two spaces more.

Same three comments for call_value.

> [=E2=80=A6]
> +;;
> +;;  ....................
> +;;
> +;;	MISC.
> +;;
> +;;  ....................
> +;;
> +
> +(define_insn "nop"
> +  [(const_int 0)]
> +  ""
> +  "nop"
> +  [(set_attr "type"	"nop")
> +   (set_attr "mode"	"none")])

Formatting nit: most of the rest of the file uses a space rather than a
tab after the attribute name.  IMO a space looks nicer.  Same for some
later patterns.

> +
> +;; __builtin_loongarch_movfcsr2gr: move the FCSR into operand 0.
> +(define_insn "loongarch_movfcsr2gr"
> +  [(set (match_operand:SI 0 "register_operand" "=3Dr")
> +    (unspec_volatile:SI [(match_operand 1 "const_uimm5_operand")]
> +    UNSPECV_MOVFCSR2GR))]

Last two lines look under-indented.

> +  "TARGET_HARD_FLOAT"
> +  "movfcsr2gr\t%0,$r%1")
> +
> [=E2=80=A6]
> +(define_insn "stack_tie<mode>"
> +  [(set (mem:BLK (scratch))
> +	(unspec:BLK [(match_operand:GPR 0 "register_operand" "r")
> +		     (match_operand:GPR 1 "register_operand" "r")]
> +		    UNSPEC_TIE))]
> +  ""
> +  ""
> +  [(set_attr "length" "0")]
> +)

Using [(set_attr "type "ghost")] should be slightly better for
scheduling than setting the length directly.

> +
> +(define_insn "gpr_restore_return"
> +  [(return)
> +   (use (match_operand 0 "pmode_register_operand" ""))
> +   (const_int 0)]
> +  ""
> +  "")

Might be worth adding a comment here.  Why does this form of return expand
to no code?

> +
> +(define_split
> +  [(match_operand 0 "small_data_pattern")]
> +  "reload_completed"
> +  [(match_dup 0)]
> +  { operands[0] =3D loongarch_rewrite_small_data (operands[0]); })
> +
> +
> +;; Match paired HI/SI/SF/DFmode load/stores.
> +(define_insn "*join2_load_store<JOIN_MODE:mode>"
> +  [(set (match_operand:JOIN_MODE 0 "nonimmediate_operand"
> +  "=3Dr,f,m,m,r,ZC,r,k,f,k")
> +	(match_operand:JOIN_MODE 1 "nonimmediate_operand" "m,m,r,f,ZC,r,k,r,k,f=
"))
> +   (set (match_operand:JOIN_MODE 2 "nonimmediate_operand"
> +   "=3Dr,f,m,m,r,ZC,r,k,f,k")
> +	(match_operand:JOIN_MODE 3 "nonimmediate_operand" "m,m,r,f,ZC,r,k,r,k,f=
"))]
> +  "reload_completed"
> +  {
> +    bool load_p =3D (which_alternative =3D=3D 0 || which_alternative =3D=
=3D 1);
> +    /* Reg-renaming pass reuses base register if it is dead after bonded=
 loads.
> +       Hardware does not bond those loads, even when they are consecutiv=
e.
> +       However, order of the loads need to be checked for correctness.  =
*/
> +    if (!load_p || !reg_overlap_mentioned_p (operands[0], operands[1]))
> +      {

I'm not sure I understand how these patterns work, but it looks like the
condition above is trying to work around a later change to the insn by
regrename, after peephole2 has checked loongarch_load_store_bonding_p.
If so, you should be able to avoid that by marking the destinations of
the loads as earlyclobbers, using "&r" instead of "r" for the first
alternative.  regrename should then preserve the conditions that
loongarch_load_store_bonding_p checked earlier.

Same for the other patterns.

> +	output_asm_insn (loongarch_output_move (operands[0], operands[1]),
> +			 operands);
> +	output_asm_insn (loongarch_output_move (operands[2], operands[3]),
> +			 &operands[2]);
> +      }
> +    else
> +      {
> +	output_asm_insn (loongarch_output_move (operands[2], operands[3]),
> +			 &operands[2]);
> +	output_asm_insn (loongarch_output_move (operands[0], operands[1]),
> +			 operands);
> +      }
> +    return "";
> +  }
> +  [(set_attr "move_type"
> +  "load,fpload,store,fpstore,load,store,load,store,fpload,fpstore")
> +   (set_attr "insn_count" "2,2,2,2,2,2,2,2,2,2")])
> [=E2=80=A6]
> +;; This is used for indexing into vectors, and hence only accepts const_=
int.
> +(define_predicate "const_0_or_1_operand"
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (INTVAL (op), 0, 1)")))

It doesn't look like this is used.  It'd be good to check the other
predicates to see if any of them can be removed.  In particular=E2=80=A6

> [=E2=80=A6]
> +(define_predicate "const_8_to_15_operand"
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (INTVAL (op), 0, 7)")))
> +
> +(define_predicate "const_16_to_31_operand"
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (INTVAL (op), 0, 7)")))

=E2=80=A6these two don't seem to do what their name suggests.

> [=E2=80=A6]
> +(define_predicate "muldiv_target_operand"
> +		(match_operand 0 "register_operand"))

This one also seems unused.  IMO using register_operand would be clearer.

> [=E2=80=A6]
> +(define_predicate "is_const_call_local_symbol"
> +  (and (match_operand 0 "const_call_insn_operand")
> +       (ior (match_test "loongarch_global_symbol_p (op) =3D=3D 0")
> +       (match_test "loongarch_symbol_binds_local_p (op) !=3D 0"))

The indentation looks misleading here: the last line is another
operand of the ior.

Thanks,
Richard

> +       (match_test "CONSTANT_P (op)")))
> [=E2=80=A6]