From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 117815 invoked by alias); 22 Sep 2018 09:08:42 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 117777 invoked by uid 89); 22 Sep 2018 09:08:37 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-3.8 required=5.0 tests=AWL,BAYES_20,GIT_PATCH_1,KAM_STOCKGEN,SPF_PASS autolearn=ham version=3.3.2 spammy=Its, lra, compliance, Indeed X-HELO: server28.superhosting.bg Received: from server28.superhosting.bg (HELO server28.superhosting.bg) (217.174.156.11) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 22 Sep 2018 09:08:30 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=dinux.eu; s=default; h=Content-Type:Content-Transfer-Encoding:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=20ouyBc/GfVRpLOu3zigrmk+IL5gHlEhTA6OM60sut4=; b=O3dVX2M0y/wCyjlbSoOv79eRrN rYEgED4Q0YU8Be/rbBq2o4lGcxeVpXCgWeFBK+ItmqDNwDn0VBD3MXVMlWSyZaNL+dC041bGlP/OX l7wMEX7B6t2xxpP/kSfrSf4j5l2UuBMs8iwB1JoIh9nQ9f1OiZN5x0a2cIIFhZsFKpj9aipGe/vlC WtVqL1gfkt6PZNdMRCmKToLbi9mLyw/sSod7ZuhHf5RFWy/MPKKHkZUIbP1fBaWjoPFGLbTUZPOrF 0CcB5G+f+aeMz8NP3F02OyFP/krjKYeQ8ozco32tyVOaQMmoYEwajxvL8nrtbTi12OOhkJStGRHSX WXTFzejQ==; Received: from [95.87.234.74] (port=39342 helo=tpdeb.localnet) by server28.superhosting.bg with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.91) (envelope-from ) id 1g3dtl-0000sF-MR; Sat, 22 Sep 2018 12:08:27 +0300 From: Dimitar Dimitrov To: Richard Sandiford Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH v4 01/10] Initial TI PRU GCC port Date: Sat, 22 Sep 2018 10:23:00 -0000 Message-ID: <2626891.mrtP6d4BSD@tpdeb> User-Agent: KMail/5.2.3 (Linux/4.9.0-8-amd64; KDE/5.28.0; x86_64; ; ) In-Reply-To: <87ftydk3le.fsf@arm.com> References: <20180906111217.24365-1-dimitar@dinux.eu> <20180906111217.24365-2-dimitar@dinux.eu> <87ftydk3le.fsf@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-OutGoing-Spam-Status: No, score=0.5 X-IsSubscribed: yes X-SW-Source: 2018-09/txt/msg01281.txt.bz2 On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote: > Dimitar Dimitrov writes: > > +(define_insn > > "sub_impl_ > _zext_op2>" + [(set (match_operand:EQD 0 "register_operand" "=r,r,r") > > + (minus:EQD > > + (zero_extend:EQD > > + (match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I")) > > + (zero_extend:EQD > > + (match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"))))] > > + "" > > + "@ > > + sub\\t%0, %1, %2 > > + sub\\t%0, %1, %2 > > + rsb\\t%0, %2, %1" > > + [(set_attr "type" "alu") > > + (set_attr "length" "4")]) > > By convention, subtraction patterns shouldn't accept constants for > operand 2. Constants should instead be subtracted using an addition > of the negative. Understood. I will remove second alternative. But I will leave the third one since it enables an important optimization: unsigned test(unsigned a) { return 10-a; } RTL: (insn 6 3 7 2 (set (reg:SI 152) (minus:SI (const_int 10 [0xa]) (reg/v:SI 151 [ a ]))) "test.c":430 -1 (nil)) Assembly: test: rsb r14, r14, 10 ret > > > +(define_constraint "I" > > + "An unsigned 8-bit constant." > > + (and (match_code "const_int") > > + (match_test "UBYTE_INT (ival)"))) > > As it stands this will reject QImode constants with the top bit set, > since const_ints are always stored in sign-extended form. E.g. QImode > 128 is stored as (const_int -128) rather than (const_int 128). > > Unfortunately this is difficult to fix in a clean way, since > const_ints don't store their mode (a long-standing wart) and unlike > predicates, constraints aren't given a mode from context. The best > way I can think of coping with it is: > > a) have a separate constraint for -128...127 > b) add a define_mode_attr that maps QI to the new constraint and > HI and SI to I > c) use etc. instead of I in the match_operands > > Similar comment for "J" and HImode, although you already have the > "N" as the corresponding signed constraint and so don't need a new one. Thank you. This strategy worked for QImode. I will include the changes in my next patch. Since PRU ALU operations do not have 16-bit constant operands, there is no need to add define_mode_attr for HImode. The "mov" pattern already handles the "N" [-32768, 32767] constraint. > > > > +;; Return true if OP is a text segment reference. > > +;; This is needed for program memory address expressions. Borrowed from > > AVR. +(define_predicate "text_segment_operand" > > + (match_code "code_label,label_ref,symbol_ref,plus,minus,const") > > +{ > > + switch (GET_CODE (op)) > > + { > > + case CODE_LABEL: > > + return true; > > + case LABEL_REF : > > + return true; > > + case SYMBOL_REF : > > + return SYMBOL_REF_FUNCTION_P (op); > > + case PLUS : > > + case MINUS : > > + /* Assume canonical format of symbol + constant. > > + Fall through. */ > > + case CONST : > > + return text_segment_operand (XEXP (op, 0), VOIDmode); > > + default : > > + return false; > > + } > > +}) > > This probably comes from AVR, but: no spaces before ":". > > Bit surprised that we can get a CODE_LABEL rather than a LABEL_REF here. > Do you know if that triggers in practice, and if so where? Indeed, CODE_LABEL case is never reached. I'll leave gcc_unreachable here. > An IMO neater and slightly more robust way of writing the body is: > poly_int64 offset: > rtx base = strip_offset (op, &offset); > switch (GET_CODE (base)) > > { > > case LABEL_REF: > ...as above... > > case SYMBOL_REF: > ...as above... > > default: > return false; > > } > > with "plus" and "minus" not in the match_code list (since they should > always appear in consts if they really are text references). The "plus" and "minus" are needed when handling code labels as values. Take for example the following construct: int x = &&lab1 - &&lab0; lab1: ... lab2: My TARGET_ASM_INTEGER callback uses the text_segment_operand predicate. In the above case it is passed the following RTL expression: (minus:SI (label_ref/v:SI 20) (label_ref/v:SI 27)) I need to detect text labels so that I annotate them with %pmem: .4byte %pmem(.L4-(.L2)) Instead of the incorrect: .4byte .L3-(.L2) > > +;; Return true if OP is a load multiple operation. It is known to be a > > +;; PARALLEL and the first section will be tested. > > + > > +(define_special_predicate "load_multiple_operation" > > + (match_code "parallel") > > +{ > > + machine_mode elt_mode; > > + int count = XVECLEN (op, 0); > > + unsigned int dest_regno; > > + rtx src_addr; > > + int i, off; > > + > > + /* Perform a quick check so we don't blow up below. */ > > + if (GET_CODE (XVECEXP (op, 0, 0)) != SET > > + || GET_CODE (SET_DEST (XVECEXP (op, 0, 0))) != REG > > + || GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) != MEM) > > + return false; > > + > > + dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, 0))); > > + src_addr = XEXP (SET_SRC (XVECEXP (op, 0, 0)), 0); > > + elt_mode = GET_MODE (SET_DEST (XVECEXP (op, 0, 0))); > > + > > + /* Check, is base, or base + displacement. */ > > + > > + if (GET_CODE (src_addr) == REG) > > + off = 0; > > + else if (GET_CODE (src_addr) == PLUS > > + && GET_CODE (XEXP (src_addr, 0)) == REG > > + && GET_CODE (XEXP (src_addr, 1)) == CONST_INT) > > + { > > + off = INTVAL (XEXP (src_addr, 1)); > > + src_addr = XEXP (src_addr, 0); > > + } > > + else > > + return false; > > + > > + for (i = 1; i < count; i++) > > + { > > + rtx elt = XVECEXP (op, 0, i); > > + > > + if (GET_CODE (elt) != SET > > + || GET_CODE (SET_DEST (elt)) != REG > > + || GET_MODE (SET_DEST (elt)) != elt_mode > > + || REGNO (SET_DEST (elt)) != dest_regno + i > > + || GET_CODE (SET_SRC (elt)) != MEM > > + || GET_MODE (SET_SRC (elt)) != elt_mode > > + || GET_CODE (XEXP (SET_SRC (elt), 0)) != PLUS > > + || ! rtx_equal_p (XEXP (XEXP (SET_SRC (elt), 0), 0), src_addr) > > + || GET_CODE (XEXP (XEXP (SET_SRC (elt), 0), 1)) != CONST_INT > > + || INTVAL (XEXP (XEXP (SET_SRC (elt), 0), 1)) > > + != off + i * GET_MODE_SIZE (elt_mode)) > > + return false; > > + } > > + > > + return true; > > +}) > > This may not matter in practice, but the downside of testing PLUS and > CONST_INT explicitly is that the predicate won't match something like: > > (set (reg R1) (mem (plus (reg RB) (const_int -4)))) > (set (reg R2) (mem (reg RB))) > (set (reg R3) (mem (plus (reg RB) (const_int 4)))) > ... > > Using strip_offset on the addresses would avoid this. It does not matter for PRU since all offsets are unsigned integers. But I will switch to strip_offset to make the code cleaner. > > +/* Callback for walk_gimple_seq that checks TP tree for TI ABI > > compliance. */ +static tree > > +check_op_callback (tree *tp, int *walk_subtrees, void *data) > > +{ > > + struct walk_stmt_info *wi = (struct walk_stmt_info *) data; > > + > > + if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == ENUMERAL_TYPE) > > + { > > + /* Forward declarations have NULL tree type. Skip them. */ > > + if (TREE_TYPE (*tp) == NULL) > > + return NULL; > > + } > > + > > + /* TODO - why C++ leaves INTEGER_TYPE forward declarations around? */ > > + if (TREE_TYPE (*tp) == NULL) > > + return NULL; > > + > > + const tree type = TREE_TYPE (*tp); > > + > > + /* Direct function calls are allowed, obviously. */ > > + if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == POINTER_TYPE) > > + { > > + const tree ptype = TREE_TYPE (type); > > + if (TREE_CODE (ptype) == FUNCTION_TYPE) > > + return NULL; > > + } > > This seems like a bit of a dangerous heuristic. Couldn't it also cause > us to skip things like: > > (void *) func > > ? (OK, that's dubious C, but we do support it.) The cast yields a "data pointer", which is 32 bits for both types of ABI. Hence it is safe to skip "(void *) func". The TI ABI's 16 bit function pointers become a problem when they change the layout of structures and function argument registers. > > +/* Implement TARGET_MODES_TIEABLE_P. */ > > + > > +static bool > > +pru_modes_tieable_p (machine_mode mode1, machine_mode mode2) > > +{ > > + return (mode1 == mode2 > > + || (GET_MODE_SIZE (mode1) <= 4 && GET_MODE_SIZE (mode2) <= 4)); > > +} > > I think this should have a comment explaining why this implementation > is needed. Sorry, this is stale. I will altogether remove the callback, and will use the default "true" implementation. PRU registers are fairly orthogonal. A few years ago I got confused and considered this a requirement for correct 64 bit operations. > > > + { > > + *total = 0; > > + } > > + else > > + { > > + /* SI move has the same cost as a QI move. */ > > + int factor = GET_MODE_SIZE (mode) / GET_MODE_SIZE (SImode); > > + if (factor == 0) > > + factor = 1; > > + *total = factor * COSTS_N_INSNS (1); > > + } > > Could you explain this a bit more? It looks like a pure QImode operation > gets a cost of 1 insn but an SImode operation zero-extended from QImode > gets a cost of 0. I have unintentionally bumped QI cost while trying to make zero-extends cheap. I will fix by using the following simple logic: case SET: mode = GET_MODE (SET_DEST (x)); /* SI move has the same cost as a QI move. Moves larger than 64 bits are costly. */ int factor = CEIL (GET_MODE_SIZE (mode), GET_MODE_SIZE (SImode)); *total = factor * COSTS_N_INSNS (1); > > [...] > > > +/* Implement TARGET_PREFERRED_RELOAD_CLASS. */ > > +static reg_class_t > > +pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass) > > +{ > > + return regclass == NO_REGS ? GENERAL_REGS : regclass; > > +} > > This looks odd: PREFERRED_RELOAD_CLASS should return a subset of the > original class rather than add new registers. Can you remember why > it was needed? I'm not sure what target code is supposed to return when NO_REGS is passed here. I saw how other ports handle NO_REGS (c-sky, m32c, nios2, rl78). So I followed suite. Here is a backtrace of LRA when NO_REGS is passed: 0xf629ae pru_preferred_reload_class ../../gcc/gcc/config/pru/pru.c:788 0xa3d6e8 process_alt_operands ../../gcc/gcc/lra-constraints.c:2600 0xa3ef7d curr_insn_transform ../../gcc/gcc/lra-constraints.c:3889 0xa4301e lra_constraints(bool) ../../gcc/gcc/lra-constraints.c:4906 0xa2726c lra(_IO_FILE*) ../../gcc/gcc/lra.c:2446 0x9c97a9 do_reload ../../gcc/gcc/ira.c:5469 0x9c97a9 execute ../../gcc/gcc/ira.c:5653 > > > + /* Floating-point comparisons. */ > > + eqsf_libfunc = init_one_libfunc ("__pruabi_eqf"); > > + nesf_libfunc = init_one_libfunc ("__pruabi_neqf"); > > + lesf_libfunc = init_one_libfunc ("__pruabi_lef"); > > + ltsf_libfunc = init_one_libfunc ("__pruabi_ltf"); > > + gesf_libfunc = init_one_libfunc ("__pruabi_gef"); > > + gtsf_libfunc = init_one_libfunc ("__pruabi_gtf"); > > + eqdf_libfunc = init_one_libfunc ("__pruabi_eqd"); > > + nedf_libfunc = init_one_libfunc ("__pruabi_neqd"); > > + ledf_libfunc = init_one_libfunc ("__pruabi_led"); > > + ltdf_libfunc = init_one_libfunc ("__pruabi_ltd"); > > + gedf_libfunc = init_one_libfunc ("__pruabi_ged"); > > + gtdf_libfunc = init_one_libfunc ("__pruabi_gtd"); > > + > > + set_optab_libfunc (eq_optab, SFmode, NULL); > > + set_optab_libfunc (ne_optab, SFmode, "__pruabi_neqf"); > > + set_optab_libfunc (gt_optab, SFmode, NULL); > > + set_optab_libfunc (ge_optab, SFmode, NULL); > > + set_optab_libfunc (lt_optab, SFmode, NULL); > > + set_optab_libfunc (le_optab, SFmode, NULL); > > + set_optab_libfunc (unord_optab, SFmode, "__pruabi_unordf"); > > + set_optab_libfunc (eq_optab, DFmode, NULL); > > + set_optab_libfunc (ne_optab, DFmode, "__pruabi_neqd"); > > + set_optab_libfunc (gt_optab, DFmode, NULL); > > + set_optab_libfunc (ge_optab, DFmode, NULL); > > + set_optab_libfunc (lt_optab, DFmode, NULL); > > + set_optab_libfunc (le_optab, DFmode, NULL); > > So you need to pass NULL for the ordered comparisons because the ABI > routines return a different value from the one GCC expects? Might be > worth a comment if so. Yes, I will comment. > > > +/* Emit comparison instruction if necessary, returning the expression > > + that holds the compare result in the proper mode. Return the > > comparison + that should be used in the jump insn. */ > > + > > +rtx > > +pru_expand_fp_compare (rtx comparison, machine_mode mode) > > +{ > > + enum rtx_code code = GET_CODE (comparison); > > + rtx op0 = XEXP (comparison, 0); > > + rtx op1 = XEXP (comparison, 1); > > + rtx cmp; > > + enum rtx_code jump_code = code; > > + machine_mode op_mode = GET_MODE (op0); > > + rtx_insn *insns; > > + rtx libfunc; > > + > > + gcc_assert (op_mode == DFmode || op_mode == SFmode); > > + > > + if (code == UNGE) > > + { > > + code = LT; > > + jump_code = EQ; > > + } > > + else if (code == UNLE) > > + { > > + code = GT; > > + jump_code = EQ; > > + } > > This is only safe if the LT and GT libcalls don't raise exceptions for > unordered inputs. I'm guessing they don't, but it's probably worth a > comment if so. Yes, I will comment. > > > +/* Return true if register REGNO is a valid base register. > > + STRICT_P is true if REG_OK_STRICT is in effect. */ > > + > > +bool > > +pru_regno_ok_for_base_p (int regno, bool strict_p) > > +{ > > + if (!HARD_REGISTER_NUM_P (regno)) > > + { > > + if (!strict_p) > > + return true; > > + > > + if (!reg_renumber) > > + return false; > > + > > + regno = reg_renumber[regno]; > > reg_renumber is a reload-only thing and shouldn't be needed/used on > LRA targets. I will remove the snippet. It's a remnant from the PRU's reload days. > > > + if (REG_P (base) > > + && pru_regno_ok_for_base_p (REGNO (base), strict_p) > > + && (offset == NULL_RTX > > + || (CONST_INT_P (offset) > > + && pru_valid_const_ubyte_offset (mode, INTVAL (offset))) > > + || (REG_P (offset) > > + && pru_regno_ok_for_index_p (REGNO (offset), strict_p)))) > > + { > > + /* base register + register offset > > + * OR base register + UBYTE constant offset. */ > > + return true; > > + } > > + else if (REG_P (base) > > + && pru_regno_ok_for_index_p (REGNO (base), strict_p) > > + && (offset != NULL_RTX && ctable_base_operand (offset, VOIDmode))) > > + { > > + /* base CTABLE constant base + register offset > > + * Note: GCC always puts the register as a first operand of PLUS. > > */ + return true; > > + } > > + else if (CONST_INT_P (base) > > + && offset == NULL_RTX > > + && (ctable_addr_operand (base, VOIDmode))) > > + { > > + /* base CTABLE constant base + UBYTE constant offset. */ > > + return true; > > This address should simply be a (const_int ...), with no containing (plus > ...). Yes, my bad. Will remove this snippet. > > + case SUBREG: > > + case MEM: > > + if (letter == 0) > > + { > > + output_address (VOIDmode, op); > > + return; > > + } > > Does the SUBREG case ever hit? They shouldn't appear as operands > at this late stage. No, it does not hit. I will put gcc_unreachable(). > > +/* Helper function to get the starting storage HW register for an > > argument, + or -1 if it must be passed on stack. The cum_v state is > > not changed. */ +static int > > +pru_function_arg_regi (cumulative_args_t cum_v, > > + machine_mode mode, const_tree type, > > + bool named) > > +{ > > + CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); > > + size_t argsize = pru_function_arg_size (mode, type); > > + size_t i, bi; > > + int regi = -1; > > + > > + if (!pru_arg_in_reg_bysize (argsize)) > > + return -1; > > + > > + if (!named) > > + return -1; > > + > > + /* Find the first available slot that fits. Yes, that's the PRU ABI. > > */ + for (i = 0; regi < 0 && i < ARRAY_SIZE (cum->regs_used); i++) > > + { > > + if (mode == BLKmode) > > + { > > + /* Structs are passed, beginning at a full register. */ > > + if ((i % 4) != 0) > > + continue; > > + } > > The comment doesn't seem to match the code: a struct like: > > struct s { short x; }; > > will have HImode rather than BLKmode. I will update the comment to clarify it is only for large structs. Code should be ok. > > It's usually better to base ABI stuff on the type where possible, > since that corresponds directly to the source language, while modes > are more of an internal GCC thing. I will explore this option. But ABI is already defined in data sizes, which I think neatly fit with GCC's modes. > > > +(define_insn "subdi3" > > + [(set (match_operand:DI 0 "register_operand" "=&r,&r,&r") > > + (minus:DI (match_operand:DI 1 "register_operand" "r,r,I") > > + (match_operand:DI 2 "reg_or_ubyte_operand" "r,I,r")))] > > + "" > > + "@ > > + sub\\t%F0, %F1, %F2\;suc\\t%N0, %N1, %N2 > > + sub\\t%F0, %F1, %2\;suc\\t%N0, %N1, 0 > > + rsb\\t%F0, %F2, %1\;rsc\\t%N0, %N2, 0" > > + [(set_attr "type" "alu,alu,alu") > > + (set_attr "length" "8,8,8")]) > > As above, this shouldn't handle operand 2 being constant; that should > always go through adddi3 instead. I will remove alternative 2, but will leave 3. > > > +; LRA cannot cope with clobbered op2, hence the scratch register. > > +(define_insn "ashr3" > > + [(set (match_operand:QISI 0 "register_operand" "=&r,r") > > + (ashiftrt:QISI > > + (match_operand:QISI 1 "register_operand" "0,r") > > + (match_operand:QISI 2 "reg_or_const_1_operand" "r,P"))) > > + (clobber (match_scratch:QISI 3 "=r,X"))] > > + "" > > + "@ > > + mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\; > > lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 * > > 8) - 1\; jmp ASHRLP%=\;ASHREND%=: + lsr\\t%0, %1, 1\;qbbc LSIGN%=, %0, > > (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:" + [(set_attr "type" > > "complex,alu") > > + (set_attr "length" "28,4")]) > > What do you mean by LRA not coping? What did you try originally and > what went wrong? Better assembly could be generated if the shift count register was used for a loop counter. Pseudo code: while (op2--) op0 >>= 1; I originally tried to clobber operand 2: (define_insn "ashr3" [(set (match_operand:QISI 0 "register_operand" "=&r,r") (ashiftrt:QISI (match_operand:QISI 1 "register_operand" "0,r") (match_operand:QISI 2 "reg_or_const_1_operand" "+r,P")) )] But with the above pattern operand 2 was not clobbered. Its value was deemed untouched (i.e. live across the pattern). So I came up with clobbering a separate register to fix this, at the expense of slightly bigger generated code. > > > +(define_insn "di3" > > + [(set (match_operand:DI 0 "register_operand" "=&r,&r") > > + (LOGICAL_BITOP:DI > > + (match_operand:DI 1 "register_operand" "%r,r") > > + (match_operand:DI 2 "reg_or_ubyte_operand" "r,I")))] > > + "" > > + "@ > > + \\t%F0, %F1, %F2\;\\t%N0, %N1, > > %N2 + \\t%F0, %F1, %2\;\\t%N0, > > %N1, 0" + [(set_attr "type" "alu,alu") > > + (set_attr "length" "8,8")]) > > + > > + > > +(define_insn "one_cmpldi2" > > + [(set (match_operand:DI 0 "register_operand" "=r") > > + (not:DI (match_operand:DI 1 "register_operand" "r")))] > > + "" > > +{ > > + /* careful with overlapping source and destination regs. */ > > + gcc_assert (GP_REG_P (REGNO (operands[0]))); > > + gcc_assert (GP_REG_P (REGNO (operands[1]))); > > + if (REGNO (operands[0]) == (REGNO (operands[1]) + 4)) > > + return "not\\t%N0, %N1\;not\\t%F0, %F1"; > > + else > > + return "not\\t%F0, %F1\;not\\t%N0, %N1"; > > +} > > + [(set_attr "type" "alu") > > + (set_attr "length" "8")]) > > Do you see any code improvements from defining these patterns? > These days I'd expect you'd get better code by letting the > target-independent code split them up into SImode operations. Yes, these patterns improve code size and speed. Looking at expand_binop(), DI logical ops are split into WORD mode, which in PRU's peculiar case is QI. So without those patterns, a 64 bit IOR generates 8 QI instructions: or r0.b0, r14.b0, r16.b0 or r0.b1, r14.b1, r16.b1 or r0.b2, r14.b2, r16.b2 or r0.b3, r14.b3, r16.b3 or r1.b0, r15.b0, r17.b0 or r1.b1, r15.b1, r17.b1 or r1.b2, r15.b2, r17.b2 or r1.b3, r15.b3, r17.b3 Whereas with patterns defined, it is only 2 SImode instructions: or r0, r14, r16 or r1, r15, r17 > > > +;; Multiply instruction. Idea for fixing registers comes from the AVR > > backend. + > > +(define_expand "mulsi3" > > + [(set (match_operand:SI 0 "register_operand" "") > > + (mult:SI (match_operand:SI 1 "register_operand" "") > > + (match_operand:SI 2 "register_operand" "")))] > > + "" > > +{ > > + emit_insn (gen_mulsi3_fixinp (operands[0], operands[1], operands[2])); > > + DONE; > > +}) > > + > > + > > +(define_expand "mulsi3_fixinp" > > + [(set (reg:SI 112) (match_operand:SI 1 "register_operand" "")) > > + (set (reg:SI 116) (match_operand:SI 2 "register_operand" "")) > > + (set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116))) > > + (set (match_operand:SI 0 "register_operand" "") (reg:SI 104))] > > + "" > > +{ > > +}) > > This seems slightly dangerous since there's nothing to guarantee that > those registers aren't already live at the point of expansion. > > The more usual way (and IMO correct way) would be for: > > +(define_insn "*mulsi3_prumac" > > + [(set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))] > > + "" > > + "nop\;xin\\t0, r26, 4" > > + [(set_attr "type" "alu") > > + (set_attr "length" "8")]) > > to have register classes and constraints for these three registers, > like e.g. the x86 port does for %cx etc. > > It would be good if you could try that. On the other hand, if AVR > already does this and it worked in practice then I guess it's not > something that should hold up the port. This suggestion worked. It was a matter of correct predicates. Here is what I will put in the next patch version: (define_predicate "pru_muldst_operand" (and (match_code "reg") (ior (match_test "REGNO_REG_CLASS (REGNO (op)) == MULDST_REGS") (match_test "REGNO (op) >= FIRST_PSEUDO_REGISTER")))) ... (define_register_constraint "Rmd0" "MULDST_REGS" "@internal The multiply destination register.") ... (define_insn "mulsi3" [(set (match_operand:SI 0 "pru_muldst_operand" "=Rmd0") (mult:SI (match_operand:SI 1 "pru_mulsrc0_operand" "Rms0") (match_operand:SI 2 "pru_mulsrc1_operand" "Rms1")))] > > > @@ -23444,6 +23449,56 @@ the offset with a symbol reference to a canary in > > the TLS block.> > > @end table > > > > +@node PRU Options > > +@subsection PRU Options > > +@cindex PRU Options > > + > > +These command-line options are defined for PRU target: > > + > > +@table @gcctabopt > > +@item -minrt > > +@opindex minrt > > +Enable the use of a minimum runtime environment---no static > > +initializers or constructors. Results in significant code size > > +reduction of final ELF binaries. > > Up to you, but this read to me as "-m"+"inrt". If this option is already > in use then obviously keep it, but if it's new then it might be worth > calling it -mminrt instead, for consistency with -mmcu. I assumed this is a standard naming since MSP430 already uses it. I've seen the option being utilized by pru-gcc users, so let's keep it that way. > > +@item T > > +A text segment (program memory) constant label. > > + > > +@item Z > > +Integer constant zero. > > Just checking, but did you deliberately leave out N? That's fine if so, > but the docstring in constraints.md didn't have an "@internal" marker to > indicate it was internal-only. I did not know about "@internal. I will annotate appropriately. > > Despite the long reply (sorry), this looks in really good shape to me FWIW. > Thanks for the submission. I take this sentence as an indication that the PRU port is welcome for inclusion into GCC. I'll work on fixing the comments and will send a new patch version. Thank you for the in-depth review and valuable suggestions. Regards, Dimitar