From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <newlib-return-17001-listarch-newlib=sources.redhat.com@sourceware.org>
Received: (qmail 73777 invoked by alias); 6 Jul 2019 03:26:38 -0000
Mailing-List: contact newlib-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <newlib.sourceware.org>
List-Subscribe: <mailto:newlib-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/newlib/>
List-Post: <mailto:newlib@sourceware.org>
List-Help: <mailto:newlib-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: newlib-owner@sourceware.org
Received: (qmail 73766 invoked by uid 89); 6 Jul 2019 03:26:37 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS autolearn=ham version=3.3.1 spammy=sk:newlib-, Fortune, inspection, limits
X-HELO: NAM04-BN3-obe.outbound.protection.outlook.com
Received: from mail-eopbgr680124.outbound.protection.outlook.com (HELO NAM04-BN3-obe.outbound.protection.outlook.com) (40.107.68.124) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 06 Jul 2019 03:26:33 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=wavesemi.onmicrosoft.com; s=selector1-wavesemi-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ro7h6zT0MX32DqxLTBiUYa3p++A4h66XZgz02BAq4Kw=; b=dzRD3yqJhEb35+ONyaPJ5p5hX4fT8GKZ+zvL0pstbC+yGZV/MI7xdrPcN6D9kYYtMuvZhKDk8V78ra5Fjg78isoUI6FnhCZg2i3WTfIshSmg+s3Y9ytXpcLWKjK4QzlQUhhFR7xTpYmJ+H537iizCPJLRPxxne2qAoTkmSVZMD8=
Received: from DM5PR22MB0683.namprd22.prod.outlook.com (10.172.190.23) by DM5PR22MB0265.namprd22.prod.outlook.com (10.173.174.137) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2052.19; Sat, 6 Jul 2019 03:26:28 +0000
Received: from DM5PR22MB0683.namprd22.prod.outlook.com ([fe80::55a1:e1a6:d49a:918b]) by DM5PR22MB0683.namprd22.prod.outlook.com ([fe80::55a1:e1a6:d49a:918b%6]) with mapi id 15.20.2052.019; Sat, 6 Jul 2019 03:26:28 +0000
From: Faraz Shahbazker <fshahbazker@wavecomp.com>
To: Richard Sandiford <richard.sandiford@arm.com>, Matthew Fortune	<Matthew.Fortune@imgtec.com>
CC: "newlib@sourceware.org" <newlib@sourceware.org>
Subject: Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S
Date: Sat, 06 Jul 2019 03:26:00 -0000
Message-ID: <DM5PR22MB0683DAB7A7F54EA2407F9E9BD0F40@DM5PR22MB0683.namprd22.prod.outlook.com>
References: <6D39441BF12EF246A7ABCE6654B0235320F73501@LEMAIL01.le.imgtec.org>,<mpt36jl1zoa.fsf@arm.com>
In-Reply-To: <mpt36jl1zoa.fsf@arm.com>
authentication-results: spf=none (sender IP is ) smtp.mailfrom=fshahbazker@wavecomp.com;
x-ms-exchange-purlcount: 1
x-ms-oob-tlc-oobclassifiers: OLM:8273;
received-spf: None (protection.outlook.com: wavecomp.com does not designate permitted sender hosts)
x-ms-exchange-senderadcheck: 1
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: fshahbazker@wavecomp.com
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2019/txt/msg00293.txt.bz2

Hi Richard,


Looks like an oversight. From an internal branch of that vintage, I can see
that the alignment was intended, but it was only committed upstream for the
mti*.ld scripts, and even that snuck in accidentally as part of an unrelated
commit (see https://sourceware.org/ml/newlib-cvs/2014-q4/msg00025.html).
Probably a mistake in splitting patches for submission?


I am not familiar with all these target boards. Okay to go ahead and fix
all MIPS scripts?


Regards,

Faraz

________________________________
From: newlib-owner@sourceware.org <newlib-owner@sourceware.org> on behalf of Richard Sandiford <richard.sandiford@arm.com>
Sent: Thursday, July 4, 2019 10:23:49 AM
To: Matthew Fortune
Cc: newlib@sourceware.org
Subject: Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S

Hi,

Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> Hi,
>
> As part of a long term plan to reduce the amount of hand written .set noreorder
> code, I have reworked the crt0.S file so that the assembler can fill delay
> slots instead of them being explicitly filled.  The reason for doing this is
> to enable future auto-conversion of delay slot branches to 'compact' branches
> present in the R6 architecture. Auto-conversion is not possible in a .set
> noreorder block as any delay slot branch with a non-NOP delay slot would have
> to be reordered to convert to a compact branch without a delay slot.
> Writing code in a natural linear order is (subjectively) also much simpler to
> digest and maintain.
>
> One ugly piece of code had to be reworked in the zerobss loop which was using
> a pseudo-instruction BLTU as the branch. The structure of the old loop was
> clearly aiming to produce a tight loop with one instruction and the delay slot
> filled but the expansion of the BLTU would have undone this anyway. This has
> been reworked to create the kind of loop originally intended and have the
> assembler fill the delay slot. The precise behaviour of the loop is subtly
> different from before for the following reasons:
>
> 1) When the _fbss and _end symbols have the same value then the old loop would
>    have written zero to every address from _fbss to the end of memory (or
>    until an exception occurred). The new loop is skipped if the two symbols
>    are the same.
> 2) The old loop wrote zero to the address of _end which is past the end of the
>    bss range. The new loop does not do this.
> 3) When _fbss is greater than _end at the start then the old loop would have
>    written one element and exited. The new loop will attempt to write zero
>    to every address from _fbss to the end of memory, wrap and continue to
>    _end (or hit an exception). This change in behaviour is fine as the
>    scenario is invalid anyway.
> 4) The _end marker is now aligned to 4-bytes to ensure that the last element
>    written does not reach beyond the address of _end. This is also necessary
>    as the termination condition is an equality test instead of an ordered
>    test so (_end - _fbss) must be a multiple of 4-bytes.

Sorry to jump on this old patch, but I couldn't see anything that did
(4).  I was trying to test mipsisa64-elf with idt64.ld and many tests
end up with an _end that isn't four-byte aligned, leading to an
"infinite" zeroing loop.

I guess we should add:

  . = ALIGN(4);

in front of:

   PROVIDE (end = .);
   _end = .;

for each script that doesn't already align to 4 or beyond.

Thanks,
Richard

>
> Delay slot filling will occur when libgloss is built with GCC and an
> optimisation level greater than zero. This gets translated to an assembler
> optimisation level of '2'.
>
> All instances of JAL <reg> have been changed to JALR <reg> as there is no
> special handling in the JAL macro in binutils for a register operand and
> JALR is the real underlying instruction.
>
> This change is primarily verified by code inspection but has also been run
> through some small test programs.
>
> Thanks,
> Matthew
>
> libgloss/
>
>        * mips/crt0.S: Remove .set noreorder throughout.  Change JAL <reg> to
>        JALR <reg> throughout.
>        (zerobss): Open code the bltu macro instruction so that the
>        zero-loop does not have a NOP in the branch delay slot.
> ---
>  libgloss/mips/crt0.S | 53 ++++++++++++++++++-----------------------------------
>  1 file changed, 18 insertions(+), 35 deletions(-)
>
> diff --git a/libgloss/mips/crt0.S b/libgloss/mips/crt0.S
> index 599e79c..f66ef1b 100644
> --- a/libgloss/mips/crt0.S
> +++ b/libgloss/mips/crt0.S
> @@ -57,13 +57,14 @@
>        .globl  _start
>        .ent    _start
>  _start:
> -     .set    noreorder
>  #ifdef __mips_embedded_pic
>  #define PICBASE start_PICBASE
> +     .set    noreorder
>        PICBASE = .+8
>          bal  PICBASE
>        nop
>        move    s0,$31
> +     .set    reorder
>  #endif
>  #if __mips<3
>  #  define STATUS_MASK (SR_CU1|SR_PE)
> @@ -89,9 +90,7 @@ _start:
>        /* Avoid hazard from FPU enable and other SR changes.  */
>        LA (t0, hardware_hazard_hook)
>        beq     t0,zero,1f
> -     nop
> -     jal     t0
> -     nop
> +     jalr    t0
>  1:
>
>  /* Check for FPU presence.  Don't check if we know that soft_float is
> @@ -105,11 +104,8 @@ _start:
>        mfc1    t1,fp1
>        nop
>        bne     t0,t2,1f        /* check for match */
> -     nop
>        bne     t1,zero,1f      /* double check */
> -     nop
>        j       2f              /* FPU is present. */
> -     nop
>  #endif
>  1:
>        /* FPU is not present.  Set status register to say that. */
> @@ -119,9 +115,7 @@ _start:
>        /* Avoid hazard from FPU disable.  */
>        LA (t0, hardware_hazard_hook)
>        beq     t0,zero,2f
> -     nop
> -     jal     t0
> -     nop
> +     jalr    t0
>  2:
>
>
> @@ -129,7 +123,6 @@ _start:
>     doesn't get confused.  */
>        LA (v0, 3f)
>        jr      v0
> -     nop
>  3:
>        LA (gp, _gp)                            # set the global data pointer
>        .end _start
> @@ -145,21 +138,20 @@ _start:
>  zerobss:
>        LA (v0, _fbss)
>        LA (v1, _end)
> -3:
> -     sw      zero,0(v0)
> -     bltu    v0,v1,3b
> -     addiu   v0,v0,4                         # executed in delay slot
> -
> +     beq     v0,v1,2f
> +1:
> +     addiu   v0,v0,4
> +     sw      zero,-4(v0)
> +     bne     v0,v1,1b
> +2:
>        la      t0, __lstack                    # make a small stack so we
>        addiu   sp, t0, STARTUP_STACK_SIZE      # can run some C code
>        la      a0, __memsize                   # get the usable memory size
>        jal     get_mem_info
> -     nop
>
>        /* setup the stack pointer */
>        LA (t0, __stack)                        # is __stack set ?
>        bne     t0,zero,4f
> -     nop
>
>        /* NOTE: a0[0] contains the amount of memory available, and
>                 not the last memory address. */
> @@ -189,19 +181,14 @@ zerobss:
>  init:
>        LA (t9, hardware_init_hook)             # init the hardware if needed
>        beq     t9,zero,6f
> -     nop
> -     jal     t9
> -     nop
> +     jalr    t9
>  6:
>        LA (t9, software_init_hook)             # init the hardware if needed
>        beq     t9,zero,7f
> -     nop
> -     jal     t9
> -     nop
> +     jalr    t9
>  7:
>        LA (a0, _fini)
>        jal     atexit
> -     nop
>
>  #ifdef GCRT0
>        .globl  _ftext
> @@ -209,12 +196,10 @@ init:
>        LA (a0, _ftext)
>        LA (a1, _etext)
>        jal     monstartup
> -     nop
>  #endif
>
>
>        jal     _init                           # run global constructors
> -     nop
>
>        addiu   a1,sp,32                        # argv = sp + 32
>        addiu   a2,sp,40                        # envp = sp + 40
> @@ -225,13 +210,13 @@ init:
>        sw      zero,(a1)
>        sw      zero,(a2)
>  #endif
> -     jal     main                            # call the program start function
>        move    a0,zero                         # set argc to 0
> +     jal     main                            # call the program start function
>
>        # fall through to the "exit" routine
> +     move    a0,v0                           # pass through the exit code
>        jal     exit                            # call libc exit to run the G++
>                                                # destructors
> -     move    a0,v0                           # pass through the exit code
>        .end    init
>
>
> @@ -257,27 +242,25 @@ _exit:
>        /* Need to reinit PICBASE, since we might be called via exit()
>           rather than via a return path which would restore old s0.  */
>  #define PICBASE exit_PICBASE
> +     .set    noreorder
>        PICBASE = .+8
>        bal     PICBASE
>        nop
>        move    s0,$31
> +     .set    reorder
>  #endif
>  #ifdef GCRT0
>        LA (t0, _mcleanup)
> -     jal     t0
> -     nop
> +     jalr    t0
>  #endif
>        LA (t0, hardware_exit_hook)
>        beq     t0,zero,1f
> -     nop
> -     jal     t0
> -     nop
> +     jalr    t0
>  1:
>
>        # break instruction can cope with 0xfffff, but GAS limits the range:
>        break   1023
>        b       7b                              # but loop back just in-case
> -     nop
>        .end _exit
>
>  /* Assume the PICBASE set up above is no longer valid below here.  */