From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 73777 invoked by alias); 6 Jul 2019 03:26:38 -0000 Mailing-List: contact newlib-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: newlib-owner@sourceware.org Received: (qmail 73766 invoked by uid 89); 6 Jul 2019 03:26:37 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS autolearn=ham version=3.3.1 spammy=sk:newlib-, Fortune, inspection, limits X-HELO: NAM04-BN3-obe.outbound.protection.outlook.com Received: from mail-eopbgr680124.outbound.protection.outlook.com (HELO NAM04-BN3-obe.outbound.protection.outlook.com) (40.107.68.124) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 06 Jul 2019 03:26:33 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=wavesemi.onmicrosoft.com; s=selector1-wavesemi-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ro7h6zT0MX32DqxLTBiUYa3p++A4h66XZgz02BAq4Kw=; b=dzRD3yqJhEb35+ONyaPJ5p5hX4fT8GKZ+zvL0pstbC+yGZV/MI7xdrPcN6D9kYYtMuvZhKDk8V78ra5Fjg78isoUI6FnhCZg2i3WTfIshSmg+s3Y9ytXpcLWKjK4QzlQUhhFR7xTpYmJ+H537iizCPJLRPxxne2qAoTkmSVZMD8= Received: from DM5PR22MB0683.namprd22.prod.outlook.com (10.172.190.23) by DM5PR22MB0265.namprd22.prod.outlook.com (10.173.174.137) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2052.19; Sat, 6 Jul 2019 03:26:28 +0000 Received: from DM5PR22MB0683.namprd22.prod.outlook.com ([fe80::55a1:e1a6:d49a:918b]) by DM5PR22MB0683.namprd22.prod.outlook.com ([fe80::55a1:e1a6:d49a:918b%6]) with mapi id 15.20.2052.019; Sat, 6 Jul 2019 03:26:28 +0000 From: Faraz Shahbazker To: Richard Sandiford , Matthew Fortune CC: "newlib@sourceware.org" Subject: Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S Date: Sat, 06 Jul 2019 03:26:00 -0000 Message-ID: References: <6D39441BF12EF246A7ABCE6654B0235320F73501@LEMAIL01.le.imgtec.org>, In-Reply-To: authentication-results: spf=none (sender IP is ) smtp.mailfrom=fshahbazker@wavecomp.com; x-ms-exchange-purlcount: 1 x-ms-oob-tlc-oobclassifiers: OLM:8273; received-spf: None (protection.outlook.com: wavecomp.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 MIME-Version: 1.0 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: fshahbazker@wavecomp.com Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2019/txt/msg00293.txt.bz2 Hi Richard, Looks like an oversight. From an internal branch of that vintage, I can see= that the alignment was intended, but only committed upstream for mti*.ld s= cripts and that too snuck in accidentally as part of an unrelated commit (s= ee https://sourceware.org/ml/newlib-cvs/2014-q4/msg00025.html). Probably a = mistake in splitting patches for submission? I am not familiar will all these target boards. Okay to go ahead and fix al= l MIPS scripts? Regards, Faraz ________________________________ From: newlib-owner@sourceware.org on behalf o= f Richard Sandiford Sent: Thursday, July 4, 2019 10:23:49 AM To: Matthew Fortune Cc: newlib@sourceware.org Subject: Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S Hi, Matthew Fortune writes: > Hi, > > As part of a long term plan to reduce the amount of hand written .set nor= eorder > code, I have reworked the crt0.S file so that the assembler can fill delay > slots instead of them being explicitly filled. The reason for doing this= is > to enable future auto-conversion of delay slot branches to 'compact' bran= ches > present in the R6 architecture. Auto-conversion is not possible in a .set > noreorder block as any delay slot branch with a non-NOP delay slot would = have > to be reordered!!! to convert to a compact branch without a delay slot. > Writing code in a natural linear order is (subjectively) also much simple= r to > digest and maintain. > > One ugly piece of code had to be reworked in the zerobss loop which was u= sing > a pseudo-instruction BLTU as the branch. The structure of the old-loop was > clearly aiming to produce a tight loop with one instruction and the delay= slot > filled but the expansion of the BLTU would have undone this anyway. This = has > been reworked to create the kind of loop originally intended and have the > assembler fill the delay slot. The precise behaviour of the loop is subtly > different from before for two reasons: > > 1) When the _fbss and _end symbols have the same value then the old loop = would > have written zero to every address from _fbss to the end of memory (or= an > exception occurred). The new loop is skipped if the two symbols are th= e same. > 2) The old loop wrote zero to address of _end which is past the end of the > bss range. The new loop does not do this. > 3) When _fbss is greater then _end at the start then the old loop would h= ave > written one element and exited. The new loop will attempt to write zero > to every address from _fbss to the end of memory, wrap and continue to > _end (or hit an exception). This change in behaviour is fine as the > scenario is invalid anyway. > 4) The _end marker is now aligned to 4-bytes to ensure that the last elem= ent > written does not reach beyond the address of _end. This is also necess= ary > as the termination condition is an equality test instead of an ordered > test so (_end - _fbss) must be a multiple of 4-bytes. Sorry to jump on this old patch, but I couldn't see anything that did (4). I was trying to test mipsisa64-elf with idt64.ld and many tests end up with an _end that isn't four-byte aligned, leading to an "infinite" zeroing loop. I guess we should add: . =3D ALIGN(4); in front of: PROVIDE (end =3D .); _end =3D .; for each script that doesn't already align to 4 or beyond. Thanks, Richard > > Delay slot filling will occur when libgloss is built with GCC and an > optimisation level greater than zero. This gets translated to an assembler > optimisation level of '2'. > > All instance of JAL have been changed to JALR as there is no > special handling in the JAL macro in binutils for a register operand and > JALR is the real underlying instruction. > > This change is primarily verified by code inspection but has also been run > through some small test programs. > > Thanks, > Matthew > > libgloss/ > > * mips/crt0.S: Remove .set noreorder throughout. Change JAL = to > JALR throughout. > (zerobss): Open code the bltu macro instruction so that the > zero-loop does not have a NOP in the branch delay slot. > --- > libgloss/mips/crt0.S | 53 ++++++++++++++++++----------------------------= ------ > 1 file changed, 18 insertions(+), 35 deletions(-) > > diff --git a/libgloss/mips/crt0.S b/libgloss/mips/crt0.S > index 599e79c..f66ef1b 100644 > --- a/libgloss/mips/crt0.S > +++ b/libgloss/mips/crt0.S > @@ -57,13 +57,14 @@ > .globl _start > .ent _start > _start: > - .set noreorder > #ifdef __mips_embedded_pic > #define PICBASE start_PICBASE > + .set noreorder > PICBASE =3D .+8 > bal PICBASE > nop > move s0,$31 > + .set reorder > #endif > #if __mips<3 > # define STATUS_MASK (SR_CU1|SR_PE) > @@ -89,9 +90,7 @@ _start: > /* Avoid hazard from FPU enable and other SR changes. */ > LA (t0, hardware_hazard_hook) > beq t0,zero,1f > - nop > - jal t0 > - nop > + jalr t0 > 1: > > /* Check for FPU presence. Don't check if we know that soft_float is > @@ -105,11 +104,8 @@ _start: > mfc1 t1,fp1 > nop > bne t0,t2,1f /* check for match */ > - nop > bne t1,zero,1f /* double check */ > - nop > j 2f /* FPU is present. */ > - nop > #endif > 1: > /* FPU is not present. Set status register to say that. */ > @@ -119,9 +115,7 @@ _start: > /* Avoid hazard from FPU disable. */ > LA (t0, hardware_hazard_hook) > beq t0,zero,2f > - nop > - jal t0 > - nop > + jalr t0 > 2: > > > @@ -129,7 +123,6 @@ _start: > doesn't get confused. */ > LA (v0, 3f) > jr v0 > - nop > 3: > LA (gp, _gp) # set the global data poin= ter > .end _start > @@ -145,21 +138,20 @@ _start: > zerobss: > LA (v0, _fbss) > LA (v1, _end) > -3: > - sw zero,0(v0) > - bltu v0,v1,3b > - addiu v0,v0,4 # executed in delay slot > - > + beq v0,v1,2f > +1: > + addiu v0,v0,4 > + sw zero,-4(v0) > + bne v0,v1,1b > +2: > la t0, __lstack # make a small stack so we > addiu sp, t0, STARTUP_STACK_SIZE # can run some C code > la a0, __memsize # get the usable memory si= ze > jal get_mem_info > - nop > > /* setup the stack pointer */ > LA (t0, __stack) # is __stack set ? > bne t0,zero,4f > - nop > > /* NOTE: a0[0] contains the amount of memory available, and > not the last memory address. */ > @@ -189,19 +181,14 @@ zerobss: > init: > LA (t9, hardware_init_hook) # init the hardware if nee= ded > beq t9,zero,6f > - nop > - jal t9 > - nop > + jalr t9 > 6: > LA (t9, software_init_hook) # init the hardware if nee= ded > beq t9,zero,7f > - nop > - jal t9 > - nop > + jalr t9 > 7: > LA (a0, _fini) > jal atexit > - nop > > #ifdef GCRT0 > .globl _ftext > @@ -209,12 +196,10 @@ init: > LA (a0, _ftext) > LA (a1, _etext) > jal monstartup > - nop > #endif > > > jal _init # run global constructors > - nop > > addiu a1,sp,32 # argv =3D sp + 32 > addiu a2,sp,40 # envp =3D sp + 40 > @@ -225,13 +210,13 @@ init: > sw zero,(a1) > sw zero,(a2) > #endif > - jal main # call the program start fu= nction > move a0,zero # set argc to 0 > + jal main # call the program start fu= nction > > # fall through to the "exit" routine > + move a0,v0 # pass through the exit code > jal exit # call libc exit to run th= e G++ > # destructors > - move a0,v0 # pass through the exit code > .end init > > > @@ -257,27 +242,25 @@ _exit: > /* Need to reinit PICBASE, since we might be called via exit() > rather than via a return path which would restore old s0. */ > #define PICBASE exit_PICBASE > + .set noreorder > PICBASE =3D .+8 > bal PICBASE > nop > move s0,$31 > + .set reorder > #endif > #ifdef GCRT0 > LA (t0, _mcleanup) > - jal t0 > - nop > + jalr t0 > #endif > LA (t0, hardware_exit_hook) > beq t0,zero,1f > - nop > - jal t0 > - nop > + jalr t0 > 1: > > # break instruction can cope with 0xfffff, but GAS limits the rang= e: > break 1023 > b 7b # but loop back just in-ca= se > - nop > .end _exit > > /* Assume the PICBASE set up above is no longer valid below here. */