[PATCH 0/5] x86: further GOT{,PCREL} related adjustments

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 0/5] x86: further GOT{,PCREL} related adjustments
@ 2025-02-03 11:39 Jan Beulich
  2025-02-03 11:40 ` [PATCH 1/5] x86: drop redundant i.operands checks from output_disp() Jan Beulich
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Jan Beulich @ 2025-02-03 11:39 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

Following on from "x86-64: enhancements to GOTPCREL / GOTTPOFF handling".

1: drop redundant i.operands checks from output_disp()
2: tighten convert-load-reloc checking
3: widen @got{,pcrel} support to PUSH and APX IMUL
4: further tighten convert-load-reloc checking
5: restrict use of GOT32X relocs

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/5] x86: drop redundant i.operands checks from output_disp()
  2025-02-03 11:39 [PATCH 0/5] x86: further GOT{,PCREL} related adjustments Jan Beulich
@ 2025-02-03 11:40 ` Jan Beulich
  2025-02-03 11:40 ` [PATCH 2/5] ix86: tighten convert-load-reloc checking Jan Beulich
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2025-02-03 11:40 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

The opcode space, major opcode, and - where applicable - opcode
extension checks fully qualify the insns we're after; operand matching
has been done far earlier, so wrong operand counts cannot occur here.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -12939,17 +12939,14 @@ output_disp (fragS *insn_start_frag, off
 		  && (i.rm.mode == 2
 		      || (i.rm.mode == 0 && i.rm.regmem == 5))
 		  && ((space == SPACE_BASE
-		       && i.operands == 1
 		       && i.tm.base_opcode == 0xff
 		       && (i.rm.reg == 2 || i.rm.reg == 4))
 		      || ((space == SPACE_BASE
 			   || space == SPACE_0F38
 			   || space == SPACE_MAP4)
-			  && i.operands == 2
 			  && i.tm.base_opcode == 0x8b)
 		      || ((space == SPACE_BASE
 			   || space == SPACE_MAP4)
-			  && i.operands >= 2
 			  && (i.tm.base_opcode == 0x85
 			      || (i.tm.base_opcode
 				  | (i.operands > 2 ? 0x3a : 0x38)) == 0x3b))))


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/5] ix86: tighten convert-load-reloc checking
  2025-02-03 11:39 [PATCH 0/5] x86: further GOT{,PCREL} related adjustments Jan Beulich
  2025-02-03 11:40 ` [PATCH 1/5] x86: drop redundant i.operands checks from output_disp() Jan Beulich
@ 2025-02-03 11:40 ` Jan Beulich
  2025-02-03 22:34   ` H.J. Lu
  2025-02-03 11:41 ` [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL Jan Beulich
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-03 11:40 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

Just like was done recently for x86-64 (commit 4998f9ea9d35): Even if
the assembler avoids using the relaxable relocation for inapplicable
insns, the relocation type can still appear for other reasons. Be more
thorough in the opcode checking we do, to avoid bogusly altering other
insns.

Furthermore correct an opcode mask (even if with the added condition
that's now fully benign).

--- a/bfd/elf32-i386.c
+++ b/bfd/elf32-i386.c
@@ -1453,15 +1453,16 @@ elf_i386_convert_load_reloc (bfd *abfd,
 		  modrm = 0xc0 | (modrm & 0x38) >> 3;
 		  opcode = 0xf7;
 		}
-	      else
+	      else if ((opcode | 0x38) == 0x3b)
 		{
 		  /* Convert "binop foo@GOT(%reg1), %reg2" to
 		     "binop $foo, %reg2".  */
-		  modrm = (0xc0
-			   | (modrm & 0x38) >> 3
-			   | (opcode & 0x3c));
+		  modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
 		  opcode = 0x81;
 		}
+	      else
+		return true;
+
 	      bfd_put_8 (abfd, modrm, contents + roff - 1);
 	      r_type = R_386_32;
 	    }


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/5] ix86: tighten convert-load-reloc checking
  2025-02-03 11:40 ` [PATCH 2/5] ix86: tighten convert-load-reloc checking Jan Beulich
@ 2025-02-03 22:34   ` H.J. Lu
  2025-02-04  7:21     ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-03 22:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Mon, Feb 3, 2025 at 7:40 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> Just like was done recently for x86-64 (commit 4998f9ea9d35): Even if
> the assembler avoids using the relaxable relocation for inapplicable
> insns, the relocation type can still appear for other reasons. Be more
> thorough in the opcode checking we do, to avoid bogusly altering other
> insns.
>
> Furthermore correct an opcode mask (even if with the added condition
> that's now fully benign).
>
> --- a/bfd/elf32-i386.c
> +++ b/bfd/elf32-i386.c
> @@ -1453,15 +1453,16 @@ elf_i386_convert_load_reloc (bfd *abfd,
>                   modrm = 0xc0 | (modrm & 0x38) >> 3;
>                   opcode = 0xf7;
>                 }
> -             else
> +             else if ((opcode | 0x38) == 0x3b)
>                 {
>                   /* Convert "binop foo@GOT(%reg1), %reg2" to
>                      "binop $foo, %reg2".  */
> -                 modrm = (0xc0
> -                          | (modrm & 0x38) >> 3
> -                          | (opcode & 0x3c));
> +                 modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
>                   opcode = 0x81;
>                 }
> +             else
> +               return true;
> +
>               bfd_put_8 (abfd, modrm, contents + roff - 1);
>               r_type = R_386_32;
>             }
>

Please add a testcase to show it makes a difference.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/5] ix86: tighten convert-load-reloc checking
  2025-02-03 22:34   ` H.J. Lu
@ 2025-02-04  7:21     ` Jan Beulich
  2025-02-04  7:26       ` H.J. Lu
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-04  7:21 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 03.02.2025 23:34, H.J. Lu wrote:
> On Mon, Feb 3, 2025 at 7:40 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> Just like was done recently for x86-64 (commit 4998f9ea9d35): Even if
>> the assembler avoids using the relaxable relocation for inapplicable
>> insns, the relocation type can still appear for other reasons. Be more
>> thorough in the opcode checking we do, to avoid bogusly altering other
>> insns.
>>
>> Furthermore correct an opcode mask (even if with the added condition
>> that's now fully benign).
>>
>> --- a/bfd/elf32-i386.c
>> +++ b/bfd/elf32-i386.c
>> @@ -1453,15 +1453,16 @@ elf_i386_convert_load_reloc (bfd *abfd,
>>                   modrm = 0xc0 | (modrm & 0x38) >> 3;
>>                   opcode = 0xf7;
>>                 }
>> -             else
>> +             else if ((opcode | 0x38) == 0x3b)
>>                 {
>>                   /* Convert "binop foo@GOT(%reg1), %reg2" to
>>                      "binop $foo, %reg2".  */
>> -                 modrm = (0xc0
>> -                          | (modrm & 0x38) >> 3
>> -                          | (opcode & 0x3c));
>> +                 modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
>>                   opcode = 0x81;
>>                 }
>> +             else
>> +               return true;
>> +
>>               bfd_put_8 (abfd, modrm, contents + roff - 1);
>>               r_type = R_386_32;
>>             }
>>
> 
> Please add a testcase to show it makes a difference.

The 64-bit counterpart has got a testcase demonstrating that it does.
Counter request: Please could you have added an exhaustive testcase
back at the time, demonstrating that no undue adjustments could ever
be done?

Underlying point: Experience tells me that adding (sensible) testcases
usually takes much more time than making the actual code change. Hence
why in a case like this I opted for not adding (another) one.

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/5] ix86: tighten convert-load-reloc checking
  2025-02-04  7:21     ` Jan Beulich
@ 2025-02-04  7:26       ` H.J. Lu
  2025-02-04  7:58         ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-04  7:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Feb 4, 2025 at 3:21 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.02.2025 23:34, H.J. Lu wrote:
> > On Mon, Feb 3, 2025 at 7:40 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> Just like was done recently for x86-64 (commit 4998f9ea9d35): Even if
> >> the assembler avoids using the relaxable relocation for inapplicable
> >> insns, the relocation type can still appear for other reasons. Be more
> >> thorough in the opcode checking we do, to avoid bogusly altering other
> >> insns.
> >>
> >> Furthermore correct an opcode mask (even if with the added condition
> >> that's now fully benign).
> >>
> >> --- a/bfd/elf32-i386.c
> >> +++ b/bfd/elf32-i386.c
> >> @@ -1453,15 +1453,16 @@ elf_i386_convert_load_reloc (bfd *abfd,
> >>                   modrm = 0xc0 | (modrm & 0x38) >> 3;
> >>                   opcode = 0xf7;
> >>                 }
> >> -             else
> >> +             else if ((opcode | 0x38) == 0x3b)
> >>                 {
> >>                   /* Convert "binop foo@GOT(%reg1), %reg2" to
> >>                      "binop $foo, %reg2".  */
> >> -                 modrm = (0xc0
> >> -                          | (modrm & 0x38) >> 3
> >> -                          | (opcode & 0x3c));
> >> +                 modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
> >>                   opcode = 0x81;
> >>                 }
> >> +             else
> >> +               return true;
> >> +
> >>               bfd_put_8 (abfd, modrm, contents + roff - 1);
> >>               r_type = R_386_32;
> >>             }
> >>
> >
> > Please add a testcase to show it makes a difference.
>
> The 64-bit counterpart has got a testcase demonstrating that it does.
> Counter request: Please could you have added an exhaustive testcase
> back at the time, demonstrating that no undue adjustments could ever
> be done?

If you find issues, please show them with tests.

>
> Underlying point: Experience tells me that adding (sensible) testcases
> usually takes much more time than making the actual code change. Hence
> why in a case like this I opted for not adding (another) one.
>

Then please postpone your change until you find time to write
some tests.


-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/5] ix86: tighten convert-load-reloc checking
  2025-02-04  7:26       ` H.J. Lu
@ 2025-02-04  7:58         ` Jan Beulich
  2025-02-04  8:03           ` H.J. Lu
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-04  7:58 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 04.02.2025 08:26, H.J. Lu wrote:
> On Tue, Feb 4, 2025 at 3:21 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 03.02.2025 23:34, H.J. Lu wrote:
>>> On Mon, Feb 3, 2025 at 7:40 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> Just like was done recently for x86-64 (commit 4998f9ea9d35): Even if
>>>> the assembler avoids using the relaxable relocation for inapplicable
>>>> insns, the relocation type can still appear for other reasons. Be more
>>>> thorough in the opcode checking we do, to avoid bogusly altering other
>>>> insns.
>>>>
>>>> Furthermore correct an opcode mask (even if with the added condition
>>>> that's now fully benign).
>>>>
>>>> --- a/bfd/elf32-i386.c
>>>> +++ b/bfd/elf32-i386.c
>>>> @@ -1453,15 +1453,16 @@ elf_i386_convert_load_reloc (bfd *abfd,
>>>>                   modrm = 0xc0 | (modrm & 0x38) >> 3;
>>>>                   opcode = 0xf7;
>>>>                 }
>>>> -             else
>>>> +             else if ((opcode | 0x38) == 0x3b)
>>>>                 {
>>>>                   /* Convert "binop foo@GOT(%reg1), %reg2" to
>>>>                      "binop $foo, %reg2".  */
>>>> -                 modrm = (0xc0
>>>> -                          | (modrm & 0x38) >> 3
>>>> -                          | (opcode & 0x3c));
>>>> +                 modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
>>>>                   opcode = 0x81;
>>>>                 }
>>>> +             else
>>>> +               return true;
>>>> +
>>>>               bfd_put_8 (abfd, modrm, contents + roff - 1);
>>>>               r_type = R_386_32;
>>>>             }
>>>>
>>>
>>> Please add a testcase to show it makes a difference.
>>
>> The 64-bit counterpart has got a testcase demonstrating that it does.
>> Counter request: Please could you have added an exhaustive testcase
>> back at the time, demonstrating that no undue adjustments could ever
>> be done?
> 
> If you find issues, please show them with tests.

And I did, in the 64-bit counterpart. The code here is exactly the same
as the 64-bit one (up to and including the wrong [but benign] use of
0x3c as a mask).

>> Underlying point: Experience tells me that adding (sensible) testcases
>> usually takes much more time than making the actual code change. Hence
>> why in a case like this I opted for not adding (another) one.
> 
> Then please postpone your change until you find time to write
> some tests.

Ehem. Are you really asking to hold back bugfixes?

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/5] ix86: tighten convert-load-reloc checking
  2025-02-04  7:58         ` Jan Beulich
@ 2025-02-04  8:03           ` H.J. Lu
  2025-02-04  8:10             ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-04  8:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Feb 4, 2025 at 3:58 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 04.02.2025 08:26, H.J. Lu wrote:
> > On Tue, Feb 4, 2025 at 3:21 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 03.02.2025 23:34, H.J. Lu wrote:
> >>> On Mon, Feb 3, 2025 at 7:40 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> Just like was done recently for x86-64 (commit 4998f9ea9d35): Even if
> >>>> the assembler avoids using the relaxable relocation for inapplicable
> >>>> insns, the relocation type can still appear for other reasons. Be more
> >>>> thorough in the opcode checking we do, to avoid bogusly altering other
> >>>> insns.
> >>>>
> >>>> Furthermore correct an opcode mask (even if with the added condition
> >>>> that's now fully benign).
> >>>>
> >>>> --- a/bfd/elf32-i386.c
> >>>> +++ b/bfd/elf32-i386.c
> >>>> @@ -1453,15 +1453,16 @@ elf_i386_convert_load_reloc (bfd *abfd,
> >>>>                   modrm = 0xc0 | (modrm & 0x38) >> 3;
> >>>>                   opcode = 0xf7;
> >>>>                 }
> >>>> -             else
> >>>> +             else if ((opcode | 0x38) == 0x3b)
> >>>>                 {
> >>>>                   /* Convert "binop foo@GOT(%reg1), %reg2" to
> >>>>                      "binop $foo, %reg2".  */
> >>>> -                 modrm = (0xc0
> >>>> -                          | (modrm & 0x38) >> 3
> >>>> -                          | (opcode & 0x3c));
> >>>> +                 modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
> >>>>                   opcode = 0x81;
> >>>>                 }
> >>>> +             else
> >>>> +               return true;
> >>>> +
> >>>>               bfd_put_8 (abfd, modrm, contents + roff - 1);
> >>>>               r_type = R_386_32;
> >>>>             }
> >>>>
> >>>
> >>> Please add a testcase to show it makes a difference.
> >>
> >> The 64-bit counterpart has got a testcase demonstrating that it does.
> >> Counter request: Please could you have added an exhaustive testcase
> >> back at the time, demonstrating that no undue adjustments could ever
> >> be done?
> >
> > If you find issues, please show them with tests.
>
> And I did, in the 64-bit counterpart. The code here is exactly the same
> as the 64-bit one (up to and including the wrong [but benign] use of
> 0x3c as a mask).

I also requested tests for the 64-bit change.

>
> >> Underlying point: Experience tells me that adding (sensible) testcases
> >> usually takes much more time than making the actual code change. Hence
> >> why in a case like this I opted for not adding (another) one.
> >
> > Then please postpone your change until you find time to write
> > some tests.
>
> Ehem. Are you really asking to hold back bugfixes?
>

No test, no bug, no change.

-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/5] ix86: tighten convert-load-reloc checking
  2025-02-04  8:03           ` H.J. Lu
@ 2025-02-04  8:10             ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2025-02-04  8:10 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 04.02.2025 09:03, H.J. Lu wrote:
> On Tue, Feb 4, 2025 at 3:58 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 04.02.2025 08:26, H.J. Lu wrote:
>>> On Tue, Feb 4, 2025 at 3:21 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 03.02.2025 23:34, H.J. Lu wrote:
>>>>> On Mon, Feb 3, 2025 at 7:40 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> Just like was done recently for x86-64 (commit 4998f9ea9d35): Even if
>>>>>> the assembler avoids using the relaxable relocation for inapplicable
>>>>>> insns, the relocation type can still appear for other reasons. Be more
>>>>>> thorough in the opcode checking we do, to avoid bogusly altering other
>>>>>> insns.
>>>>>>
>>>>>> Furthermore correct an opcode mask (even if with the added condition
>>>>>> that's now fully benign).
>>>>>>
>>>>>> --- a/bfd/elf32-i386.c
>>>>>> +++ b/bfd/elf32-i386.c
>>>>>> @@ -1453,15 +1453,16 @@ elf_i386_convert_load_reloc (bfd *abfd,
>>>>>>                   modrm = 0xc0 | (modrm & 0x38) >> 3;
>>>>>>                   opcode = 0xf7;
>>>>>>                 }
>>>>>> -             else
>>>>>> +             else if ((opcode | 0x38) == 0x3b)
>>>>>>                 {
>>>>>>                   /* Convert "binop foo@GOT(%reg1), %reg2" to
>>>>>>                      "binop $foo, %reg2".  */
>>>>>> -                 modrm = (0xc0
>>>>>> -                          | (modrm & 0x38) >> 3
>>>>>> -                          | (opcode & 0x3c));
>>>>>> +                 modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
>>>>>>                   opcode = 0x81;
>>>>>>                 }
>>>>>> +             else
>>>>>> +               return true;
>>>>>> +
>>>>>>               bfd_put_8 (abfd, modrm, contents + roff - 1);
>>>>>>               r_type = R_386_32;
>>>>>>             }
>>>>>>
>>>>>
>>>>> Please add a testcase to show it makes a difference.
>>>>
>>>> The 64-bit counterpart has got a testcase demonstrating that it does.
>>>> Counter request: Please could you have added an exhaustive testcase
>>>> back at the time, demonstrating that no undue adjustments could ever
>>>> be done?
>>>
>>> If you find issues, please show them with tests.
>>
>> And I did, in the 64-bit counterpart. The code here is exactly the same
>> as the 64-bit one (up to and including the wrong [but benign] use of
>> 0x3c as a mask).
> 
> I also requested tests for the 64-bit change.

The 64-bit change went in already, with testcase (commit 4998f9ea9d35).

>>>> Underlying point: Experience tells me that adding (sensible) testcases
>>>> usually takes much more time than making the actual code change. Hence
>>>> why in a case like this I opted for not adding (another) one.
>>>
>>> Then please postpone your change until you find time to write
>>> some tests.
>>
>> Ehem. Are you really asking to hold back bugfixes?
> 
> No test, no bug, no change.

I would hope you're kidding, but experience with you tells me you're likely
meaning this seriously.

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-03 11:39 [PATCH 0/5] x86: further GOT{,PCREL} related adjustments Jan Beulich
  2025-02-03 11:40 ` [PATCH 1/5] x86: drop redundant i.operands checks from output_disp() Jan Beulich
  2025-02-03 11:40 ` [PATCH 2/5] ix86: tighten convert-load-reloc checking Jan Beulich
@ 2025-02-03 11:41 ` Jan Beulich
  2025-02-03 22:40   ` H.J. Lu
  2025-02-03 11:41 ` [PATCH 4/5] x86-64: further tighten convert-load-reloc checking Jan Beulich
  2025-02-03 11:42 ` [PATCH 5/5] ix86: restrict use of GOT32X relocs Jan Beulich
  4 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-03 11:41 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

With us doing the transformation to an immediate operand for MOV and
various ALU insns, there's little reason to then not support the same
conversion for the other two insns which have respective immediate
operand forms. Unfortunately for IMUL (due to the 0F opcode prefix)
there's no suitable relocation, so the pre-APX forms cannot be marked
for relaxation in the assembler.
---
TBD: Is REX2 really permitted with PUSH <imm>?

--- a/bfd/elf32-i386.c
+++ b/bfd/elf32-i386.c
@@ -1209,6 +1209,10 @@ elf_i386_tls_transition (struct bfd_link
    to
    test $foo, %reg1
    and convert
+   push foo@GOT[(%reg)]
+   to
+   push $foo
+   and convert
    binop foo@GOT[(%reg1)], %reg2
    to
    binop $foo, %reg2
@@ -1233,7 +1237,7 @@ elf_i386_convert_load_reloc (bfd *abfd,
   unsigned int addend;
   unsigned int nop;
   bfd_vma nop_offset;
-  bool is_pic;
+  bool is_pic, is_branch = false;
   bool to_reloc_32;
   bool abs_symbol;
   unsigned int r_type;
@@ -1301,6 +1305,23 @@ elf_i386_convert_load_reloc (bfd *abfd,
 
   opcode = bfd_get_8 (abfd, contents + roff - 2);
 
+  if (opcode == 0xff)
+    {
+      switch (modrm & 0x38)
+	{
+	case 0x10: /* CALL */
+	case 0x20: /* JMP */
+	  is_branch = true;
+	  break;
+
+	case 0x30: /* PUSH */
+	  break;
+
+	default:
+	  return true;
+	}
+    }
+
   /* Convert to R_386_32 if PIC is false or there is no base
      register.  */
   to_reloc_32 = !is_pic || baseless;
@@ -1311,7 +1332,7 @@ elf_i386_convert_load_reloc (bfd *abfd,
      reloc.  */
   if (h == NULL)
     {
-      if (opcode == 0x0ff)
+      if (is_branch)
 	/* Convert "call/jmp *foo@GOT[(%reg)]".  */
 	goto convert_branch;
       else
@@ -1327,7 +1348,7 @@ elf_i386_convert_load_reloc (bfd *abfd,
       && !eh->linker_def
       && local_ref)
     {
-      if (opcode == 0xff)
+      if (is_branch)
 	{
 	  /* No direct branch to 0 for PIC.  */
 	  if (is_pic)
@@ -1343,7 +1364,7 @@ elf_i386_convert_load_reloc (bfd *abfd,
 	}
     }
 
-  if (opcode == 0xff)
+  if (is_branch)
     {
       /* We have "call/jmp *foo@GOT[(%reg)]".  */
       if ((h->root.type == bfd_link_hash_defined
@@ -1399,7 +1420,8 @@ elf_i386_convert_load_reloc (bfd *abfd,
   else
     {
       /* We have "mov foo@GOT[(%re1g)], %reg2",
-	 "test %reg1, foo@GOT(%reg2)" and
+	 "test %reg1, foo@GOT(%reg2)",
+	 "push foo@GOT[(%reg)]", or
 	 "binop foo@GOT[(%reg1)], %reg2".
 
 	 Avoid optimizing _DYNAMIC since ld.so may use its
@@ -1460,6 +1482,13 @@ elf_i386_convert_load_reloc (bfd *abfd,
 		  modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
 		  opcode = 0x81;
 		}
+	      else if (opcode == 0xff)
+		{
+		  /* Convert "push foo@GOT(%reg)" to
+		     "push $foo".  */
+		  modrm = 0x68; /* Really the opcode.  */
+		  opcode = 0x26; /* Really a meaningless %es: prefix.  */
+		}
 	      else
 		return true;
 
--- a/bfd/elf64-x86-64.c
+++ b/bfd/elf64-x86-64.c
@@ -1739,13 +1739,16 @@ elf_x86_64_need_pic (struct bfd_link_inf
 }
 
 /* Move the R bits to the B bits in EVEX payload byte 1.  */
-static unsigned int evex_move_r_to_b (unsigned int byte1)
+static unsigned int evex_move_r_to_b (unsigned int byte1, bool copy)
 {
   byte1 = (byte1 & ~(1 << 5)) | ((byte1 & (1 << 7)) >> 2); /* R3 -> B3 */
   byte1 = (byte1 & ~(1 << 3)) | ((~byte1 & (1 << 4)) >> 1); /* R4 -> B4 */
 
   /* Set both R bits, as they're inverted.  */
-  return byte1 | (1 << 4) | (1 << 7);
+  if (!copy)
+    byte1 |= (1 << 4) | (1 << 7);
+
+  return byte1;
 }
 
 /* With the local symbol, foo, we convert
@@ -1762,10 +1765,14 @@ static unsigned int evex_move_r_to_b (un
    to
    test $foo, %reg
    and convert
+   push foo@GOTPCREL(%rip)
+   to
+   push $foo
+   and convert
    binop foo@GOTPCREL(%rip), %reg
    to
    binop $foo, %reg
-   where binop is one of adc, add, and, cmp, or, sbb, sub, xor
+   where binop is one of adc, add, and, cmp, imul, or, sbb, sub, xor
    instructions.  */
 
 static bool
@@ -1781,6 +1788,7 @@ elf_x86_64_convert_load_reloc (bfd *abfd
   bool is_pic;
   bool no_overflow;
   bool relocx;
+  bool is_branch = false;
   bool to_reloc_pc32;
   bool abs_symbol;
   bool local_ref;
@@ -1873,6 +1881,23 @@ elf_x86_64_convert_load_reloc (bfd *abfd
   r_symndx = htab->r_sym (irel->r_info);
 
   opcode = bfd_get_8 (abfd, contents + roff - 2);
+  modrm = bfd_get_8 (abfd, contents + roff - 1);
+  if (opcode == 0xff)
+    {
+      switch (modrm & 0x38)
+	{
+	case 0x10: /* CALL */
+	case 0x20: /* JMP */
+	  is_branch = true;
+	  break;
+
+	case 0x30: /* PUSH */
+	  break;
+
+	default:
+	  return true;
+	}
+    }
 
   /* Convert mov to lea since it has been done for a while.  */
   if (opcode != 0x8b)
@@ -1890,7 +1915,7 @@ elf_x86_64_convert_load_reloc (bfd *abfd
      3. no_overflow is true.
      4. PIC.
      */
-  to_reloc_pc32 = (opcode == 0xff
+  to_reloc_pc32 = (is_branch
 		   || !relocx
 		   || no_overflow
 		   || is_pic);
@@ -1942,7 +1967,7 @@ elf_x86_64_convert_load_reloc (bfd *abfd
 	      && !eh->linker_def
 	      && local_ref))
 	{
-	  if (opcode == 0xff)
+	  if (is_branch)
 	    {
 	      /* Skip for branch instructions since R_X86_64_PC32
 		 may overflow.  */
@@ -2009,7 +2034,7 @@ elf_x86_64_convert_load_reloc (bfd *abfd
     return true;
 
  convert:
-  if (opcode == 0xff)
+  if (is_branch)
     {
       /* We have "call/jmp *foo@GOTPCREL(%rip)".  */
       unsigned int nop;
@@ -2018,7 +2043,6 @@ elf_x86_64_convert_load_reloc (bfd *abfd
 
       /* Convert R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX to
 	 R_X86_64_PC32.  */
-      modrm = bfd_get_8 (abfd, contents + roff - 1);
       if (modrm == 0x25)
 	{
 	  /* Convert to "jmp foo nop".  */
@@ -2064,11 +2088,12 @@ elf_x86_64_convert_load_reloc (bfd *abfd
     }
   else if (r_type == R_X86_64_CODE_6_GOTPCRELX && opcode != 0x8b)
     {
+      bool move_v_r = false;
+
       /* R_X86_64_PC32 isn't supported.  */
       if (to_reloc_pc32)
 	return true;
 
-      modrm = bfd_get_8 (abfd, contents + roff - 1);
       if (opcode == 0x85)
 	{
 	  /* Convert "ctest<cc> %reg, foo@GOTPCREL(%rip)" to
@@ -2094,6 +2119,23 @@ elf_x86_64_convert_load_reloc (bfd *abfd
 	  modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
 	  opcode = 0x81;
 	}
+      else if (opcode == 0xaf)
+	{
+	  if (!(evex[2] & 0x10))
+	    {
+	      /* Convert "imul foo@GOTPCREL(%rip), %reg" to
+	         "imul $foo, %reg, %reg".  */
+	      modrm = 0xc0 | ((modrm & 0x38) >> 3) | (modrm & 0x38);
+	    }
+	  else
+	    {
+	      /* Convert "imul foo@GOTPCREL(%rip), %reg1, %reg2" to
+	         "imul $foo, %reg1, %reg2".  */
+	      modrm = 0xc0 | ((modrm & 0x38) >> 3) | (~evex[1] & 0x38);
+	      move_v_r = true;
+	    }
+	  opcode = 0x69;
+	}
       else
 	return true;
 
@@ -2119,7 +2161,23 @@ elf_x86_64_convert_load_reloc (bfd *abfd
       bfd_put_8 (abfd, opcode, contents + roff - 2);
       bfd_put_8 (abfd, modrm, contents + roff - 1);
 
-      evex[0] = evex_move_r_to_b (evex[0]);
+      evex[0] = evex_move_r_to_b (evex[0], opcode == 0x69 && !move_v_r);
+      if (move_v_r)
+	{
+	  /* Move the top two V bits to the R bits in EVEX payload byte 1.
+	     Note that evex_move_r_to_b() set both R bits.  */
+	  if (!(evex[1] & (1 << 6)))
+	    evex[0] &= ~(1 << 7); /* V3 -> R3 */
+	  if (!(evex[2] & (1 << 3)))
+	    evex[0] &= ~(1 << 4); /* V4 -> R4 */
+	  /* Set all V bits, as they're inverted.  */
+	  evex[1] |= 0xf << 3;
+	  evex[2] |= 1 << 3;
+	  /* Clear the ND (ZU) bit (it ought to be ignored anyway).  */
+	  evex[2] &= ~(1 << 4);
+	  bfd_put_8 (abfd, evex[2], contents + roff - 3);
+	  bfd_put_8 (abfd, evex[1], contents + roff - 4);
+	}
       bfd_put_8 (abfd, evex[0], contents + roff - 5);
 
       /* No addend for R_X86_64_32/R_X86_64_32S relocations.  */
@@ -2162,7 +2220,10 @@ elf_x86_64_convert_load_reloc (bfd *abfd
 	{
 	  if (bfd_get_8 (abfd, contents + roff - 4) == 0xd5)
 	    {
-	      rex2 = bfd_get_8 (abfd, contents + roff - 3);
+	      /* Make sure even an all-zero payload leaves a non-zero value
+		 in the variable.  */
+	      rex2 = bfd_get_8 (abfd, contents + roff - 3) | 0x100;
+	      rex2_mask |= 0x100;
 	      rex_w = (rex2 & REX_W) != 0;
 	    }
 	  else if (bfd_get_8 (abfd, contents + roff - 4) == 0x0f)
@@ -2195,7 +2256,6 @@ elf_x86_64_convert_load_reloc (bfd *abfd
 	      /* Convert "mov foo@GOTPCREL(%rip), %reg" to
 		 "mov $foo, %reg".  */
 	      opcode = 0xc7;
-	      modrm = bfd_get_8 (abfd, contents + roff - 1);
 	      modrm = 0xc0 | (modrm & 0x38) >> 3;
 	      if (rex_w && ABI_64_P (link_info->output_bfd))
 		{
@@ -2222,7 +2282,6 @@ elf_x86_64_convert_load_reloc (bfd *abfd
 	  if (to_reloc_pc32)
 	    return true;
 
-	  modrm = bfd_get_8 (abfd, contents + roff - 1);
 	  if (opcode == 0x85)
 	    {
 	      /* Convert "test %reg, foo@GOTPCREL(%rip)" to
@@ -2237,6 +2296,39 @@ elf_x86_64_convert_load_reloc (bfd *abfd
 	      modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
 	      opcode = 0x81;
 	    }
+	  else if (opcode == 0xaf && (rex2 & (REX2_M << 4)))
+	    {
+	      /* Convert "imul foo@GOTPCREL(%rip), %reg" to
+		 "imul $foo, %reg, %reg".  */
+	      modrm = 0xc0 | ((modrm & 0x38) >> 3) | (modrm & 0x38);
+	      rex_mask = 0;
+	      rex2_mask = REX2_M << 4;
+	      opcode = 0x69;
+	    }
+	  else if (opcode == 0xff && !(rex2 & (REX2_M << 4)))
+	    {
+	      /* Convert "push foo@GOTPCREL(%rip)" to
+		 "push $foo".  */
+	      bfd_put_8 (abfd, 0x68, contents + roff - 1);
+	      if (rex)
+		{
+		  bfd_put_8 (abfd, 0x26, contents + roff - 3);
+		  bfd_put_8 (abfd, rex, contents + roff - 2);
+		}
+	      else if (rex2)
+		{
+		  bfd_put_8 (abfd, 0x26, contents + roff - 4);
+		  bfd_put_8 (abfd, 0xd5, contents + roff - 3);
+		  bfd_put_8 (abfd, rex2, contents + roff - 2);
+		}
+	      else
+		bfd_put_8 (abfd, 0x26, contents + roff - 2);
+
+	      r_type = R_X86_64_32S;
+	      /* No addend for R_X86_64_32S relocations.  */
+	      irel->r_addend = 0;
+	      goto finish;
+	    }
 	  else
 	    return true;
 
@@ -2297,6 +2389,7 @@ elf_x86_64_convert_load_reloc (bfd *abfd
 	}
     }
 
+ finish:
   *r_type_p = r_type;
   irel->r_info = htab->r_info (r_symndx,
 			       r_type | R_X86_64_converted_reloc_bit);
@@ -4378,7 +4471,7 @@ elf_x86_64_relocate_section (bfd *output
 		      continue;
 		    }
 
-		  byte1 = evex_move_r_to_b (byte1);
+		  byte1 = evex_move_r_to_b (byte1, false);
 		  bfd_put_8 (output_bfd, byte1, contents + roff - 5);
 		  bfd_put_8 (output_bfd, 0x81, contents + roff - 2);
 		  bfd_put_8 (output_bfd, 0xc0 | reg, contents + roff - 1);
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -12928,9 +12928,9 @@ output_disp (fragS *insn_start_frag, off
 	      else if (object_64bit)
 		continue;
 
-	      /* Check for "call/jmp *mem", "mov mem, %reg", "movrs mem, %reg",
-		 "test %reg, mem" and "binop mem, %reg" where binop
-		 is one of adc, add, and, cmp, or, sbb, sub, xor
+	      /* Check for "call/jmp *mem", "push mem", "mov mem, %reg",
+		 "movrs mem, %reg", "test %reg, mem" and "binop mem, %reg" where
+		 binop is one of adc, add, and, cmp, or, sbb, sub, xor, or imul
 		 instructions without data prefix.  Always generate
 		 R_386_GOT32X for "sym*GOT" operand in 32-bit mode.  */
 	      unsigned int space = dot_insn () ? i.insn_opcode_space
@@ -12940,7 +12940,7 @@ output_disp (fragS *insn_start_frag, off
 		      || (i.rm.mode == 0 && i.rm.regmem == 5))
 		  && ((space == SPACE_BASE
 		       && i.tm.base_opcode == 0xff
-		       && (i.rm.reg == 2 || i.rm.reg == 4))
+		       && (i.rm.reg == 2 || i.rm.reg == 4 || i.rm.reg == 6))
 		      || ((space == SPACE_BASE
 			   || space == SPACE_0F38
 			   || space == SPACE_MAP4)
@@ -12949,7 +12949,13 @@ output_disp (fragS *insn_start_frag, off
 			   || space == SPACE_MAP4)
 			  && (i.tm.base_opcode == 0x85
 			      || (i.tm.base_opcode
-				  | (i.operands > 2 ? 0x3a : 0x38)) == 0x3b))))
+				  | (i.operands > 2 ? 0x3a : 0x38)) == 0x3b))
+		      || (((space == SPACE_0F
+			    /* Because of the 0F prefix, no suitable relocation
+			       exists for this unless it's REX2-encoded.  */
+			    && is_apx_rex2_encoding ())
+			   || space == SPACE_MAP4)
+			  && i.tm.base_opcode == 0xaf)))
 		{
 		  if (object_64bit)
 		    {
--- a/ld/testsuite/ld-i386/i386.exp
+++ b/ld/testsuite/ld-i386/i386.exp
@@ -373,6 +373,8 @@ run_dump_test "load5a"
 run_dump_test "load5b"
 run_dump_test "load6"
 run_dump_test "load7"
+run_dump_test "load8a"
+run_dump_test "load8b"
 run_dump_test "pr19175"
 run_dump_test "pr19615"
 run_dump_test "pr19636-1a"
--- /dev/null
+++ b/ld/testsuite/ld-i386/load8.s
@@ -0,0 +1,20 @@
+	.data
+	.type	bar, @object
+bar:
+	.byte	1
+	.size	bar, .-bar
+	.globl	foo
+	.type	foo, @object
+foo:
+	.byte	1
+	.size	foo, .-foo
+	.text
+	.globl	_start
+	.type	_start, @function
+_start:
+	push	bar@GOT(%ecx)
+	push	foo@GOT(%edx)
+	.ifndef PIC
+	push	foo@GOT
+	.endif
+	.size	_start, .-_start
--- /dev/null
+++ b/ld/testsuite/ld-i386/load8a.d
@@ -0,0 +1,14 @@
+#source: load8.s
+#as: --32 -mrelax-relocations=yes
+#ld: -melf_i386 -z noseparate-code
+#objdump: -dw
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+8048074 <_start>:
+[ 	]*[a-f0-9]+:	26 68 86 90 04 08    	es push \$0x8049086
+[ 	]*[a-f0-9]+:	26 68 87 90 04 08    	es push \$0x8049087
+[ 	]*[a-f0-9]+:	26 68 87 90 04 08    	es push \$0x8049087
+#pass
--- /dev/null
+++ b/ld/testsuite/ld-i386/load8b.d
@@ -0,0 +1,13 @@
+#source: load8.s
+#as: --32 -mshared -mrelax-relocations=yes --defsym PIC=1
+#ld: -melf_i386 -shared -z noseparate-code
+#objdump: -dw
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+[0-9a-f]+ <_start>:
+[ 	]*[0-9a-f]+:	ff b1 f8 ff ff ff    	push   -0x8\(%ecx\)
+[ 	]*[0-9a-f]+:	ff b2 fc ff ff ff    	push   -0x4\(%edx\)
+#pass
--- a/ld/testsuite/ld-x86-64/apx-load1.s
+++ b/ld/testsuite/ld-x86-64/apx-load1.s
@@ -118,5 +118,11 @@ _start:
 	sub	%rbp, bar@GOTPCREL(%rip), %r21
 	xor	%rsi, bar@GOTPCREL(%rip), %r22
 
+	imul	bar@GOTPCREL(%rip), %r17
+	{nf} imul bar@GOTPCREL(%rip), %r17
+	imul	bar@GOTPCREL(%rip), %r17, %rdx
+	imul	bar@GOTPCREL(%rip), %rcx, %r18
+	{rex2} pushq bar@GOTPCREL(%rip)
+
 	.size	_start, .-_start
 	.p2align 12, 0x90
--- a/ld/testsuite/ld-x86-64/apx-load1a.d
+++ b/ld/testsuite/ld-x86-64/apx-load1a.d
@@ -115,4 +115,9 @@ Disassembly of section .text:
  +[a-f0-9]+:	62 f4 dc 10 19 25 74 0c 20 00 	sbb    %rsp,0x200c74\(%rip\),%r20        # 602000 <.*>
  +[a-f0-9]+:	62 f4 d4 10 29 2d 6a 0c 20 00 	sub    %rbp,0x200c6a\(%rip\),%r21        # 602000 <.*>
  +[a-f0-9]+:	62 f4 cc 10 81 f6 20 20 60 00 	xor    \$0x602020,%rsi,%r22
+ +[a-f0-9]+:	d5 58 69 c9 20 20 60 00 	imul   \$0x602020,%r17,%r17
+ +[a-f0-9]+:	62 ec fc 0c 69 c9 20 20 60 00 	\{nf\} imul \$0x602020,%r17,%r17
+ +[a-f0-9]+:	62 fc fc 08 69 d1 20 20 60 00 	imul   \$0x602020,%r17,%rdx
+ +[a-f0-9]+:	62 e4 fc 08 69 d1 20 20 60 00 	imul   \$0x602020,%rcx,%r18
+ +[a-f0-9]+:	26 d5 00 68 20 20 60 00 	es \{rex2 0x0\} push \$0x602020
 #pass
--- a/ld/testsuite/ld-x86-64/apx-load1c.d
+++ b/ld/testsuite/ld-x86-64/apx-load1c.d
@@ -108,4 +108,9 @@ Disassembly of section .text:
  +[a-f0-9]+:	62 f4 dc 10 19 25 54 0d 20 00 	sbb    %rsp,0x200d54\(%rip\),%r20        # 2020e0 <.*>
  +[a-f0-9]+:	62 f4 d4 10 29 2d 4a 0d 20 00 	sub    %rbp,0x200d4a\(%rip\),%r21        # 2020e0 <.*>
  +[a-f0-9]+:	62 f4 cc 10 31 35 40 0d 20 00 	xor    %rsi,0x200d40\(%rip\),%r22        # 2020e0 <.*>
+ +[a-f0-9]+:	d5 c8 af 0d 38 0d 20 00 	imul   0x200d38\(%rip\),%r17        # 2020e0 <.*>
+ +[a-f0-9]+:	62 e4 fc 0c af 0d 2e 0d 20 00 	\{nf\} imul 0x200d2e\(%rip\),%r17        # 2020e0 <.*>
+ +[a-f0-9]+:	62 e4 ec 18 af 0d 24 0d 20 00 	imul   0x200d24\(%rip\),%r17,%rdx        # 2020e0 <.*>
+ +[a-f0-9]+:	62 f4 ec 10 af 0d 1a 0d 20 00 	imul   0x200d1a\(%rip\),%rcx,%r18        # 2020e0 <.*>
+ +[a-f0-9]+:	d5 00 ff 35 12 0d 20 00 	\{rex2 0x0\} push 0x200d12\(%rip\)        # 2020e0 <.*>
 #pass
--- a/ld/testsuite/ld-x86-64/apx-load1d.d
+++ b/ld/testsuite/ld-x86-64/apx-load1d.d
@@ -108,4 +108,9 @@ Disassembly of section .text:
  +[a-f0-9]+:	62 f4 dc 10 19 25 e4 0c 20 00 	sbb    %rsp,0x200ce4\(%rip\),%r20        # 202070 <.*>
  +[a-f0-9]+:	62 f4 d4 10 29 2d da 0c 20 00 	sub    %rbp,0x200cda\(%rip\),%r21        # 202070 <.*>
  +[a-f0-9]+:	62 f4 cc 10 31 35 d0 0c 20 00 	xor    %rsi,0x200cd0\(%rip\),%r22        # 202070 <.*>
+ +[a-f0-9]+:	d5 c8 af 0d c8 0c 20 00 	imul   0x200cc8\(%rip\),%r17        # 202070 <.*>
+ +[a-f0-9]+:	62 e4 fc 0c af 0d be 0c 20 00 	\{nf\} imul 0x200cbe\(%rip\),%r17        # 202070 <.*>
+ +[a-f0-9]+:	62 e4 ec 18 af 0d b4 0c 20 00 	imul   0x200cb4\(%rip\),%r17,%rdx        # 202070 <.*>
+ +[a-f0-9]+:	62 f4 ec 10 af 0d aa 0c 20 00 	imul   0x200caa\(%rip\),%rcx,%r18        # 202070 <.*>
+ +[a-f0-9]+:	d5 00 ff 35 a2 0c 20 00 	\{rex2 0x0\} push 0x200ca2\(%rip\)        # 202070 <.*>
 #pass
--- /dev/null
+++ b/ld/testsuite/ld-x86-64/load5.s
@@ -0,0 +1,17 @@
+	.data
+	.type	bar, @object
+bar:
+	.byte	1
+	.size	bar, .-bar
+	.globl	foo
+	.type	foo, @object
+foo:
+	.byte	1
+	.size	foo, .-foo
+	.text
+	.globl	_start
+	.type	_start, @function
+_start:
+	      pushq	bar@GOTPCREL(%rip)
+	{rex} pushq	foo@GOTPCREL(%rip)
+	.size	_start, .-_start
--- /dev/null
+++ b/ld/testsuite/ld-x86-64/load5a.d
@@ -0,0 +1,15 @@
+#source: load5.s
+#as: --64 -mrelax-relocations=yes
+#ld: -melf_x86_64
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+#...
+[a-f0-9]+ <_start>:
+[ 	]*[a-f0-9]+:	26 68 ([0-9a-f]{2} ){4} *	es push \$0x[a-f0-9]+
+[ 	]*[a-f0-9]+:	26 40 68 ([0-9a-f]{2} ){4} *	es rex push \$0x[a-f0-9]+
+#pass
--- /dev/null
+++ b/ld/testsuite/ld-x86-64/load5b.d
@@ -0,0 +1,15 @@
+#source: load5.s
+#as: --64 -mrelax-relocations=yes
+#ld: -shared -melf_x86_64
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+#...
+[a-f0-9]+ <_start>:
+[ 	]*[a-f0-9]+:	ff 35 ([0-9a-f]{2} ){4} *	push +0x[a-f0-9]+\(%rip\)        # [a-f0-9]+ <.*>
+[ 	]*[a-f0-9]+:	40 ff 35 ([0-9a-f]{2} ){4} *	rex push 0x[a-f0-9]+\(%rip\)        # [a-f0-9]+ <.*>
+#pass
--- a/ld/testsuite/ld-x86-64/x86-64.exp
+++ b/ld/testsuite/ld-x86-64/x86-64.exp
@@ -654,6 +654,8 @@ run_dump_test "load2"
 run_dump_test "load3a"
 run_dump_test "load3b"
 run_dump_test "load4"
+run_dump_test "load5a"
+run_dump_test "load5b"
 run_dump_test "call1a"
 run_dump_test "call1b"
 run_dump_test "call1c"


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-03 11:41 ` [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL Jan Beulich
@ 2025-02-03 22:40   ` H.J. Lu
  2025-02-04 10:14     ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-03 22:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> With us doing the transformation to an immediate operand for MOV and
> various ALU insns, there's little reason to then not support the same
> conversion for the other two insns which have respective immediate
> operand forms. Unfortunately for IMUL (due to the 0F opcode prefix)
> there's no suitable relocation, so the pre-APX forms cannot be marked
> for relaxation in the assembler.
> ---
> TBD: Is REX2 really permitted with PUSH <imm>?
>
> --- a/bfd/elf32-i386.c
> +++ b/bfd/elf32-i386.c
> @@ -1209,6 +1209,10 @@ elf_i386_tls_transition (struct bfd_link
>     to
>     test $foo, %reg1
>     and convert
> +   push foo@GOT[(%reg)]
> +   to
> +   push $foo
> +   and convert
>     binop foo@GOT[(%reg1)], %reg2
>     to
>     binop $foo, %reg2
> @@ -1233,7 +1237,7 @@ elf_i386_convert_load_reloc (bfd *abfd,
>    unsigned int addend;
>    unsigned int nop;
>    bfd_vma nop_offset;
> -  bool is_pic;
> +  bool is_pic, is_branch = false;
>    bool to_reloc_32;
>    bool abs_symbol;
>    unsigned int r_type;
> @@ -1301,6 +1305,23 @@ elf_i386_convert_load_reloc (bfd *abfd,
>
>    opcode = bfd_get_8 (abfd, contents + roff - 2);
>
> +  if (opcode == 0xff)
> +    {
> +      switch (modrm & 0x38)
> +       {
> +       case 0x10: /* CALL */
> +       case 0x20: /* JMP */
> +         is_branch = true;
> +         break;
> +
> +       case 0x30: /* PUSH */
> +         break;
> +
> +       default:
> +         return true;
> +       }
> +    }
> +
>    /* Convert to R_386_32 if PIC is false or there is no base
>       register.  */
>    to_reloc_32 = !is_pic || baseless;
> @@ -1311,7 +1332,7 @@ elf_i386_convert_load_reloc (bfd *abfd,
>       reloc.  */
>    if (h == NULL)
>      {
> -      if (opcode == 0x0ff)
> +      if (is_branch)
>         /* Convert "call/jmp *foo@GOT[(%reg)]".  */
>         goto convert_branch;
>        else
> @@ -1327,7 +1348,7 @@ elf_i386_convert_load_reloc (bfd *abfd,
>        && !eh->linker_def
>        && local_ref)
>      {
> -      if (opcode == 0xff)
> +      if (is_branch)
>         {
>           /* No direct branch to 0 for PIC.  */
>           if (is_pic)
> @@ -1343,7 +1364,7 @@ elf_i386_convert_load_reloc (bfd *abfd,
>         }
>      }
>
> -  if (opcode == 0xff)
> +  if (is_branch)
>      {
>        /* We have "call/jmp *foo@GOT[(%reg)]".  */
>        if ((h->root.type == bfd_link_hash_defined
> @@ -1399,7 +1420,8 @@ elf_i386_convert_load_reloc (bfd *abfd,
>    else
>      {
>        /* We have "mov foo@GOT[(%re1g)], %reg2",
> -        "test %reg1, foo@GOT(%reg2)" and
> +        "test %reg1, foo@GOT(%reg2)",
> +        "push foo@GOT[(%reg)]", or
>          "binop foo@GOT[(%reg1)], %reg2".
>
>          Avoid optimizing _DYNAMIC since ld.so may use its
> @@ -1460,6 +1482,13 @@ elf_i386_convert_load_reloc (bfd *abfd,
>                   modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
>                   opcode = 0x81;
>                 }
> +             else if (opcode == 0xff)
> +               {
> +                 /* Convert "push foo@GOT(%reg)" to
> +                    "push $foo".  */
> +                 modrm = 0x68; /* Really the opcode.  */
> +                 opcode = 0x26; /* Really a meaningless %es: prefix.  */
> +               }
>               else
>                 return true;
>
> --- a/bfd/elf64-x86-64.c
> +++ b/bfd/elf64-x86-64.c
> @@ -1739,13 +1739,16 @@ elf_x86_64_need_pic (struct bfd_link_inf
>  }
>
>  /* Move the R bits to the B bits in EVEX payload byte 1.  */
> -static unsigned int evex_move_r_to_b (unsigned int byte1)
> +static unsigned int evex_move_r_to_b (unsigned int byte1, bool copy)
>  {
>    byte1 = (byte1 & ~(1 << 5)) | ((byte1 & (1 << 7)) >> 2); /* R3 -> B3 */
>    byte1 = (byte1 & ~(1 << 3)) | ((~byte1 & (1 << 4)) >> 1); /* R4 -> B4 */
>
>    /* Set both R bits, as they're inverted.  */
> -  return byte1 | (1 << 4) | (1 << 7);
> +  if (!copy)
> +    byte1 |= (1 << 4) | (1 << 7);
> +
> +  return byte1;
>  }
>
>  /* With the local symbol, foo, we convert
> @@ -1762,10 +1765,14 @@ static unsigned int evex_move_r_to_b (un
>     to
>     test $foo, %reg
>     and convert
> +   push foo@GOTPCREL(%rip)
> +   to
> +   push $foo
> +   and convert
>     binop foo@GOTPCREL(%rip), %reg
>     to
>     binop $foo, %reg
> -   where binop is one of adc, add, and, cmp, or, sbb, sub, xor
> +   where binop is one of adc, add, and, cmp, imul, or, sbb, sub, xor
>     instructions.  */
>
>  static bool
> @@ -1781,6 +1788,7 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>    bool is_pic;
>    bool no_overflow;
>    bool relocx;
> +  bool is_branch = false;
>    bool to_reloc_pc32;
>    bool abs_symbol;
>    bool local_ref;
> @@ -1873,6 +1881,23 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>    r_symndx = htab->r_sym (irel->r_info);
>
>    opcode = bfd_get_8 (abfd, contents + roff - 2);
> +  modrm = bfd_get_8 (abfd, contents + roff - 1);
> +  if (opcode == 0xff)
> +    {
> +      switch (modrm & 0x38)
> +       {
> +       case 0x10: /* CALL */
> +       case 0x20: /* JMP */
> +         is_branch = true;
> +         break;
> +
> +       case 0x30: /* PUSH */
> +         break;
> +
> +       default:
> +         return true;
> +       }
> +    }
>
>    /* Convert mov to lea since it has been done for a while.  */
>    if (opcode != 0x8b)
> @@ -1890,7 +1915,7 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>       3. no_overflow is true.
>       4. PIC.
>       */
> -  to_reloc_pc32 = (opcode == 0xff
> +  to_reloc_pc32 = (is_branch
>                    || !relocx
>                    || no_overflow
>                    || is_pic);
> @@ -1942,7 +1967,7 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>               && !eh->linker_def
>               && local_ref))
>         {
> -         if (opcode == 0xff)
> +         if (is_branch)
>             {
>               /* Skip for branch instructions since R_X86_64_PC32
>                  may overflow.  */
> @@ -2009,7 +2034,7 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>      return true;
>
>   convert:
> -  if (opcode == 0xff)
> +  if (is_branch)
>      {
>        /* We have "call/jmp *foo@GOTPCREL(%rip)".  */
>        unsigned int nop;
> @@ -2018,7 +2043,6 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>
>        /* Convert R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX to
>          R_X86_64_PC32.  */
> -      modrm = bfd_get_8 (abfd, contents + roff - 1);
>        if (modrm == 0x25)
>         {
>           /* Convert to "jmp foo nop".  */
> @@ -2064,11 +2088,12 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>      }
>    else if (r_type == R_X86_64_CODE_6_GOTPCRELX && opcode != 0x8b)
>      {
> +      bool move_v_r = false;
> +
>        /* R_X86_64_PC32 isn't supported.  */
>        if (to_reloc_pc32)
>         return true;
>
> -      modrm = bfd_get_8 (abfd, contents + roff - 1);
>        if (opcode == 0x85)
>         {
>           /* Convert "ctest<cc> %reg, foo@GOTPCREL(%rip)" to
> @@ -2094,6 +2119,23 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>           modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
>           opcode = 0x81;
>         }
> +      else if (opcode == 0xaf)
> +       {
> +         if (!(evex[2] & 0x10))
> +           {
> +             /* Convert "imul foo@GOTPCREL(%rip), %reg" to
> +                "imul $foo, %reg, %reg".  */
> +             modrm = 0xc0 | ((modrm & 0x38) >> 3) | (modrm & 0x38);
> +           }
> +         else
> +           {
> +             /* Convert "imul foo@GOTPCREL(%rip), %reg1, %reg2" to
> +                "imul $foo, %reg1, %reg2".  */
> +             modrm = 0xc0 | ((modrm & 0x38) >> 3) | (~evex[1] & 0x38);
> +             move_v_r = true;
> +           }
> +         opcode = 0x69;
> +       }
>        else
>         return true;
>
> @@ -2119,7 +2161,23 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>        bfd_put_8 (abfd, opcode, contents + roff - 2);
>        bfd_put_8 (abfd, modrm, contents + roff - 1);
>
> -      evex[0] = evex_move_r_to_b (evex[0]);
> +      evex[0] = evex_move_r_to_b (evex[0], opcode == 0x69 && !move_v_r);
> +      if (move_v_r)
> +       {
> +         /* Move the top two V bits to the R bits in EVEX payload byte 1.
> +            Note that evex_move_r_to_b() set both R bits.  */
> +         if (!(evex[1] & (1 << 6)))
> +           evex[0] &= ~(1 << 7); /* V3 -> R3 */
> +         if (!(evex[2] & (1 << 3)))
> +           evex[0] &= ~(1 << 4); /* V4 -> R4 */
> +         /* Set all V bits, as they're inverted.  */
> +         evex[1] |= 0xf << 3;
> +         evex[2] |= 1 << 3;
> +         /* Clear the ND (ZU) bit (it ought to be ignored anyway).  */
> +         evex[2] &= ~(1 << 4);
> +         bfd_put_8 (abfd, evex[2], contents + roff - 3);
> +         bfd_put_8 (abfd, evex[1], contents + roff - 4);
> +       }
>        bfd_put_8 (abfd, evex[0], contents + roff - 5);
>
>        /* No addend for R_X86_64_32/R_X86_64_32S relocations.  */
> @@ -2162,7 +2220,10 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>         {
>           if (bfd_get_8 (abfd, contents + roff - 4) == 0xd5)
>             {
> -             rex2 = bfd_get_8 (abfd, contents + roff - 3);
> +             /* Make sure even an all-zero payload leaves a non-zero value
> +                in the variable.  */
> +             rex2 = bfd_get_8 (abfd, contents + roff - 3) | 0x100;
> +             rex2_mask |= 0x100;
>               rex_w = (rex2 & REX_W) != 0;
>             }
>           else if (bfd_get_8 (abfd, contents + roff - 4) == 0x0f)
> @@ -2195,7 +2256,6 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>               /* Convert "mov foo@GOTPCREL(%rip), %reg" to
>                  "mov $foo, %reg".  */
>               opcode = 0xc7;
> -             modrm = bfd_get_8 (abfd, contents + roff - 1);
>               modrm = 0xc0 | (modrm & 0x38) >> 3;
>               if (rex_w && ABI_64_P (link_info->output_bfd))
>                 {
> @@ -2222,7 +2282,6 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>           if (to_reloc_pc32)
>             return true;
>
> -         modrm = bfd_get_8 (abfd, contents + roff - 1);
>           if (opcode == 0x85)
>             {
>               /* Convert "test %reg, foo@GOTPCREL(%rip)" to
> @@ -2237,6 +2296,39 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>               modrm = 0xc0 | ((modrm & 0x38) >> 3) | (opcode & 0x38);
>               opcode = 0x81;
>             }
> +         else if (opcode == 0xaf && (rex2 & (REX2_M << 4)))
> +           {
> +             /* Convert "imul foo@GOTPCREL(%rip), %reg" to
> +                "imul $foo, %reg, %reg".  */
> +             modrm = 0xc0 | ((modrm & 0x38) >> 3) | (modrm & 0x38);
> +             rex_mask = 0;
> +             rex2_mask = REX2_M << 4;
> +             opcode = 0x69;
> +           }
> +         else if (opcode == 0xff && !(rex2 & (REX2_M << 4)))
> +           {
> +             /* Convert "push foo@GOTPCREL(%rip)" to
> +                "push $foo".  */
> +             bfd_put_8 (abfd, 0x68, contents + roff - 1);
> +             if (rex)
> +               {
> +                 bfd_put_8 (abfd, 0x26, contents + roff - 3);
> +                 bfd_put_8 (abfd, rex, contents + roff - 2);
> +               }
> +             else if (rex2)
> +               {
> +                 bfd_put_8 (abfd, 0x26, contents + roff - 4);
> +                 bfd_put_8 (abfd, 0xd5, contents + roff - 3);
> +                 bfd_put_8 (abfd, rex2, contents + roff - 2);
> +               }
> +             else
> +               bfd_put_8 (abfd, 0x26, contents + roff - 2);
> +
> +             r_type = R_X86_64_32S;
> +             /* No addend for R_X86_64_32S relocations.  */
> +             irel->r_addend = 0;
> +             goto finish;
> +           }
>           else
>             return true;
>
> @@ -2297,6 +2389,7 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>         }
>      }
>
> + finish:
>    *r_type_p = r_type;
>    irel->r_info = htab->r_info (r_symndx,
>                                r_type | R_X86_64_converted_reloc_bit);
> @@ -4378,7 +4471,7 @@ elf_x86_64_relocate_section (bfd *output
>                       continue;
>                     }
>
> -                 byte1 = evex_move_r_to_b (byte1);
> +                 byte1 = evex_move_r_to_b (byte1, false);
>                   bfd_put_8 (output_bfd, byte1, contents + roff - 5);
>                   bfd_put_8 (output_bfd, 0x81, contents + roff - 2);
>                   bfd_put_8 (output_bfd, 0xc0 | reg, contents + roff - 1);
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -12928,9 +12928,9 @@ output_disp (fragS *insn_start_frag, off
>               else if (object_64bit)
>                 continue;
>
> -             /* Check for "call/jmp *mem", "mov mem, %reg", "movrs mem, %reg",
> -                "test %reg, mem" and "binop mem, %reg" where binop
> -                is one of adc, add, and, cmp, or, sbb, sub, xor
> +             /* Check for "call/jmp *mem", "push mem", "mov mem, %reg",
> +                "movrs mem, %reg", "test %reg, mem" and "binop mem, %reg" where
> +                binop is one of adc, add, and, cmp, or, sbb, sub, xor, or imul
>                  instructions without data prefix.  Always generate
>                  R_386_GOT32X for "sym*GOT" operand in 32-bit mode.  */
>               unsigned int space = dot_insn () ? i.insn_opcode_space
> @@ -12940,7 +12940,7 @@ output_disp (fragS *insn_start_frag, off
>                       || (i.rm.mode == 0 && i.rm.regmem == 5))
>                   && ((space == SPACE_BASE
>                        && i.tm.base_opcode == 0xff
> -                      && (i.rm.reg == 2 || i.rm.reg == 4))
> +                      && (i.rm.reg == 2 || i.rm.reg == 4 || i.rm.reg == 6))
>                       || ((space == SPACE_BASE
>                            || space == SPACE_0F38
>                            || space == SPACE_MAP4)
> @@ -12949,7 +12949,13 @@ output_disp (fragS *insn_start_frag, off
>                            || space == SPACE_MAP4)
>                           && (i.tm.base_opcode == 0x85
>                               || (i.tm.base_opcode
> -                                 | (i.operands > 2 ? 0x3a : 0x38)) == 0x3b))))
> +                                 | (i.operands > 2 ? 0x3a : 0x38)) == 0x3b))
> +                     || (((space == SPACE_0F
> +                           /* Because of the 0F prefix, no suitable relocation
> +                              exists for this unless it's REX2-encoded.  */
> +                           && is_apx_rex2_encoding ())
> +                          || space == SPACE_MAP4)
> +                         && i.tm.base_opcode == 0xaf)))
>                 {
>                   if (object_64bit)
>                     {
> --- a/ld/testsuite/ld-i386/i386.exp
> +++ b/ld/testsuite/ld-i386/i386.exp
> @@ -373,6 +373,8 @@ run_dump_test "load5a"
>  run_dump_test "load5b"
>  run_dump_test "load6"
>  run_dump_test "load7"
> +run_dump_test "load8a"
> +run_dump_test "load8b"
>  run_dump_test "pr19175"
>  run_dump_test "pr19615"
>  run_dump_test "pr19636-1a"
> --- /dev/null
> +++ b/ld/testsuite/ld-i386/load8.s
> @@ -0,0 +1,20 @@
> +       .data
> +       .type   bar, @object
> +bar:
> +       .byte   1
> +       .size   bar, .-bar
> +       .globl  foo
> +       .type   foo, @object
> +foo:
> +       .byte   1
> +       .size   foo, .-foo
> +       .text
> +       .globl  _start
> +       .type   _start, @function
> +_start:
> +       push    bar@GOT(%ecx)
> +       push    foo@GOT(%edx)
> +       .ifndef PIC
> +       push    foo@GOT
> +       .endif
> +       .size   _start, .-_start
> --- /dev/null
> +++ b/ld/testsuite/ld-i386/load8a.d
> @@ -0,0 +1,14 @@
> +#source: load8.s
> +#as: --32 -mrelax-relocations=yes
> +#ld: -melf_i386 -z noseparate-code
> +#objdump: -dw
> +
> +.*: +file format .*
> +
> +Disassembly of section .text:
> +
> +0+8048074 <_start>:
> +[      ]*[a-f0-9]+:    26 68 86 90 04 08       es push \$0x8049086
> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087

Please avoid adding the es prefix.  It may not be nop in the future.

> +#pass
> --- /dev/null
> +++ b/ld/testsuite/ld-i386/load8b.d
> @@ -0,0 +1,13 @@
> +#source: load8.s
> +#as: --32 -mshared -mrelax-relocations=yes --defsym PIC=1
> +#ld: -melf_i386 -shared -z noseparate-code
> +#objdump: -dw
> +
> +.*: +file format .*
> +
> +Disassembly of section .text:
> +
> +0+[0-9a-f]+ <_start>:
> +[      ]*[0-9a-f]+:    ff b1 f8 ff ff ff       push   -0x8\(%ecx\)
> +[      ]*[0-9a-f]+:    ff b2 fc ff ff ff       push   -0x4\(%edx\)
> +#pass
> --- a/ld/testsuite/ld-x86-64/apx-load1.s
> +++ b/ld/testsuite/ld-x86-64/apx-load1.s
> @@ -118,5 +118,11 @@ _start:
>         sub     %rbp, bar@GOTPCREL(%rip), %r21
>         xor     %rsi, bar@GOTPCREL(%rip), %r22
>
> +       imul    bar@GOTPCREL(%rip), %r17
> +       {nf} imul bar@GOTPCREL(%rip), %r17
> +       imul    bar@GOTPCREL(%rip), %r17, %rdx
> +       imul    bar@GOTPCREL(%rip), %rcx, %r18
> +       {rex2} pushq bar@GOTPCREL(%rip)
> +
>         .size   _start, .-_start
>         .p2align 12, 0x90
> --- a/ld/testsuite/ld-x86-64/apx-load1a.d
> +++ b/ld/testsuite/ld-x86-64/apx-load1a.d
> @@ -115,4 +115,9 @@ Disassembly of section .text:
>   +[a-f0-9]+:   62 f4 dc 10 19 25 74 0c 20 00   sbb    %rsp,0x200c74\(%rip\),%r20        # 602000 <.*>
>   +[a-f0-9]+:   62 f4 d4 10 29 2d 6a 0c 20 00   sub    %rbp,0x200c6a\(%rip\),%r21        # 602000 <.*>
>   +[a-f0-9]+:   62 f4 cc 10 81 f6 20 20 60 00   xor    \$0x602020,%rsi,%r22
> + +[a-f0-9]+:   d5 58 69 c9 20 20 60 00         imul   \$0x602020,%r17,%r17
> + +[a-f0-9]+:   62 ec fc 0c 69 c9 20 20 60 00   \{nf\} imul \$0x602020,%r17,%r17
> + +[a-f0-9]+:   62 fc fc 08 69 d1 20 20 60 00   imul   \$0x602020,%r17,%rdx
> + +[a-f0-9]+:   62 e4 fc 08 69 d1 20 20 60 00   imul   \$0x602020,%rcx,%r18
> + +[a-f0-9]+:   26 d5 00 68 20 20 60 00         es \{rex2 0x0\} push \$0x602020
>  #pass
> --- a/ld/testsuite/ld-x86-64/apx-load1c.d
> +++ b/ld/testsuite/ld-x86-64/apx-load1c.d
> @@ -108,4 +108,9 @@ Disassembly of section .text:
>   +[a-f0-9]+:   62 f4 dc 10 19 25 54 0d 20 00   sbb    %rsp,0x200d54\(%rip\),%r20        # 2020e0 <.*>
>   +[a-f0-9]+:   62 f4 d4 10 29 2d 4a 0d 20 00   sub    %rbp,0x200d4a\(%rip\),%r21        # 2020e0 <.*>
>   +[a-f0-9]+:   62 f4 cc 10 31 35 40 0d 20 00   xor    %rsi,0x200d40\(%rip\),%r22        # 2020e0 <.*>
> + +[a-f0-9]+:   d5 c8 af 0d 38 0d 20 00         imul   0x200d38\(%rip\),%r17        # 2020e0 <.*>
> + +[a-f0-9]+:   62 e4 fc 0c af 0d 2e 0d 20 00   \{nf\} imul 0x200d2e\(%rip\),%r17        # 2020e0 <.*>
> + +[a-f0-9]+:   62 e4 ec 18 af 0d 24 0d 20 00   imul   0x200d24\(%rip\),%r17,%rdx        # 2020e0 <.*>
> + +[a-f0-9]+:   62 f4 ec 10 af 0d 1a 0d 20 00   imul   0x200d1a\(%rip\),%rcx,%r18        # 2020e0 <.*>
> + +[a-f0-9]+:   d5 00 ff 35 12 0d 20 00         \{rex2 0x0\} push 0x200d12\(%rip\)        # 2020e0 <.*>
>  #pass
> --- a/ld/testsuite/ld-x86-64/apx-load1d.d
> +++ b/ld/testsuite/ld-x86-64/apx-load1d.d
> @@ -108,4 +108,9 @@ Disassembly of section .text:
>   +[a-f0-9]+:   62 f4 dc 10 19 25 e4 0c 20 00   sbb    %rsp,0x200ce4\(%rip\),%r20        # 202070 <.*>
>   +[a-f0-9]+:   62 f4 d4 10 29 2d da 0c 20 00   sub    %rbp,0x200cda\(%rip\),%r21        # 202070 <.*>
>   +[a-f0-9]+:   62 f4 cc 10 31 35 d0 0c 20 00   xor    %rsi,0x200cd0\(%rip\),%r22        # 202070 <.*>
> + +[a-f0-9]+:   d5 c8 af 0d c8 0c 20 00         imul   0x200cc8\(%rip\),%r17        # 202070 <.*>
> + +[a-f0-9]+:   62 e4 fc 0c af 0d be 0c 20 00   \{nf\} imul 0x200cbe\(%rip\),%r17        # 202070 <.*>
> + +[a-f0-9]+:   62 e4 ec 18 af 0d b4 0c 20 00   imul   0x200cb4\(%rip\),%r17,%rdx        # 202070 <.*>
> + +[a-f0-9]+:   62 f4 ec 10 af 0d aa 0c 20 00   imul   0x200caa\(%rip\),%rcx,%r18        # 202070 <.*>
> + +[a-f0-9]+:   d5 00 ff 35 a2 0c 20 00         \{rex2 0x0\} push 0x200ca2\(%rip\)        # 202070 <.*>
>  #pass
> --- /dev/null
> +++ b/ld/testsuite/ld-x86-64/load5.s
> @@ -0,0 +1,17 @@
> +       .data
> +       .type   bar, @object
> +bar:
> +       .byte   1
> +       .size   bar, .-bar
> +       .globl  foo
> +       .type   foo, @object
> +foo:
> +       .byte   1
> +       .size   foo, .-foo
> +       .text
> +       .globl  _start
> +       .type   _start, @function
> +_start:
> +             pushq     bar@GOTPCREL(%rip)
> +       {rex} pushq     foo@GOTPCREL(%rip)
> +       .size   _start, .-_start
> --- /dev/null
> +++ b/ld/testsuite/ld-x86-64/load5a.d
> @@ -0,0 +1,15 @@
> +#source: load5.s
> +#as: --64 -mrelax-relocations=yes
> +#ld: -melf_x86_64
> +#objdump: -dw
> +
> +.*: +file format .*
> +
> +
> +Disassembly of section .text:
> +
> +#...
> +[a-f0-9]+ <_start>:
> +[      ]*[a-f0-9]+:    26 68 ([0-9a-f]{2} ){4} *       es push \$0x[a-f0-9]+
> +[      ]*[a-f0-9]+:    26 40 68 ([0-9a-f]{2} ){4} *    es rex push \$0x[a-f0-9]+
> +#pass
> --- /dev/null
> +++ b/ld/testsuite/ld-x86-64/load5b.d
> @@ -0,0 +1,15 @@
> +#source: load5.s
> +#as: --64 -mrelax-relocations=yes
> +#ld: -shared -melf_x86_64
> +#objdump: -dw
> +
> +.*: +file format .*
> +
> +
> +Disassembly of section .text:
> +
> +#...
> +[a-f0-9]+ <_start>:
> +[      ]*[a-f0-9]+:    ff 35 ([0-9a-f]{2} ){4} *       push +0x[a-f0-9]+\(%rip\)        # [a-f0-9]+ <.*>
> +[      ]*[a-f0-9]+:    40 ff 35 ([0-9a-f]{2} ){4} *    rex push 0x[a-f0-9]+\(%rip\)        # [a-f0-9]+ <.*>
> +#pass
> --- a/ld/testsuite/ld-x86-64/x86-64.exp
> +++ b/ld/testsuite/ld-x86-64/x86-64.exp
> @@ -654,6 +654,8 @@ run_dump_test "load2"
>  run_dump_test "load3a"
>  run_dump_test "load3b"
>  run_dump_test "load4"
> +run_dump_test "load5a"
> +run_dump_test "load5b"
>  run_dump_test "call1a"
>  run_dump_test "call1b"
>  run_dump_test "call1c"
>


-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-03 22:40   ` H.J. Lu
@ 2025-02-04 10:14     ` Jan Beulich
  2025-02-04 10:17       ` H.J. Lu
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-04 10:14 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 03.02.2025 23:40, H.J. Lu wrote:
> On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
>> --- /dev/null
>> +++ b/ld/testsuite/ld-i386/load8a.d
>> @@ -0,0 +1,14 @@
>> +#source: load8.s
>> +#as: --32 -mrelax-relocations=yes
>> +#ld: -melf_i386 -z noseparate-code
>> +#objdump: -dw
>> +
>> +.*: +file format .*
>> +
>> +Disassembly of section .text:
>> +
>> +0+8048074 <_start>:
>> +[      ]*[a-f0-9]+:    26 68 86 90 04 08       es push \$0x8049086
>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
> 
> Please avoid adding the es prefix.  It may not be nop in the future.

Constructive comments please. What other prefix do you suggest we use?
Is another of the segment overrides okay? If not, all that's left is an
address size override, if I'm not mistaken. Which overall seems less
desirable to use.

Plus - is your concern only about 32-bit code, or also about 64-bit? For
32-bit code in particular I'm having difficulty seeing why an ES
prefix might gain new meaning going forward, when an increasing number
of ISA extensions are for 64-bit mode only anyway. If the concern
extends to 64-bit code (it shouldn't, as the pre-386 segment overrides
are documented as nop prefixes, if I'm not mistaken), earlier changes
would need adjusting then, too, I think.

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-04 10:14     ` Jan Beulich
@ 2025-02-04 10:17       ` H.J. Lu
  2025-02-04 10:41         ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-04 10:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Feb 4, 2025 at 6:14 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.02.2025 23:40, H.J. Lu wrote:
> > On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
> >> --- /dev/null
> >> +++ b/ld/testsuite/ld-i386/load8a.d
> >> @@ -0,0 +1,14 @@
> >> +#source: load8.s
> >> +#as: --32 -mrelax-relocations=yes
> >> +#ld: -melf_i386 -z noseparate-code
> >> +#objdump: -dw
> >> +
> >> +.*: +file format .*
> >> +
> >> +Disassembly of section .text:
> >> +
> >> +0+8048074 <_start>:
> >> +[      ]*[a-f0-9]+:    26 68 86 90 04 08       es push \$0x8049086
> >> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
> >> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
> >
> > Please avoid adding the es prefix.  It may not be nop in the future.
>
> Constructive comments please. What other prefix do you suggest we use?
> Is another of the segment overrides okay? If not, all that's left is an
> address size override, if I'm not mistaken. Which overall seems less
> desirable to use.

You can use 1-byte NOP.

> Plus - is your concern only about 32-bit code, or also about 64-bit? For
> 32-bit code in particular I'm having difficulty seeing why an ES
> prefix might gain new meaning going forward, when an increasing number
> of ISA extensions are for 64-bit mode only anyway. If the concern
> extends to 64-bit code (it shouldn't, as the pre-386 segment overrides
> are documented as nop prefixes, if I'm not mistaken), earlier changes
> would need adjusting then, too, I think.

I don't think adding instructions like PUSH is very useful.

-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-04 10:17       ` H.J. Lu
@ 2025-02-04 10:41         ` Jan Beulich
  2025-02-04 11:02           ` H.J. Lu
  2025-02-04 11:02           ` Jan Beulich
  0 siblings, 2 replies; 32+ messages in thread
From: Jan Beulich @ 2025-02-04 10:41 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 04.02.2025 11:17, H.J. Lu wrote:
> On Tue, Feb 4, 2025 at 6:14 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 03.02.2025 23:40, H.J. Lu wrote:
>>> On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>> --- /dev/null
>>>> +++ b/ld/testsuite/ld-i386/load8a.d
>>>> @@ -0,0 +1,14 @@
>>>> +#source: load8.s
>>>> +#as: --32 -mrelax-relocations=yes
>>>> +#ld: -melf_i386 -z noseparate-code
>>>> +#objdump: -dw
>>>> +
>>>> +.*: +file format .*
>>>> +
>>>> +Disassembly of section .text:
>>>> +
>>>> +0+8048074 <_start>:
>>>> +[      ]*[a-f0-9]+:    26 68 86 90 04 08       es push \$0x8049086
>>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
>>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
>>>
>>> Please avoid adding the es prefix.  It may not be nop in the future.
>>
>> Constructive comments please. What other prefix do you suggest we use?
>> Is another of the segment overrides okay? If not, all that's left is an
>> address size override, if I'm not mistaken. Which overall seems less
>> desirable to use.
> 
> You can use 1-byte NOP.

That'll have an undue effect on debugging, by splitting a single insn
into two.

>> Plus - is your concern only about 32-bit code, or also about 64-bit? For
>> 32-bit code in particular I'm having difficulty seeing why an ES
>> prefix might gain new meaning going forward, when an increasing number
>> of ISA extensions are for 64-bit mode only anyway. If the concern
>> extends to 64-bit code (it shouldn't, as the pre-386 segment overrides
>> are documented as nop prefixes, if I'm not mistaken), earlier changes
>> would need adjusting then, too, I think.
> 
> I don't think adding instructions like PUSH is very useful.

I was actively waiting for this kind of comment. Why was adding support
for e.g. ADC and SBB useful then? Imo it can only be one of two ways:
Either we support everything that can be supported, or we limit things
strictly to cases that are actively deemed useful.

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-04 10:41         ` Jan Beulich
@ 2025-02-04 11:02           ` H.J. Lu
  2025-02-04 11:02           ` Jan Beulich
  1 sibling, 0 replies; 32+ messages in thread
From: H.J. Lu @ 2025-02-04 11:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Feb 4, 2025 at 6:41 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 04.02.2025 11:17, H.J. Lu wrote:
> > On Tue, Feb 4, 2025 at 6:14 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 03.02.2025 23:40, H.J. Lu wrote:
> >>> On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> --- /dev/null
> >>>> +++ b/ld/testsuite/ld-i386/load8a.d
> >>>> @@ -0,0 +1,14 @@
> >>>> +#source: load8.s
> >>>> +#as: --32 -mrelax-relocations=yes
> >>>> +#ld: -melf_i386 -z noseparate-code
> >>>> +#objdump: -dw
> >>>> +
> >>>> +.*: +file format .*
> >>>> +
> >>>> +Disassembly of section .text:
> >>>> +
> >>>> +0+8048074 <_start>:
> >>>> +[      ]*[a-f0-9]+:    26 68 86 90 04 08       es push \$0x8049086
> >>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
> >>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
> >>>
> >>> Please avoid adding the es prefix.  It may not be nop in the future.
> >>
> >> Constructive comments please. What other prefix do you suggest we use?
> >> Is another of the segment overrides okay? If not, all that's left is an
> >> address size override, if I'm not mistaken. Which overall seems less
> >> desirable to use.
> >
> > You can use 1-byte NOP.
>
> That'll have an undue effect on debugging, by splitting a single insn
> into two.
>
> >> Plus - is your concern only about 32-bit code, or also about 64-bit? For
> >> 32-bit code in particular I'm having difficulty seeing why an ES
> >> prefix might gain new meaning going forward, when an increasing number
> >> of ISA extensions are for 64-bit mode only anyway. If the concern
> >> extends to 64-bit code (it shouldn't, as the pre-386 segment overrides
> >> are documented as nop prefixes, if I'm not mistaken), earlier changes
> >> would need adjusting then, too, I think.
> >
> > I don't think adding instructions like PUSH is very useful.
>
> I was actively waiting for this kind of comment. Why was adding support
> for e.g. ADC and SBB useful then? Imo it can only be one of two ways:

Good question.  They were included since they have the same operand
encoding as ADD and they can be transformed together with ADD.

> Either we support everything that can be supported, or we limit things
> strictly to cases that are actively deemed useful.
>
> Jan



-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-04 10:41         ` Jan Beulich
  2025-02-04 11:02           ` H.J. Lu
@ 2025-02-04 11:02           ` Jan Beulich
  2025-02-04 11:12             ` H.J. Lu
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-04 11:02 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 04.02.2025 11:41, Jan Beulich wrote:
> On 04.02.2025 11:17, H.J. Lu wrote:
>> On Tue, Feb 4, 2025 at 6:14 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>
>>> On 03.02.2025 23:40, H.J. Lu wrote:
>>>> On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>> --- /dev/null
>>>>> +++ b/ld/testsuite/ld-i386/load8a.d
>>>>> @@ -0,0 +1,14 @@
>>>>> +#source: load8.s
>>>>> +#as: --32 -mrelax-relocations=yes
>>>>> +#ld: -melf_i386 -z noseparate-code
>>>>> +#objdump: -dw
>>>>> +
>>>>> +.*: +file format .*
>>>>> +
>>>>> +Disassembly of section .text:
>>>>> +
>>>>> +0+8048074 <_start>:
>>>>> +[      ]*[a-f0-9]+:    26 68 86 90 04 08       es push \$0x8049086
>>>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
>>>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
>>>>
>>>> Please avoid adding the es prefix.  It may not be nop in the future.
>>>
>>> Constructive comments please. What other prefix do you suggest we use?
>>> Is another of the segment overrides okay? If not, all that's left is an
>>> address size override, if I'm not mistaken. Which overall seems less
>>> desirable to use.
>>
>> You can use 1-byte NOP.
> 
> That'll have an undue effect on debugging, by splitting a single insn
> into two.
> 
>>> Plus - is your concern only about 32-bit code, or also about 64-bit? For
>>> 32-bit code in particular I'm having difficulty seeing why an ES
>>> prefix might gain new meaning going forward, when an increasing number
>>> of ISA extensions are for 64-bit mode only anyway. If the concern
>>> extends to 64-bit code (it shouldn't, as the pre-386 segment overrides
>>> are documented as nop prefixes, if I'm not mistaken), earlier changes
>>> would need adjusting then, too, I think.
>>
>> I don't think adding instructions like PUSH is very useful.

Further to my earlier reply: You didn't answer my question. Which is
necessary to determine whether earlier changes need adjustment.

> I was actively waiting for this kind of comment. Why was adding support
> for e.g. ADC and SBB useful then? Imo it can only be one of two ways:
> Either we support everything that can be supported, or we limit things
> strictly to cases that are actively deemed useful.

Thinking of it: With TEST being special-cased in the logic involved, I'm
also curious to learn of a code sequence where TEST would sensibly be
used (and where CMP can't be used instead).

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-04 11:02           ` Jan Beulich
@ 2025-02-04 11:12             ` H.J. Lu
  2025-02-04 11:16               ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-04 11:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Feb 4, 2025 at 7:02 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 04.02.2025 11:41, Jan Beulich wrote:
> > On 04.02.2025 11:17, H.J. Lu wrote:
> >> On Tue, Feb 4, 2025 at 6:14 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>
> >>> On 03.02.2025 23:40, H.J. Lu wrote:
> >>>> On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>> --- /dev/null
> >>>>> +++ b/ld/testsuite/ld-i386/load8a.d
> >>>>> @@ -0,0 +1,14 @@
> >>>>> +#source: load8.s
> >>>>> +#as: --32 -mrelax-relocations=yes
> >>>>> +#ld: -melf_i386 -z noseparate-code
> >>>>> +#objdump: -dw
> >>>>> +
> >>>>> +.*: +file format .*
> >>>>> +
> >>>>> +Disassembly of section .text:
> >>>>> +
> >>>>> +0+8048074 <_start>:
> >>>>> +[      ]*[a-f0-9]+:    26 68 86 90 04 08       es push \$0x8049086
> >>>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
> >>>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
> >>>>
> >>>> Please avoid adding the es prefix.  It may not be nop in the future.
> >>>
> >>> Constructive comments please. What other prefix do you suggest we use?
> >>> Is another of the segment overrides okay? If not, all that's left is an
> >>> address size override, if I'm not mistaken. Which overall seems less
> >>> desirable to use.
> >>
> >> You can use 1-byte NOP.
> >
> > That'll have an undue effect on debugging, by splitting a single insn
> > into two.
> >
> >>> Plus - is your concern only about 32-bit code, or also about 64-bit? For
> >>> 32-bit code in particular I'm having difficulty seeing why an ES
> >>> prefix might gain new meaning going forward, when an increasing number
> >>> of ISA extensions are for 64-bit mode only anyway. If the concern
> >>> extends to 64-bit code (it shouldn't, as the pre-386 segment overrides
> >>> are documented as nop prefixes, if I'm not mistaken), earlier changes
> >>> would need adjusting then, too, I think.
> >>
> >> I don't think adding instructions like PUSH is very useful.
>
> Further to my earlier reply: You didn't answer my question. Which is

Which question?

> necessary to determine whether earlier changes need adjustment.
>
> > I was actively waiting for this kind of comment. Why was adding support
> > for e.g. ADC and SBB useful then? Imo it can only be one of two ways:
> > Either we support everything that can be supported, or we limit things
> > strictly to cases that are actively deemed useful.
>
> Thinking of it: With TEST being special-cased in the logic involved, I'm
> also curious to learn of a code sequence where TEST would sensibly be
> used (and where CMP can't be used instead).
>

Compiler may generate TEST with GOT.


-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-04 11:12             ` H.J. Lu
@ 2025-02-04 11:16               ` Jan Beulich
  2025-02-04 12:14                 ` H.J. Lu
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-04 11:16 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 04.02.2025 12:12, H.J. Lu wrote:
> On Tue, Feb 4, 2025 at 7:02 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 04.02.2025 11:41, Jan Beulich wrote:
>>> On 04.02.2025 11:17, H.J. Lu wrote:
>>>> On Tue, Feb 4, 2025 at 6:14 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>
>>>>> On 03.02.2025 23:40, H.J. Lu wrote:
>>>>>> On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>> --- /dev/null
>>>>>>> +++ b/ld/testsuite/ld-i386/load8a.d
>>>>>>> @@ -0,0 +1,14 @@
>>>>>>> +#source: load8.s
>>>>>>> +#as: --32 -mrelax-relocations=yes
>>>>>>> +#ld: -melf_i386 -z noseparate-code
>>>>>>> +#objdump: -dw
>>>>>>> +
>>>>>>> +.*: +file format .*
>>>>>>> +
>>>>>>> +Disassembly of section .text:
>>>>>>> +
>>>>>>> +0+8048074 <_start>:
>>>>>>> +[      ]*[a-f0-9]+:    26 68 86 90 04 08       es push \$0x8049086
>>>>>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
>>>>>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
>>>>>>
>>>>>> Please avoid adding the es prefix.  It may not be nop in the future.
>>>>>
>>>>> Constructive comments please. What other prefix do you suggest we use?
>>>>> Is another of the segment overrides okay? If not, all that's left is an
>>>>> address size override, if I'm not mistaken. Which overall seems less
>>>>> desirable to use.
>>>>
>>>> You can use 1-byte NOP.
>>>
>>> That'll have an undue effect on debugging, by splitting a single insn
>>> into two.
>>>
>>>>> Plus - is your concern only about 32-bit code, or also about 64-bit? For
>>>>> 32-bit code in particular I'm having difficulty seeing why an ES
>>>>> prefix might gain new meaning going forward, when an increasing number
>>>>> of ISA extensions are for 64-bit mode only anyway. If the concern
>>>>> extends to 64-bit code (it shouldn't, as the pre-386 segment overrides
>>>>> are documented as nop prefixes, if I'm not mistaken), earlier changes
>>>>> would need adjusting then, too, I think.
>>>>
>>>> I don't think adding instructions like PUSH is very useful.
>>
>> Further to my earlier reply: You didn't answer my question. Which is
> 
> Which question?

"Is your concern only about 32-bit code, or also about 64-bit?"

>> necessary to determine whether earlier changes need adjustment.
>>
>>> I was actively waiting for this kind of comment. Why was adding support
>>> for e.g. ADC and SBB useful then? Imo it can only be one of two ways:
>>> Either we support everything that can be supported, or we limit things
>>> strictly to cases that are actively deemed useful.
>>
>> Thinking of it: With TEST being special-cased in the logic involved, I'm
>> also curious to learn of a code sequence where TEST would sensibly be
>> used (and where CMP can't be used instead).
> 
> Compiler may generate TEST with GOT.

Can it? Why would it? (IOW: Again you didn't really address my request.)

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-04 11:16               ` Jan Beulich
@ 2025-02-04 12:14                 ` H.J. Lu
  2025-02-05 11:37                   ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-04 12:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Feb 4, 2025 at 7:16 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 04.02.2025 12:12, H.J. Lu wrote:
> > On Tue, Feb 4, 2025 at 7:02 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 04.02.2025 11:41, Jan Beulich wrote:
> >>> On 04.02.2025 11:17, H.J. Lu wrote:
> >>>> On Tue, Feb 4, 2025 at 6:14 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>
> >>>>> On 03.02.2025 23:40, H.J. Lu wrote:
> >>>>>> On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/ld/testsuite/ld-i386/load8a.d
> >>>>>>> @@ -0,0 +1,14 @@
> >>>>>>> +#source: load8.s
> >>>>>>> +#as: --32 -mrelax-relocations=yes
> >>>>>>> +#ld: -melf_i386 -z noseparate-code
> >>>>>>> +#objdump: -dw
> >>>>>>> +
> >>>>>>> +.*: +file format .*
> >>>>>>> +
> >>>>>>> +Disassembly of section .text:
> >>>>>>> +
> >>>>>>> +0+8048074 <_start>:
> >>>>>>> +[      ]*[a-f0-9]+:    26 68 86 90 04 08       es push \$0x8049086
> >>>>>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
> >>>>>>> +[      ]*[a-f0-9]+:    26 68 87 90 04 08       es push \$0x8049087
> >>>>>>
> >>>>>> Please avoid adding the es prefix.  It may not be nop in the future.
> >>>>>
> >>>>> Constructive comments please. What other prefix do you suggest we use?
> >>>>> Is another of the segment overrides okay? If not, all that's left is an
> >>>>> address size override, if I'm not mistaken. Which overall seems less
> >>>>> desirable to use.
> >>>>
> >>>> You can use 1-byte NOP.
> >>>
> >>> That'll have an undue effect on debugging, by splitting a single insn
> >>> into two.
> >>>
> >>>>> Plus - is your concern only about 32-bit code, or also about 64-bit? For
> >>>>> 32-bit code in particular I'm having difficulty seeing why an ES
> >>>>> prefix might gain new meaning going forward, when an increasing number
> >>>>> of ISA extensions are for 64-bit mode only anyway. If the concern
> >>>>> extends to 64-bit code (it shouldn't, as the pre-386 segment overrides
> >>>>> are documented as nop prefixes, if I'm not mistaken), earlier changes
> >>>>> would need adjusting then, too, I think.
> >>>>
> >>>> I don't think adding instructions like PUSH is very useful.
> >>
> >> Further to my earlier reply: You didn't answer my question. Which is
> >
> > Which question?
>
> "Is your concern only about 32-bit code, or also about 64-bit?"

Both.

> >> necessary to determine whether earlier changes need adjustment.
> >>
> >>> I was actively waiting for this kind of comment. Why was adding support
> >>> for e.g. ADC and SBB useful then? Imo it can only be one of two ways:
> >>> Either we support everything that can be supported, or we limit things
> >>> strictly to cases that are actively deemed useful.
> >>
> >> Thinking of it: With TEST being special-cased in the logic involved, I'm
> >> also curious to learn of a code sequence where TEST would sensibly be
> >> used (and where CMP can't be used instead).
> >
> > Compiler may generate TEST with GOT.
>
> Can it? Why would it? (IOW: Again you didn't really address my request.)
>

[hjl@gnu-tgl-3 tmp]$ cat x.c
extern int foo __attribute__ ((weak));

extern void bar (void);

__attribute__ ((regparm(3)))
void
_start (long int p)
{
  if (((unsigned long) &foo) & p)
    bar ();
}
[hjl@gnu-tgl-3 tmp]$ gcc -c -O2 -m32 x.c -mno-direct-extern-access

[hjl@gnu-tgl-3 tmp]$ objdump -dwr x.o

x.o:     file format elf32-i386


Disassembly of section .text:

00000000 <_start>:
   0: 85 05 00 00 00 00    test   %eax,0x0 2: R_386_GOT32X foo
   6: 75 08                jne    10 <_start+0x10>
   8: c3                    ret
   9: 8d b4 26 00 00 00 00 lea    0x0(%esi,%eiz,1),%esi
  10: e9 fc ff ff ff        jmp    11 <_start+0x11> 11: R_386_PC32 bar
[hjl@gnu-tgl-3 tmp]$

Which request?

-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-04 12:14                 ` H.J. Lu
@ 2025-02-05 11:37                   ` Jan Beulich
  2025-02-06  2:28                     ` H.J. Lu
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-05 11:37 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 04.02.2025 13:14, H.J. Lu wrote:
> On Tue, Feb 4, 2025 at 7:16 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 04.02.2025 12:12, H.J. Lu wrote:
>>> On Tue, Feb 4, 2025 at 7:02 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>> Thinking of it: With TEST being special-cased in the logic involved, I'm
>>>> also curious to learn of a code sequence where TEST would sensibly be
>>>> used (and where CMP can't be used instead).
>>>
>>> Compiler may generate TEST with GOT.
>>
>> Can it? Why would it? (IOW: Again you didn't really address my request.)
>>
> 
> [hjl@gnu-tgl-3 tmp]$ cat x.c
> extern int foo __attribute__ ((weak));
> 
> extern void bar (void);
> 
> __attribute__ ((regparm(3)))
> void
> _start (long int p)
> {
>   if (((unsigned long) &foo) & p)
>     bar ();
> }
> [hjl@gnu-tgl-3 tmp]$ gcc -c -O2 -m32 x.c -mno-direct-extern-access
> 
> [hjl@gnu-tgl-3 tmp]$ objdump -dwr x.o
> 
> x.o:     file format elf32-i386
> 
> 
> Disassembly of section .text:
> 
> 00000000 <_start>:
>    0: 85 05 00 00 00 00    test   %eax,0x0 2: R_386_GOT32X foo
>    6: 75 08                jne    10 <_start+0x10>
>    8: c3                    ret
>    9: 8d b4 26 00 00 00 00 lea    0x0(%esi,%eiz,1),%esi
>   10: e9 fc ff ff ff        jmp    11 <_start+0x11> 11: R_386_PC32 bar
> [hjl@gnu-tgl-3 tmp]$
> 
> Which request?

"... learn of a code sequence where TEST would sensibly be used" in my
earlier reply. So yes, you provided a contrived example now. I'm afraid
I can't assign meaning to "if (((unsigned long) &foo) & p)", though.
Instead, slightly adjusting your example, I can see PUSH being used by
the compiler in exactly the way we can (but so far didn't) optimize.
That's ordinary argument passing, which I'm inclined to call "sensible".

IOW the question remains as to why optimization of TEST was added,
beyond the "because we can" reasoning. As before, ADD, SUB, and perhaps
CMP make sense when operating on pointers. For AND, OR, XOR, ADC, SBB,
and TEST I'm having a hard time seeing good uses. Since we deal with
them, we also ought to deal with IMUL in the same manner (where
possible). Since beyond these arithmetic ops we also deal with MOV, we
similarly should handle PUSH.

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-05 11:37                   ` Jan Beulich
@ 2025-02-06  2:28                     ` H.J. Lu
  2025-02-06 12:08                       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-06  2:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Feb 5, 2025 at 7:37 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 04.02.2025 13:14, H.J. Lu wrote:
> > On Tue, Feb 4, 2025 at 7:16 PM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 04.02.2025 12:12, H.J. Lu wrote:
> >>> On Tue, Feb 4, 2025 at 7:02 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> Thinking of it: With TEST being special-cased in the logic involved, I'm
> >>>> also curious to learn of a code sequence where TEST would sensibly be
> >>>> used (and where CMP can't be used instead).
> >>>
> >>> Compiler may generate TEST with GOT.
> >>
> >> Can it? Why would it? (IOW: Again you didn't really address my request.)
> >>
> >
> > [hjl@gnu-tgl-3 tmp]$ cat x.c
> > extern int foo __attribute__ ((weak));
> >
> > extern void bar (void);
> >
> > __attribute__ ((regparm(3)))
> > void
> > _start (long int p)
> > {
> >   if (((unsigned long) &foo) & p)
> >     bar ();
> > }
> > [hjl@gnu-tgl-3 tmp]$ gcc -c -O2 -m32 x.c -mno-direct-extern-access
> >
> > [hjl@gnu-tgl-3 tmp]$ objdump -dwr x.o
> >
> > x.o:     file format elf32-i386
> >
> >
> > Disassembly of section .text:
> >
> > 00000000 <_start>:
> >    0: 85 05 00 00 00 00    test   %eax,0x0 2: R_386_GOT32X foo
> >    6: 75 08                jne    10 <_start+0x10>
> >    8: c3                    ret
> >    9: 8d b4 26 00 00 00 00 lea    0x0(%esi,%eiz,1),%esi
> >   10: e9 fc ff ff ff        jmp    11 <_start+0x11> 11: R_386_PC32 bar
> > [hjl@gnu-tgl-3 tmp]$
> >
> > Which request?
>
> "... learn of a code sequence where TEST would sensibly be used" in my
> earlier reply. So yes, you provided a contrived example now. I'm afraid
> I can't assign meaning to "if (((unsigned long) &foo) & p)", though.

It can be used to check pointer alignment.

> Instead, slightly adjusting your example, I can see PUSH being used by
> the compiler in exactly the way we can (but so far didn't) optimize.
> That's ordinary argument passing, which I'm inclined to call "sensible".

Please don't add a segment register.

> IOW the question remains as to why optimization of TEST was added,
> beyond the "because we can" reasoning. As before, ADD, SUB, and perhaps
> CMP make sense when operating on pointers. For AND, OR, XOR, ADC, SBB,
> and TEST I'm having a hard time seeing good uses. Since we deal with
> them, we also ought to deal with IMUL in the same manner (where
> possible). Since beyond these arithmetic ops we also deal with MOV, we
> similarly should handle PUSH.
>
> Jan



-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-06  2:28                     ` H.J. Lu
@ 2025-02-06 12:08                       ` Jan Beulich
  2025-02-07  5:27                         ` H.J. Lu
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-06 12:08 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 06.02.2025 03:28, H.J. Lu wrote:
> On Wed, Feb 5, 2025 at 7:37 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 04.02.2025 13:14, H.J. Lu wrote:
>>> On Tue, Feb 4, 2025 at 7:16 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 04.02.2025 12:12, H.J. Lu wrote:
>>>>> On Tue, Feb 4, 2025 at 7:02 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>> Thinking of it: With TEST being special-cased in the logic involved, I'm
>>>>>> also curious to learn of a code sequence where TEST would sensibly be
>>>>>> used (and where CMP can't be used instead).
>>>>>
>>>>> Compiler may generate TEST with GOT.
>>>>
>>>> Can it? Why would it? (IOW: Again you didn't really address my request.)
>>>>
>>>
>>> [hjl@gnu-tgl-3 tmp]$ cat x.c
>>> extern int foo __attribute__ ((weak));
>>>
>>> extern void bar (void);
>>>
>>> __attribute__ ((regparm(3)))
>>> void
>>> _start (long int p)
>>> {
>>>   if (((unsigned long) &foo) & p)
>>>     bar ();
>>> }
>>> [hjl@gnu-tgl-3 tmp]$ gcc -c -O2 -m32 x.c -mno-direct-extern-access
>>>
>>> [hjl@gnu-tgl-3 tmp]$ objdump -dwr x.o
>>>
>>> x.o:     file format elf32-i386
>>>
>>>
>>> Disassembly of section .text:
>>>
>>> 00000000 <_start>:
>>>    0: 85 05 00 00 00 00    test   %eax,0x0 2: R_386_GOT32X foo
>>>    6: 75 08                jne    10 <_start+0x10>
>>>    8: c3                    ret
>>>    9: 8d b4 26 00 00 00 00 lea    0x0(%esi,%eiz,1),%esi
>>>   10: e9 fc ff ff ff        jmp    11 <_start+0x11> 11: R_386_PC32 bar
>>> [hjl@gnu-tgl-3 tmp]$
>>>
>>> Which request?
>>
>> "... learn of a code sequence where TEST would sensibly be used" in my
>> earlier reply. So yes, you provided a contrived example now. I'm afraid
>> I can't assign meaning to "if (((unsigned long) &foo) & p)", though.
> 
> It can be used to check pointer alignment.

Hmm, yes, I didn't think of that. Yet then TESTB (and perhaps even TESTW)
would also want to be supported.

>> Instead, slightly adjusting your example, I can see PUSH being used by
>> the compiler in exactly the way we can (but so far didn't) optimize.
>> That's ordinary argument passing, which I'm inclined to call "sensible".
> 
> Please don't add a segment register.

Actually I have to re-raise the why question: We use CS: in some of the
multi-byte NOPs we emit, both for 32- and for 64-bit.

Plus, as said before, I view it as pretty undesirable to split a single
insn into two (NOP of appropriate size followed by shrunk insn).

Alternatively for 32-bit's PUSH we could use an address size override,
albeit this doesn't look very desirable either. For 64-bit we could use
dummy REX prefixes, unless we're dealing with a REX2 encoding. For REX2
forms of PUSH we could either choose to not use REX2, or again use an
address override. An address size override could also be used for
immediate-to-register MOV.

Also: If anything, these prefixes could only gain hint-like meaning
anyway (like two of them once did for conditional jumps), so there would
not be any functional issue. As for their use in multi-byte NOPs, we can
deal with eventual performance issues once we know how bad things are.

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL
  2025-02-06 12:08                       ` Jan Beulich
@ 2025-02-07  5:27                         ` H.J. Lu
  0 siblings, 0 replies; 32+ messages in thread
From: H.J. Lu @ 2025-02-07  5:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Thu, Feb 6, 2025 at 8:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.02.2025 03:28, H.J. Lu wrote:
> > On Wed, Feb 5, 2025 at 7:37 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 04.02.2025 13:14, H.J. Lu wrote:
> >>> On Tue, Feb 4, 2025 at 7:16 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> On 04.02.2025 12:12, H.J. Lu wrote:
> >>>>> On Tue, Feb 4, 2025 at 7:02 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>> Thinking of it: With TEST being special-cased in the logic involved, I'm
> >>>>>> also curious to learn of a code sequence where TEST would sensibly be
> >>>>>> used (and where CMP can't be used instead).
> >>>>>
> >>>>> Compiler may generate TEST with GOT.
> >>>>
> >>>> Can it? Why would it? (IOW: Again you didn't really address my request.)
> >>>>
> >>>
> >>> [hjl@gnu-tgl-3 tmp]$ cat x.c
> >>> extern int foo __attribute__ ((weak));
> >>>
> >>> extern void bar (void);
> >>>
> >>> __attribute__ ((regparm(3)))
> >>> void
> >>> _start (long int p)
> >>> {
> >>>   if (((unsigned long) &foo) & p)
> >>>     bar ();
> >>> }
> >>> [hjl@gnu-tgl-3 tmp]$ gcc -c -O2 -m32 x.c -mno-direct-extern-access
> >>>
> >>> [hjl@gnu-tgl-3 tmp]$ objdump -dwr x.o
> >>>
> >>> x.o:     file format elf32-i386
> >>>
> >>>
> >>> Disassembly of section .text:
> >>>
> >>> 00000000 <_start>:
> >>>    0: 85 05 00 00 00 00    test   %eax,0x0 2: R_386_GOT32X foo
> >>>    6: 75 08                jne    10 <_start+0x10>
> >>>    8: c3                    ret
> >>>    9: 8d b4 26 00 00 00 00 lea    0x0(%esi,%eiz,1),%esi
> >>>   10: e9 fc ff ff ff        jmp    11 <_start+0x11> 11: R_386_PC32 bar
> >>> [hjl@gnu-tgl-3 tmp]$
> >>>
> >>> Which request?
> >>
> >> "... learn of a code sequence where TEST would sensibly be used" in my
> >> earlier reply. So yes, you provided a contrived example now. I'm afraid
> >> I can't assign meaning to "if (((unsigned long) &foo) & p)", though.
> >
> > It can be used to check pointer alignment.
>
> Hmm, yes, I didn't think of that. Yet then TESTB (and perhaps even TESTW)
> would also want to be supported.
>
> >> Instead, slightly adjusting your example, I can see PUSH being used by
> >> the compiler in exactly the way we can (but so far didn't) optimize.
> >> That's ordinary argument passing, which I'm inclined to call "sensible".
> >
> > Please don't add a segment register.
>
> Actually I have to re-raise the why question: We use CS: in some of the
> multi-byte NOPs we emit, both for 32- and for 64-bit.

CS is better than ES.

> Plus, as said before, I view it as pretty undesirable to split a single
> insn into two (NOP of appropriate size followed by shrunk insn).
>
> Alternatively for 32-bit's PUSH we could use an address size override,
> albeit this doesn't look very desirable either. For 64-bit we could use
> dummy REX prefixes, unless we're dealing with a REX2 encoding. For REX2
> forms of PUSH we could either choose to not use REX2, or again use an
> address override. An address size override could also be used for
> immediate-to-register MOV.
>
> Also: If anything, these prefixes could only gain hint-like meaning
> anyway (like two of them once did for conditional jumps), so there would
> not be any functional issue. As for their use in multi-byte NOPs, we can
> deal with eventual performance issues once we know how bad things are.
>
> Jan



-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 4/5] x86-64: further tighten convert-load-reloc checking
  2025-02-03 11:39 [PATCH 0/5] x86: further GOT{,PCREL} related adjustments Jan Beulich
                   ` (2 preceding siblings ...)
  2025-02-03 11:41 ` [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL Jan Beulich
@ 2025-02-03 11:41 ` Jan Beulich
  2025-02-03 22:41   ` H.J. Lu
  2025-02-03 11:42 ` [PATCH 5/5] ix86: restrict use of GOT32X relocs Jan Beulich
  4 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-03 11:41 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

REX2.M affects what insn we're actually dealing with, so we better check
this to avoid transforming (future) insns we must not touch.

--- a/bfd/elf64-x86-64.c
+++ b/bfd/elf64-x86-64.c
@@ -2282,14 +2282,14 @@ elf_x86_64_convert_load_reloc (bfd *abfd
 	  if (to_reloc_pc32)
 	    return true;
 
-	  if (opcode == 0x85)
+	  if (opcode == 0x85 && !(rex2 & (REX2_M << 4)))
 	    {
 	      /* Convert "test %reg, foo@GOTPCREL(%rip)" to
 		 "test $foo, %reg".  */
 	      modrm = 0xc0 | (modrm & 0x38) >> 3;
 	      opcode = 0xf7;
 	    }
-	  else if ((opcode | 0x38) == 0x3b)
+	  else if ((opcode | 0x38) == 0x3b && !(rex2 & (REX2_M << 4)))
 	    {
 	      /* Convert "binop foo@GOTPCREL(%rip), %reg" to
 		 "binop $foo, %reg".  */


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] x86-64: further tighten convert-load-reloc checking
  2025-02-03 11:41 ` [PATCH 4/5] x86-64: further tighten convert-load-reloc checking Jan Beulich
@ 2025-02-03 22:41   ` H.J. Lu
  2025-02-04 10:04     ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-03 22:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> REX2.M affects what insn we're actually dealing with, so we better check
> this to avoid transforming (future) insns we must not touch.
>
> --- a/bfd/elf64-x86-64.c
> +++ b/bfd/elf64-x86-64.c
> @@ -2282,14 +2282,14 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>           if (to_reloc_pc32)
>             return true;
>
> -         if (opcode == 0x85)
> +         if (opcode == 0x85 && !(rex2 & (REX2_M << 4)))
>             {
>               /* Convert "test %reg, foo@GOTPCREL(%rip)" to
>                  "test $foo, %reg".  */
>               modrm = 0xc0 | (modrm & 0x38) >> 3;
>               opcode = 0xf7;
>             }
> -         else if ((opcode | 0x38) == 0x3b)
> +         else if ((opcode | 0x38) == 0x3b && !(rex2 & (REX2_M << 4)))
>             {
>               /* Convert "binop foo@GOTPCREL(%rip), %reg" to
>                  "binop $foo, %reg".  */
>

Please add a testcase to show it makes a difference.

Thanks.


-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] x86-64: further tighten convert-load-reloc checking
  2025-02-03 22:41   ` H.J. Lu
@ 2025-02-04 10:04     ` Jan Beulich
  2025-02-04 10:10       ` H.J. Lu
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-04 10:04 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 03.02.2025 23:41, H.J. Lu wrote:
> On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> REX2.M affects what insn we're actually dealing with, so we better check
>> this to avoid transforming (future) insns we must not touch.
>>
>> --- a/bfd/elf64-x86-64.c
>> +++ b/bfd/elf64-x86-64.c
>> @@ -2282,14 +2282,14 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>>           if (to_reloc_pc32)
>>             return true;
>>
>> -         if (opcode == 0x85)
>> +         if (opcode == 0x85 && !(rex2 & (REX2_M << 4)))
>>             {
>>               /* Convert "test %reg, foo@GOTPCREL(%rip)" to
>>                  "test $foo, %reg".  */
>>               modrm = 0xc0 | (modrm & 0x38) >> 3;
>>               opcode = 0xf7;
>>             }
>> -         else if ((opcode | 0x38) == 0x3b)
>> +         else if ((opcode | 0x38) == 0x3b && !(rex2 & (REX2_M << 4)))
>>             {
>>               /* Convert "binop foo@GOTPCREL(%rip), %reg" to
>>                  "binop $foo, %reg".  */
>>
> 
> Please add a testcase to show it makes a difference.

Hmm, not sure how such a testcase would look like. At least some of
the involved opcodes have no meaning (yet) with REX2. But maybe I
can construct something. Still I view it as unreasonable that such
obvious omissions in earlier changes can't be corrected without
investing a lot of time in trying to make up a situation where
things would fail. Once again: Proof of _no failure_ should have
been added when these optimizations were introduced. And that proof
should have been extended when APX support was added. What you're
effectively doing is to ask me to cover for earlier omissions.

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] x86-64: further tighten convert-load-reloc checking
  2025-02-04 10:04     ` Jan Beulich
@ 2025-02-04 10:10       ` H.J. Lu
  2025-02-04 10:17         ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-04 10:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Feb 4, 2025 at 6:04 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.02.2025 23:41, H.J. Lu wrote:
> > On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> REX2.M affects what insn we're actually dealing with, so we better check
> >> this to avoid transforming (future) insns we must not touch.
> >>
> >> --- a/bfd/elf64-x86-64.c
> >> +++ b/bfd/elf64-x86-64.c
> >> @@ -2282,14 +2282,14 @@ elf_x86_64_convert_load_reloc (bfd *abfd
> >>           if (to_reloc_pc32)
> >>             return true;
> >>
> >> -         if (opcode == 0x85)
> >> +         if (opcode == 0x85 && !(rex2 & (REX2_M << 4)))
> >>             {
> >>               /* Convert "test %reg, foo@GOTPCREL(%rip)" to
> >>                  "test $foo, %reg".  */
> >>               modrm = 0xc0 | (modrm & 0x38) >> 3;
> >>               opcode = 0xf7;
> >>             }
> >> -         else if ((opcode | 0x38) == 0x3b)
> >> +         else if ((opcode | 0x38) == 0x3b && !(rex2 & (REX2_M << 4)))
> >>             {
> >>               /* Convert "binop foo@GOTPCREL(%rip), %reg" to
> >>                  "binop $foo, %reg".  */
> >>
> >
> > Please add a testcase to show it makes a difference.
>
> Hmm, not sure how such a testcase would look like. At least some of
> the involved opcodes have no meaning (yet) with REX2. But maybe I
> can construct something. Still I view it as unreasonable that such
> obvious omissions in earlier changes can't be corrected without
> investing a lot of time in trying to make up a situation where
> things would fail. Once again: Proof of _no failure_ should have
> been added when these optimizations were introduced. And that proof
> should have been extended when APX support was added. What you're
> effectively doing is to ask me to cover for earlier omissions.
>
> Jan

Again.  No test, no issue, no change.

-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] x86-64: further tighten convert-load-reloc checking
  2025-02-04 10:10       ` H.J. Lu
@ 2025-02-04 10:17         ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2025-02-04 10:17 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 04.02.2025 11:10, H.J. Lu wrote:
> On Tue, Feb 4, 2025 at 6:04 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 03.02.2025 23:41, H.J. Lu wrote:
>>> On Mon, Feb 3, 2025 at 7:41 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> REX2.M affects what insn we're actually dealing with, so we better check
>>>> this to avoid transforming (future) insns we must not touch.
>>>>
>>>> --- a/bfd/elf64-x86-64.c
>>>> +++ b/bfd/elf64-x86-64.c
>>>> @@ -2282,14 +2282,14 @@ elf_x86_64_convert_load_reloc (bfd *abfd
>>>>           if (to_reloc_pc32)
>>>>             return true;
>>>>
>>>> -         if (opcode == 0x85)
>>>> +         if (opcode == 0x85 && !(rex2 & (REX2_M << 4)))
>>>>             {
>>>>               /* Convert "test %reg, foo@GOTPCREL(%rip)" to
>>>>                  "test $foo, %reg".  */
>>>>               modrm = 0xc0 | (modrm & 0x38) >> 3;
>>>>               opcode = 0xf7;
>>>>             }
>>>> -         else if ((opcode | 0x38) == 0x3b)
>>>> +         else if ((opcode | 0x38) == 0x3b && !(rex2 & (REX2_M << 4)))
>>>>             {
>>>>               /* Convert "binop foo@GOTPCREL(%rip), %reg" to
>>>>                  "binop $foo, %reg".  */
>>>>
>>>
>>> Please add a testcase to show it makes a difference.
>>
>> Hmm, not sure how such a testcase would look like. At least some of
>> the involved opcodes have no meaning (yet) with REX2. But maybe I
>> can construct something. Still I view it as unreasonable that such
>> obvious omissions in earlier changes can't be corrected without
>> investing a lot of time in trying to make up a situation where
>> things would fail. Once again: Proof of _no failure_ should have
>> been added when these optimizations were introduced. And that proof
>> should have been extended when APX support was added. What you're
>> effectively doing is to ask me to cover for earlier omissions.
> 
> Again.  No test, no issue, no change.

Again - please adjust your attitude.

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 5/5] ix86: restrict use of GOT32X relocs
  2025-02-03 11:39 [PATCH 0/5] x86: further GOT{,PCREL} related adjustments Jan Beulich
                   ` (3 preceding siblings ...)
  2025-02-03 11:41 ` [PATCH 4/5] x86-64: further tighten convert-load-reloc checking Jan Beulich
@ 2025-02-03 11:42 ` Jan Beulich
  2025-02-03 22:41   ` H.J. Lu
  4 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-03 11:42 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

The linker rejects use of this reloc type without a base register for
PIC code. Suppress its use by gas in such cases.
---
The linker also rejects use of GOT32, but that's an issue the programmer
has to deal with. In the assembler we need to avoid doing something
wrong that the programmer has not explicitly asked for.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -12990,7 +12990,8 @@ output_disp (fragS *insn_start_frag, off
 			}
 		    }
 		  else if (generate_relax_relocations
-			   || (i.rm.mode == 0 && i.rm.regmem == 5))
+			   ? (!shared || i.rm.mode != 0 || i.rm.regmem != 5)
+			   : (!shared && i.rm.mode == 0 && i.rm.regmem == 5))
 		    fixP->fx_tcbit2 = 1;
 		}
 	    }


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/5] ix86: restrict use of GOT32X relocs
  2025-02-03 11:42 ` [PATCH 5/5] ix86: restrict use of GOT32X relocs Jan Beulich
@ 2025-02-03 22:41   ` H.J. Lu
  2025-02-04  7:24     ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2025-02-03 22:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Mon, Feb 3, 2025 at 7:42 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> The linker rejects use of this reloc type without a base register for
> PIC code. Suppress its use by gas in such cases.
> ---
> The linker also rejects use of GOT32, but that's an issue the programmer
> has to deal with. In the assembler we need to avoid doing something
> wrong that the programmer has not explicitly asked for.
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -12990,7 +12990,8 @@ output_disp (fragS *insn_start_frag, off
>                         }
>                     }
>                   else if (generate_relax_relocations
> -                          || (i.rm.mode == 0 && i.rm.regmem == 5))
> +                          ? (!shared || i.rm.mode != 0 || i.rm.regmem != 5)
> +                          : (!shared && i.rm.mode == 0 && i.rm.regmem == 5))
>                     fixP->fx_tcbit2 = 1;
>                 }
>             }
>

Please add a testcase to show it makes a difference.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/5] ix86: restrict use of GOT32X relocs
  2025-02-03 22:41   ` H.J. Lu
@ 2025-02-04  7:24     ` Jan Beulich
  2025-02-04  7:27       ` H.J. Lu
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2025-02-04  7:24 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 03.02.2025 23:41, H.J. Lu wrote:
> On Mon, Feb 3, 2025 at 7:42 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> The linker rejects use of this reloc type without a base register for
>> PIC code. Suppress its use by gas in such cases.
>> ---
>> The linker also rejects use of GOT32, but that's an issue the programmer
>> has to deal with. In the assembler we need to avoid doing something
>> wrong that the programmer has not explicitly asked for.
>>
>> --- a/gas/config/tc-i386.c
>> +++ b/gas/config/tc-i386.c
>> @@ -12990,7 +12990,8 @@ output_disp (fragS *insn_start_frag, off
>>                         }
>>                     }
>>                   else if (generate_relax_relocations
>> -                          || (i.rm.mode == 0 && i.rm.regmem == 5))
>> +                          ? (!shared || i.rm.mode != 0 || i.rm.regmem != 5)
>> +                          : (!shared && i.rm.mode == 0 && i.rm.regmem == 5))
>>                     fixP->fx_tcbit2 = 1;
>>                 }
>>             }
> 
> Please add a testcase to show it makes a difference.

Such a testcase wouldn't fit my quality criteria, I'm afraid: It would
likely need to look for the specific diagnostic text, and that's what
I'm pretty sure you know I argue against when others try to add such
tests.

It'll also be in the linker testsuite, despite testing gas behavior.

All this said, I guess I'll think about making a testcase nevertheless.

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/5] ix86: restrict use of GOT32X relocs
  2025-02-04  7:24     ` Jan Beulich
@ 2025-02-04  7:27       ` H.J. Lu
  0 siblings, 0 replies; 32+ messages in thread
From: H.J. Lu @ 2025-02-04  7:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Feb 4, 2025 at 3:24 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.02.2025 23:41, H.J. Lu wrote:
> > On Mon, Feb 3, 2025 at 7:42 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> The linker rejects use of this reloc type without a base register for
> >> PIC code. Suppress its use by gas in such cases.
> >> ---
> >> The linker also rejects use of GOT32, but that's an issue the programmer
> >> has to deal with. In the assembler we need to avoid doing something
> >> wrong that the programmer has not explicitly asked for.
> >>
> >> --- a/gas/config/tc-i386.c
> >> +++ b/gas/config/tc-i386.c
> >> @@ -12990,7 +12990,8 @@ output_disp (fragS *insn_start_frag, off
> >>                         }
> >>                     }
> >>                   else if (generate_relax_relocations
> >> -                          || (i.rm.mode == 0 && i.rm.regmem == 5))
> >> +                          ? (!shared || i.rm.mode != 0 || i.rm.regmem != 5)
> >> +                          : (!shared && i.rm.mode == 0 && i.rm.regmem == 5))
> >>                     fixP->fx_tcbit2 = 1;
> >>                 }
> >>             }
> >
> > Please add a testcase to show it makes a difference.
>
> Such a testcase wouldn't fit my quality criteria, I'm afraid: It would
> likely need to look for the specific diagnostic text, and that's what
> I'm pretty sure you know I argue against when others try to add such
> tests.
>
> It'll also be in the linker testsuite, despite testing gas behavior.

That is true for this kind of changes.

> All this said, I guess I'll think about making a testcase nevertheless.
>
> Jan



-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2025-02-07  5:28 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-03 11:39 [PATCH 0/5] x86: further GOT{,PCREL} related adjustments Jan Beulich
2025-02-03 11:40 ` [PATCH 1/5] x86: drop redundant i.operands checks from output_disp() Jan Beulich
2025-02-03 11:40 ` [PATCH 2/5] ix86: tighten convert-load-reloc checking Jan Beulich
2025-02-03 22:34   ` H.J. Lu
2025-02-04  7:21     ` Jan Beulich
2025-02-04  7:26       ` H.J. Lu
2025-02-04  7:58         ` Jan Beulich
2025-02-04  8:03           ` H.J. Lu
2025-02-04  8:10             ` Jan Beulich
2025-02-03 11:41 ` [PATCH 3/5] x86: widen @got{,pcrel} support to PUSH and APX IMUL Jan Beulich
2025-02-03 22:40   ` H.J. Lu
2025-02-04 10:14     ` Jan Beulich
2025-02-04 10:17       ` H.J. Lu
2025-02-04 10:41         ` Jan Beulich
2025-02-04 11:02           ` H.J. Lu
2025-02-04 11:02           ` Jan Beulich
2025-02-04 11:12             ` H.J. Lu
2025-02-04 11:16               ` Jan Beulich
2025-02-04 12:14                 ` H.J. Lu
2025-02-05 11:37                   ` Jan Beulich
2025-02-06  2:28                     ` H.J. Lu
2025-02-06 12:08                       ` Jan Beulich
2025-02-07  5:27                         ` H.J. Lu
2025-02-03 11:41 ` [PATCH 4/5] x86-64: further tighten convert-load-reloc checking Jan Beulich
2025-02-03 22:41   ` H.J. Lu
2025-02-04 10:04     ` Jan Beulich
2025-02-04 10:10       ` H.J. Lu
2025-02-04 10:17         ` Jan Beulich
2025-02-03 11:42 ` [PATCH 5/5] ix86: restrict use of GOT32X relocs Jan Beulich
2025-02-03 22:41   ` H.J. Lu
2025-02-04  7:24     ` Jan Beulich
2025-02-04  7:27       ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).