Message-ID: <5537B95A.9050606@arm.com>
Date: Wed, 22 Apr 2015 15:08:00 -0000
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: Marcus Shawcroft <Marcus.Shawcroft@arm.com>,  Richard Earnshaw <Richard.Earnshaw@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com>,  Evandro Menezes <e.menezes@samsung.com>, Andrew Pinski <apinski@cavium.com>
Subject: Re: [PATCH][AArch64] Implement -m{cpu,tune,arch}=native using only /proc/cpuinfo
References: <55351FA9.4020603@arm.com> <55378A03.5040509@arm.com>
In-Reply-To: <55378A03.5040509@arm.com>
Content-Type: multipart/mixed; boundary="------------010502060205000408090005"

This is a multi-part message in MIME format.
--------------010502060205000408090005
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-length: 30559


On 22/04/15 12:46, Kyrill Tkachov wrote:
> [Sorry for resending twice. My mail client glitched]
>
> On 20/04/15 16:47, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This is an attempt to add native CPU detection to AArch64 GNU/Linux targets.
>> Similar to other ports we use SPEC rewriting to rewrite -m{cpu,tune,arch}=native
>> options into the appropriate CPU/architecture and the architecture extension options
>> when appropriate (e.g. +crypto/+crc).
>>
>> For CPU/architecture detection it gets a bit involved, especially when running on a
>> big.LITTLE system. My proposed approach is to look at /proc/cpuinfo and search for the
>> implementer id and part number fields that uniquely identify each core (appropriate identifying
>> information is added to aarch64-cores.def). If we find two types of core, we have a big.LITTLE
>> system, so search through the core definitions extracted from aarch64-cores.def to check whether we
>> support such a combination (currently only cortex-a57.cortex-a53 and cortex-a72.cortex-a53)
>> and make sure that the implementer id field matches up.
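For illustration, the core-type detection described above could look roughly like the following standalone C sketch. It is not the code in the patch (which uses GCC's own helpers and strstr matching): the names cpuinfo_field, collect_parts and MAX_VAL are hypothetical, the /proc/cpuinfo lines are passed in as an array rather than read with fgets, and the "CPU part" key is matched exactly instead of by substring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_VAL 16   /* Hypothetical limit on a field value's length.  */

/* Hypothetical helper: if LINE has the form "<key> : <value>" and <key>
   equals KEY exactly (ignoring blanks before the colon), copy the trimmed
   <value> into OUT (of size OUT_LEN) and return true.  */
static bool
cpuinfo_field (const char *line, const char *key, char *out, size_t out_len)
{
  const char *colon = strchr (line, ':');
  if (colon == NULL || out_len == 0)
    return false;

  /* Compare the key, ignoring the padding before the colon.  */
  size_t key_len = (size_t) (colon - line);
  while (key_len > 0 && (line[key_len - 1] == ' ' || line[key_len - 1] == '\t'))
    key_len--;
  if (key_len != strlen (key) || strncmp (line, key, key_len) != 0)
    return false;

  /* Skip blanks after the colon and strip trailing blanks/newline.  */
  const char *val = colon + 1;
  while (*val == ' ' || *val == '\t')
    val++;
  size_t val_len = strcspn (val, "\r\n");
  while (val_len > 0 && (val[val_len - 1] == ' ' || val[val_len - 1] == '\t'))
    val_len--;
  if (val_len >= out_len)
    return false;

  memcpy (out, val, val_len);
  out[val_len] = '\0';
  return true;
}

/* Collect the distinct "CPU part" values among the N_LINES lines.
   Returns the number found (0, 1 or 2), or -1 if there are more than two
   distinct core types, in which case the caller should give up.  */
static int
collect_parts (const char *const *lines, size_t n_lines,
               char parts[2][MAX_VAL])
{
  assert (lines != NULL || n_lines == 0);
  int n_parts = 0;
  char val[MAX_VAL];

  for (size_t i = 0; i < n_lines; i++)
    {
      if (!cpuinfo_field (lines[i], "CPU part", val, sizeof val))
        continue;

      bool seen = false;
      for (int j = 0; j < n_parts; j++)
        if (strcmp (parts[j], val) == 0)
          seen = true;
      if (seen)
        continue;

      if (n_parts == 2)
        return -1;   /* A third core type: not a supported combination.  */
      strcpy (parts[n_parts++], val);
    }
  return n_parts;
}
```

With the second /proc/cpuinfo format from the Cortex-A53/Cortex-A57 system, collect_parts would return 2 (0xd03 and 0xd07); with the first format there is only one "CPU part" line, so it returns 1.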
>>
>> I tested this on a 4xCortex-A53 + 2xCortex-A57 big.LITTLE Ubuntu GNU/Linux system.
>> There are two formats for /proc/cpuinfo that I'm aware of. The first (old) one has the format:
>> --------------------------------------
>> processor    : 0
>> processor    : 1
>> processor    : 2
>> processor    : 3
>> processor    : 4
>> processor    : 5
>> Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer    : 0x41
>> CPU architecture: AArch64
>> CPU variant    : 0x0
>> CPU part    : 0xd03
>> --------------------------------------
>>
>> In this format it lists the 6 cores, but the CPU part it reports is only the one for the core
>> from which /proc/cpuinfo was read (!), in this case one of the Cortex-A53 cores.
>> This means we detect a different CPU depending on which
>> core GCC was invoked on. Not ideal really, but there's no more information that we can extract.
>> Given the /proc/cpuinfo above, this patch will rewrite -mcpu=native into -mcpu=cortex-a53+fp+simd+crypto+crc
>>
>> The newer /proc/cpuinfo format proposed at
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=44b82b7700d05a52cd983799d3ecde1a976b3bed
>> looks like this:
>>
>> --------------------------------------------------------------
>> processor       : 0
>> Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant     : 0x0
>> CPU part        : 0xd03
>> CPU revision    : 0
>>
>> processor       : 1
>> Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant     : 0x0
>> CPU part        : 0xd03
>> CPU revision    : 0
>>
>> processor       : 2
>> Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant     : 0x0
>> CPU part        : 0xd03
>> CPU revision    : 0
>>
>> processor       : 3
>> Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant     : 0x0
>> CPU part        : 0xd03
>> CPU revision    : 0
>>
>> processor       : 4
>> Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant     : 0x0
>> CPU part        : 0xd07
>> CPU revision    : 0
>>
>> processor       : 5
>> Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
>> CPU implementer : 0x41
>> CPU architecture: 8
>> CPU variant     : 0x0
>> CPU part        : 0xd07
>> CPU revision    : 0
>> --------------------------------------------------------------
>>
>> The Features field is used to detect the architectural features that we map to GCC option extensions,
>> i.e. +fp, +crypto, +simd, +crc.
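A rough sketch of the Features-to-extension mapping, assuming the value after the colon of the Features line has already been extracted. The table mirrors the FEATURE_STRING values from the patch; the names ext_map, has_word and features_to_ext_string are hypothetical. One deliberate difference: this version matches whole words, whereas a plain strstr search would also accept "fp" inside a longer entry such as "fphp".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Extension name and the space-separated "Features" entries that must all
   be present, mirroring the FEATURE_STRING column added to
   aarch64-option-extensions.def by the patch.  */
struct ext_map
{
  const char *ext;
  const char *feat_string;
};

static const struct ext_map ext_table[] = {
  { "fp", "fp" },
  { "simd", "asimd" },
  { "crypto", "aes pmull sha1 sha2" },
  { "crc", "crc32" },
};

/* True if the LEN-character WORD occurs as a whole blank-separated word in
   FEATURES (so "fp" does not match "fphp").  */
static bool
has_word (const char *features, const char *word, size_t len)
{
  const char *p = features;
  while (*p != '\0')
    {
      p += strspn (p, " \t\n");
      size_t n = strcspn (p, " \t\n");
      if (n == len && strncmp (p, word, len) == 0)
        return true;
      p += n;
    }
  return false;
}

/* Write "+ext" or "+noext" for every entry of ext_table into OUT, based on
   the value part of the "Features" line.  */
static void
features_to_ext_string (const char *features, char *out, size_t out_len)
{
  assert (out_len > 0);
  size_t used = 0;
  out[0] = '\0';

  for (size_t i = 0; i < sizeof ext_table / sizeof ext_table[0]; i++)
    {
      /* Every word of feat_string must be present for the extension.  */
      bool enabled = true;
      const char *req = ext_table[i].feat_string;
      while (*req != '\0')
        {
          req += strspn (req, " ");
          size_t n = strcspn (req, " ");
          if (n > 0 && !has_word (features, req, n))
            enabled = false;
          req += n;
        }

      int w = snprintf (out + used, out_len - used, "+%s%s",
                        enabled ? "" : "no", ext_table[i].ext);
      if (w < 0 || (size_t) w >= out_len - used)
        return;  /* Out of space: OUT is truncated but NUL-terminated.  */
      used += (size_t) w;
    }
}
```

For the Features line shown above this yields "+fp+simd+crypto+crc", the suffix used in the -mcpu=cortex-a53 example.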
>>
>> Similarly, -march=native would be rewritten into -march=armv8-a+fp+simd+crypto+crc
>> while -mtune=native would be rewritten into -mtune=cortex-a57.cortex-a53 (the arch extension options are not valid
>> for -mtune).
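The option rewriting itself is essentially string composition. Below is a hypothetical helper, native_option, sketched with snprintf rather than GCC's concat; it drops the extension suffix for -mtune as described above. The patch signals failure by returning an empty string from the spec function; this sketch returns -1 instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Compose the replacement for -m<KIND>=native.  KIND is "arch", "cpu" or
   "tune"; NAME is an architecture name for "arch" and a CPU name otherwise;
   EXT is an extension suffix such as "+fp+simd".  EXT is dropped for
   "tune" because -mtune does not accept extension modifiers.  Returns the
   length written, or -1 for an unknown KIND or if OUT is too small.  */
static int
native_option (const char *kind, const char *name, const char *ext,
               char *out, size_t out_len)
{
  assert (kind != NULL && name != NULL && ext != NULL);

  bool is_tune = strcmp (kind, "tune") == 0;
  if (!is_tune && strcmp (kind, "arch") != 0 && strcmp (kind, "cpu") != 0)
    return -1;

  int w = snprintf (out, out_len, "-m%s=%s%s", kind, name,
                    is_tune ? "" : ext);
  if (w < 0 || (size_t) w >= out_len)
    return -1;
  return w;
}
```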
>>
>> If it detects more than one implementer ID, implementer IDs that do not match up somewhere,
>> some other weirdness in /proc/cpuinfo, or fails to recognise the CPU, it will bail out and ignore
>> the option entirely (similarly to other ports).
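The big.LITTLE lookup and the bail-out rule could be sketched as follows. The table holds a subset of the IMP/PART values from the patch's aarch64-cores.def; lookup_core, bl_part_matches and core_entry are hypothetical names. Unlike the patch's valid_bL_string_p, which searches the dotted string with strstr, this variant splits at '.' and compares both halves exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Implementer ID and part number values copied from the patch's
   aarch64-cores.def changes (subset of the entries only).  */
struct core_entry
{
  const char *name;
  const char *imp;
  const char *part;
};

static const struct core_entry cores_table[] = {
  { "cortex-a53", "0x41", "0xd03" },
  { "cortex-a57", "0x41", "0xd07" },
  { "cortex-a72", "0x41", "0xd08" },
  { "cortex-a57.cortex-a53", "0x41", "0xd07.0xd03" },
  { "cortex-a72.cortex-a53", "0x41", "0xd08.0xd03" },
};

/* True if PART has the form "<big>.<LITTLE>" and consists of exactly
   the part numbers A and B, in either order.  */
static bool
bl_part_matches (const char *part, const char *a, const char *b)
{
  const char *dot = strchr (part, '.');
  if (dot == NULL)
    return false;

  size_t first_len = (size_t) (dot - part);
  const char *second = dot + 1;
  bool a_first = strlen (a) == first_len && strncmp (part, a, first_len) == 0
                 && strcmp (second, b) == 0;
  bool b_first = strlen (b) == first_len && strncmp (part, b, first_len) == 0
                 && strcmp (second, a) == 0;
  return a_first || b_first;
}

/* Map implementer IMP plus N_PARTS (1 or 2) distinct part numbers to a
   core name.  NULL means "unknown combination": the caller should then
   drop the -m...=native option instead of guessing.  */
static const char *
lookup_core (const char *imp, const char *const *parts, int n_parts)
{
  assert (n_parts == 1 || n_parts == 2);

  for (size_t i = 0; i < sizeof cores_table / sizeof cores_table[0]; i++)
    {
      const struct core_entry *c = &cores_table[i];
      if (strcmp (c->imp, imp) != 0)
        continue;  /* Implementer must match as well as the part number(s).  */
      if (n_parts == 1 && strcmp (c->part, parts[0]) == 0)
        return c->name;
      if (n_parts == 2 && bl_part_matches (c->part, parts[0], parts[1]))
        return c->name;
    }
  return NULL;
}
```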
>>
>> The patch works fine with both /proc/cpuinfo formats although, as mentioned above, it will not be
>> able to detect the big.LITTLE combination from the first format.
>>
>> I've filled in the implementer ID and part numbers for the Cortex-A57, Cortex-A53, Cortex-A72, X-Gene 1 cores,
>> but I don't have that info for thunderx or exynosm1. Could someone from Cavium and Samsung help me out
>> here? At present this patch has some dummy values that I'd like to fill out before committing this.
> Thanks Andrew and Evandro for the info.
> I've added the numbers to the patch, so it should work on those systems.
> I'm attaching the final patch here for review.

And resending here with a minor whitespace change in aarch64-cores.def to
make thunderx line up with the other entries. Thanks Evandro for pointing it out!

Kyrill

>
> Thanks,
> Kyrill
>
>
> 2015-04-22  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
>        * config.host (case ${host}): Add aarch64*-*-linux case.
>        * config/aarch64/aarch64-cores.def: Add IMPLEMENTER_ID and PART_NUMBER
>        fields to all the cores.
>        * config/aarch64/aarch64-elf.h (DRIVER_SELF_SPECS): Add
>        MCPU_MTUNE_NATIVE_SPECS.
>        * config/aarch64/aarch64-option-extensions.def: Add FEAT_STRING field to all
>        extensions.
>        * config/aarch64/aarch64-opts.h: Adjust definition of AARCH64_CORE.
>        * config/aarch64/aarch64.c: Adjust definition of AARCH64_CORE.
>        Adjust definition of AARCH64_OPT_EXTENSION.
>        * config/aarch64/aarch64.h: Adjust definition of AARCH64_CORE.
>        (MCPU_MTUNE_NATIVE_SPECS): Define.
>        * config/aarch64/driver-aarch64.c: New file.
>        * config/aarch64/x-aarch64: New file.
>        * doc/invoke.texi (AArch64 Options): Document native value for -mcpu,
>        -mtune and -march.
>
>> I've bootstrapped this on the system mentioned above with -mcpu=native in the BOOT_CFLAGS and regtested as well.
>> For the bootstrap I've used the 2nd /proc/cpuinfo format.
>>
>> I've also tested it on AArch64 hardware from ARM Ltd. and the ecosystem.
>>
>> If using the first format, the bootstrap fails the comparison because, depending on the OS scheduling, some files
>> are compiled with Cortex-A57 tuning and some with Cortex-A53 tuning, and this is practically non-deterministic
>> across stage2 and stage3!
>>
>> What do people think of this approach?
>>
>> 2015-04-20  Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>>        * config.host (case ${host}): Add aarch64*-*-linux case.
>>        * config/aarch64/aarch64-cores.def: Add IMPLEMENTER_ID and PART_NUMBER
>>        fields to all the cores.
>>        * config/aarch64/aarch64-elf.h (DRIVER_SELF_SPECS): Add
>>        MCPU_MTUNE_NATIVE_SPECS.
>>        * config/aarch64/aarch64-option-extensions.def: Add FEAT_STRING field to all
>>        extensions.
>>        * config/aarch64/aarch64-opts.h: Adjust definition of AARCH64_CORE.
>>        * config/aarch64/aarch64.c: Adjust definition of AARCH64_CORE.
>>        Adjust definition of AARCH64_OPT_EXTENSION.
>>        * config/aarch64/aarch64.h: Adjust definition of AARCH64_CORE.
>>        (MCPU_MTUNE_NATIVE_SPECS): Define.
>>        * config/aarch64/driver-aarch64.c: New file.
>>        * config/aarch64/x-aarch64: New file.
>>        * doc/invoke.texi (AArch64 Options): Document native value for -mcpu,
>>        -mtune and -march.
>
> aarch64-native.patch
>
>
> commit bfdf31b9d71620afac43b15ebf31502022a9bc63
> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Date:   Fri Apr 10 16:39:27 2015 +0100
>
>       [AArch64] Implement -m{tune,cpu,arch}=native on AArch64 GNU/Linux
>
> diff --git a/gcc/config.host b/gcc/config.host
> index b0f5940..a8896d1 100644
> --- a/gcc/config.host
> +++ b/gcc/config.host
> @@ -99,6 +99,14 @@ case ${host} in
>    esac
>
>    case ${host} in
> +  aarch64*-*-linux*)
> +    case ${target} in
> +      aarch64*-*-*)
> +       host_extra_gcc_objs="driver-aarch64.o"
> +       host_xmake_file="${host_xmake_file} aarch64/x-aarch64"
> +       ;;
> +    esac
> +    ;;
>      arm*-*-freebsd* | arm*-*-linux*)
>        case ${target} in
>          arm*-*-*)
> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index e46d91b..7e6cb73 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -21,7 +21,7 @@
>
>       Before using #include to read this file, define a macro:
>
> -      AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH, FLAGS, COSTS)
> +      AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH, FLAGS, COSTS, IMP, PART)
>
>       The CORE_NAME is the name of the core, represented as a string constant.
>       The CORE_IDENT is the name of the core, represented as an identifier.
> @@ -30,18 +30,23 @@
>       ARCH is the architecture revision implemented by the chip.
>       FLAGS are the bitwise-or of the traits that apply to that core.
>       This need not include flags implied by the architecture.
> -   COSTS is the name of the rtx_costs routine to use.  */
> +   COSTS is the name of the rtx_costs routine to use.
> +   IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it can
> +   be found in /proc/cpuinfo.
> +   PART is the part number of the CPU.  On a GNU/Linux system it can be found
> +   in /proc/cpuinfo.  For big.LITTLE systems this should have the form of
> +   "<big core part number>.<LITTLE core part number>".  */
>
>    /* V8 Architecture Processors.  */
>
> -AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53)
> -AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57)
> -AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57)
> -AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57)
> -AARCH64_CORE("thunderx",    thunderx,  thunderx, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx)
> -AARCH64_CORE("xgene1",      xgene1,    xgene1,    8,  AARCH64_FL_FOR_ARCH8, xgene1)
> +AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")
> +AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
> +AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08")
> +AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x53", "0x001")
> +AARCH64_CORE("thunderx",    thunderx,  thunderx, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, "0x43", "0x0a1")
> +AARCH64_CORE("xgene1",      xgene1,    xgene1,    8,  AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000")
>
>    /* V8 big.LITTLE implementations.  */
>
> -AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57)
> -AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57)
> +AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03")
> +AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08.0xd03")
> diff --git a/gcc/config/aarch64/aarch64-elf.h b/gcc/config/aarch64/aarch64-elf.h
> index a5ec8cb..1ce6343 100644
> --- a/gcc/config/aarch64/aarch64-elf.h
> +++ b/gcc/config/aarch64/aarch64-elf.h
> @@ -132,7 +132,8 @@
>    #undef DRIVER_SELF_SPECS
>    #define DRIVER_SELF_SPECS \
>      " %{!mbig-endian:%{!mlittle-endian:" ENDIAN_SPEC "}}" \
> -  " %{!mabi=*:" ABI_SPEC "}"
> +  " %{!mabi=*:" ABI_SPEC "}" \
> +  MCPU_MTUNE_NATIVE_SPECS
>
>    #ifdef HAVE_AS_MABI_OPTION
>    #define ASM_MABI_SPEC "%{mabi=*:-mabi=%*}"
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> index 6ec3ed6..f296296 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -21,18 +21,25 @@
>
>       Before using #include to read this file, define a macro:
>
> -      AARCH64_OPT_EXTENSION(EXT_NAME, FLAGS_ON, FLAGS_OFF)
> +      AARCH64_OPT_EXTENSION(EXT_NAME, FLAGS_ON, FLAGS_OFF, FEATURE_STRING)
>
>       EXT_NAME is the name of the extension, represented as a string constant.
>       FLAGS_ON are the bitwise-or of the features that the extension adds.
> -   FLAGS_OFF are the bitwise-or of the features that the extension removes.  */
> +   FLAGS_OFF are the bitwise-or of the features that the extension removes.
> +   FEAT_STRING is a string containing the entries in the 'Features' field of
> +   /proc/cpuinfo on a GNU/Linux system that correspond to this architecture
> +   extension being available.  Sometimes multiple entries are needed to enable
> +   the extension (for example, the 'crypto' extension depends on four
> +   entries: aes, pmull, sha1, sha2 being present).  In that case this field
> +   should contain a whitespace-separated list of the strings in 'Features'
> +   that are required.  Their order is not important.  */
>
>    /* V8 Architecture Extensions.
>       This list currently contains example extensions for CPUs that implement
>       AArch64, and therefore serves as a template for adding more CPUs in the
>       future.  */
>
> -AARCH64_OPT_EXTENSION("fp",    AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO)
> -AARCH64_OPT_EXTENSION("simd",  AARCH64_FL_FPSIMD,      AARCH64_FL_SIMD | AARCH64_FL_CRYPTO)
> -AARCH64_OPT_EXTENSION("crypto",        AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO)
> -AARCH64_OPT_EXTENSION("crc",   AARCH64_FL_CRC, AARCH64_FL_CRC)
> +AARCH64_OPT_EXTENSION("fp",    AARCH64_FL_FP,                          AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
> +AARCH64_OPT_EXTENSION("simd",  AARCH64_FL_FPSIMD,                      AARCH64_FL_SIMD | AARCH64_FL_CRYPTO,   "asimd")
> +AARCH64_OPT_EXTENSION("crypto",        AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,                     "aes pmull sha1 sha2")
> +AARCH64_OPT_EXTENSION("crc",   AARCH64_FL_CRC,                         AARCH64_FL_CRC,                        "crc32")
> diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
> index f88ae5b..ea64cf4 100644
> --- a/gcc/config/aarch64/aarch64-opts.h
> +++ b/gcc/config/aarch64/aarch64-opts.h
> @@ -25,7 +25,7 @@
>    /* The various cores that implement AArch64.  */
>    enum aarch64_processor
>    {
> -#define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS) \
> +#define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) \
>      INTERNAL_IDENT,
>    #include "aarch64-cores.def"
>    #undef AARCH64_CORE
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 954e110..ea6020f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -441,7 +441,7 @@ struct processor
>    /* Processor cores implementing AArch64.  */
>    static const struct processor all_cores[] =
>    {
> -#define AARCH64_CORE(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS) \
> +#define AARCH64_CORE(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) \
>      {NAME, SCHED, #ARCH, ARCH, FLAGS, &COSTS##_tunings},
>    #include "aarch64-cores.def"
>    #undef AARCH64_CORE
> @@ -478,7 +478,7 @@ struct aarch64_option_extension
>    /* ISA extensions in AArch64.  */
>    static const struct aarch64_option_extension all_extensions[] =
>    {
> -#define AARCH64_OPT_EXTENSION(NAME, FLAGS_ON, FLAGS_OFF) \
> +#define AARCH64_OPT_EXTENSION(NAME, FLAGS_ON, FLAGS_OFF, FEATURE_STRING) \
>      {NAME, FLAGS_ON, FLAGS_OFF},
>    #include "aarch64-option-extensions.def"
>    #undef AARCH64_OPT_EXTENSION
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index bf59e40..1f7187b 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -506,7 +506,7 @@ enum reg_class
>
>    enum target_cpus
>    {
> -#define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS) \
> +#define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) \
>      TARGET_CPU_##INTERNAL_IDENT,
>    #include "aarch64-cores.def"
>    #undef AARCH64_CORE
> @@ -929,11 +929,24 @@ extern const char *aarch64_rewrite_mcpu (int argc, const char **argv);
>    #define BIG_LITTLE_CPU_SPEC_FUNCTIONS \
>      { "rewrite_mcpu", aarch64_rewrite_mcpu },
>
> +#if defined(__aarch64__)
> +extern const char *host_detect_local_cpu (int argc, const char **argv);
> +# define EXTRA_SPEC_FUNCTIONS                                          \
> +  { "local_cpu_detect", host_detect_local_cpu },                       \
> +  BIG_LITTLE_CPU_SPEC_FUNCTIONS
> +
> +# define MCPU_MTUNE_NATIVE_SPECS                                       \
> +   " %{march=native:%<march=native %:local_cpu_detect(arch)}"          \
> +   " %{mcpu=native:%<mcpu=native %:local_cpu_detect(cpu)}"             \
> +   " %{mtune=native:%<mtune=native %:local_cpu_detect(tune)}"
> +#else
> +# define MCPU_MTUNE_NATIVE_SPECS ""
> +# define EXTRA_SPEC_FUNCTIONS BIG_LITTLE_CPU_SPEC_FUNCTIONS
> +#endif
> +
>    #define ASM_CPU_SPEC \
>       BIG_LITTLE_SPEC
>
> -#define EXTRA_SPEC_FUNCTIONS BIG_LITTLE_CPU_SPEC_FUNCTIONS
> -
>    #define EXTRA_SPECS                                           \
>      { "asm_cpu_spec",           ASM_CPU_SPEC }
>
> diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
> new file mode 100644
> index 0000000..da10a4c
> --- /dev/null
> +++ b/gcc/config/aarch64/driver-aarch64.c
> @@ -0,0 +1,307 @@
> +/* Native CPU detection for aarch64.
> +   Copyright (C) 2014 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +
> +struct arch_extension
> +{
> +  const char *ext;
> +  const char *feat_string;
> +};
> +
> +#define AARCH64_OPT_EXTENSION(EXT_NAME, FLAGS_ON, FLAGS_OFF, FEATURE_STRING) \
> +  { EXT_NAME, FEATURE_STRING },
> +static struct arch_extension ext_to_feat_string[] =
> +{
> +#include "aarch64-option-extensions.def"
> +};
> +#undef AARCH64_OPT_EXTENSION
> +
> +
> +struct aarch64_core_data
> +{
> +  const char* name;
> +  const char* arch;
> +  const char* implementer_id;
> +  const char* part_no;
> +};
> +
> +#define AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) \
> +  { CORE_NAME, #ARCH, IMP, PART },
> +
> +static struct aarch64_core_data cpu_data [] =
> +{
> +#include "aarch64-cores.def"
> +  { NULL, NULL, NULL, NULL }
> +};
> +
> +#undef AARCH64_CORE
> +
> +struct aarch64_arch
> +{
> +  const char* id;
> +  const char* name;
> +};
> +
> +#define AARCH64_ARCH(NAME, CORE, ARCH, FLAGS) \
> +  { #ARCH, NAME  },
> +
> +static struct aarch64_arch aarch64_arches [] =
> +{
> +#include "aarch64-arches.def"
> +  {NULL, NULL}
> +};
> +
> +#undef AARCH64_ARCH
> +
> +/* Return the full architecture name string corresponding to the
> +   identifier ID.  */
> +
> +static const char*
> +get_arch_name_from_id (const char* id)
> +{
> +  unsigned int i = 0;
> +
> +  for (i = 0; aarch64_arches[i].id != NULL; i++)
> +    {
> +      if (strcmp (id, aarch64_arches[i].id) == 0)
> +        return aarch64_arches[i].name;
> +    }
> +
> +  return NULL;
> +}
> +
> +
> +/* Check whether the array CORE contains the same CPU part numbers
> +   as BL_STRING.  For example CORE={"0xd03", "0xd07"} and BL_STRING="0xd07.0xd03"
> +   should return true.  */
> +
> +static bool
> +valid_bL_string_p (const char** core, const char* bL_string)
> +{
> +  return strstr (bL_string, core[0]) != NULL
> +         && strstr (bL_string, core[1]) != NULL;
> +}
> +
> +/*  Return true iff ARR contains STR in one of its two elements.  */
> +
> +static bool
> +contains_string_p (const char** arr, const char* str)
> +{
> +  bool res = false;
> +
> +  if (arr[0] != NULL)
> +    {
> +      res = strstr (arr[0], str) != NULL;
> +      if (res)
> +        return res;
> +
> +      if (arr[1] != NULL)
> +        return strstr (arr[1], str) != NULL;
> +    }
> +
> +  return false;
> +}
> +
> +/* This will be called by the spec parser in gcc.c when it sees
> +   a %:local_cpu_detect(args) construct.  Currently it will be called
> +   with either "arch", "cpu" or "tune" as argument depending on whether
> +   -march=native, -mcpu=native or -mtune=native is to be substituted.
> +
> +   It returns a string containing new command line parameters to be
> +   put in place of the above three options, depending on the CPU
> +   this is executed on.  E.g. "-march=armv8-a" on a Cortex-A57 for
> +   -march=native.  If the routine can't detect a known processor,
> +   the -march, -mcpu or -mtune option is discarded.
> +
> +   For -mtune and -mcpu arguments it attempts to detect the CPU or
> +   a big.LITTLE system.
> +   ARGC and ARGV are set depending on the actual arguments given
> +   in the spec.  */
> +
> +const char *
> +host_detect_local_cpu (int argc, const char **argv)
> +{
> +  const char *arch_id = NULL;
> +  const char *res = NULL;
> +  static const int num_exts = ARRAY_SIZE (ext_to_feat_string);
> +  char buf[128];
> +  FILE *f = NULL;
> +  bool arch = false;
> +  bool tune = false;
> +  bool cpu = false;
> +  unsigned int i = 0;
> +  unsigned int core_idx = 0;
> +  const char* imps[2] = { NULL, NULL };
> +  const char* cores[2] = { NULL, NULL };
> +  unsigned int n_cores = 0;
> +  unsigned int n_imps = 0;
> +  bool processed_exts = false;
> +  const char *ext_string = "";
> +
> +  gcc_assert (argc);
> +
> +  if (!argv[0])
> +    goto not_found;
> +
> +  /* Are we processing -march, -mtune or -mcpu?  */
> +  arch = strcmp (argv[0], "arch") == 0;
> +  if (!arch)
> +    tune = strcmp (argv[0], "tune") == 0;
> +
> +  if (!arch && !tune)
> +    cpu = strcmp (argv[0], "cpu") == 0;
> +
> +  if (!arch && !tune && !cpu)
> +    goto not_found;
> +
> +  f = fopen ("/proc/cpuinfo", "r");
> +
> +  if (f == NULL)
> +    goto not_found;
> +
> +  /* Look through /proc/cpuinfo to determine the implementer
> +     and then the part number that identifies a particular core.  */
> +  while (fgets (buf, sizeof (buf), f) != NULL)
> +    {
> +      if (strstr (buf, "implementer") != NULL)
> +       {
> +         for (i = 0; cpu_data[i].name != NULL; i++)
> +           if (strstr (buf, cpu_data[i].implementer_id) != NULL
> +                && !contains_string_p (imps, cpu_data[i].implementer_id))
> +             {
> +                if (n_imps == 2)
> +                  goto not_found;
> +
> +                imps[n_imps++] = cpu_data[i].implementer_id;
> +
> +                break;
> +             }
> +          continue;
> +       }
> +
> +      if (strstr (buf, "part") != NULL)
> +       {
> +         for (i = 0; cpu_data[i].name != NULL; i++)
> +           if (strstr (buf, cpu_data[i].part_no) != NULL
> +                && !contains_string_p (cores, cpu_data[i].part_no))
> +             {
> +                if (n_cores == 2)
> +                  goto not_found;
> +
> +                cores[n_cores++] = cpu_data[i].part_no;
> +               core_idx = i;
> +               arch_id = cpu_data[i].arch;
> +               break;
> +             }
> +          continue;
> +        }
> +      if (!tune && !processed_exts && strstr (buf, "Features") != NULL)
> +        {
> +          for (i = 0; i < num_exts; i++)
> +            {
> +              bool enabled = true;
> +              char *p = NULL;
> +              char *feat_string = concat (ext_to_feat_string[i].feat_string, NULL);
> +
> +              p = strtok (feat_string, " ");
> +
> +              while (p != NULL)
> +                {
> +                  if (strstr (buf, p) == NULL)
> +                    {
> +                      enabled = false;
> +                      break;
> +                    }
> +                  p = strtok (NULL, " ");
> +                }
> +              ext_string = concat (ext_string, "+", enabled ? "" : "no",
> +                                   ext_to_feat_string[i].ext, NULL);
> +            }
> +          processed_exts = true;
> +        }
> +    }
> +
> +  fclose (f);
> +  f = NULL;
> +
> +  /* Weird cpuinfo format that we don't know how to handle.  */
> +  if (n_cores == 0 || n_cores > 2 || n_imps != 1)
> +    goto not_found;
> +
> +  if (arch && !arch_id)
> +    goto not_found;
> +
> +  if (arch)
> +    {
> +      const char* arch_name = get_arch_name_from_id (arch_id);
> +
> +      /* We got some arch identifier that's not in aarch64-arches.def?  */
> +      if (!arch_name)
> +        goto not_found;
> +
> +      res = concat ("-march=", arch_name, NULL);
> +    }
> +  /* We have big.LITTLE.  */
> +  else if (n_cores == 2)
> +    {
> +      for (i = 0; cpu_data[i].name != NULL; i++)
> +        {
> +          if (strchr (cpu_data[i].part_no, '.') != NULL
> +              && strncmp (cpu_data[i].implementer_id, imps[0], strlen (imps[0]) - 1) == 0
> +              && valid_bL_string_p (cores, cpu_data[i].part_no))
> +            {
> +              res = concat ("-m", cpu ? "cpu" : "tune", "=", cpu_data[i].name, NULL);
> +              break;
> +            }
> +        }
> +      if (!res)
> +        goto not_found;
> +    }
> +  /* The simple, non-big.LITTLE case.  */
> +  else
> +    {
> +      if (strncmp (cpu_data[core_idx].implementer_id, imps[0],
> +                   strlen (imps[0]) - 1) != 0)
> +        goto not_found;
> +
> +      res = concat ("-m", cpu ? "cpu" : "tune", "=",
> +                      cpu_data[core_idx].name, NULL);
> +    }
> +
> +  if (tune)
> +    return res;
> +
> +  res = concat (res, ext_string, NULL);
> +
> +  return res;
> +
> +not_found:
> +  {
> +   /* If detection fails we ignore the option.
> +      Clean up and return empty string.  */
> +
> +    if (f)
> +      fclose (f);
> +
> +    return "";
> +  }
> +}
> +
> diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64
> new file mode 100644
> index 0000000..8c09e04
> --- /dev/null
> +++ b/gcc/config/aarch64/x-aarch64
> @@ -0,0 +1,3 @@
> +driver-aarch64.o: $(srcdir)/config/aarch64/driver-aarch64.c \
> +  $(CONFIG_H) $(SYSTEM_H)
> +       $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e2918cb..5787524 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12318,8 +12318,12 @@ This involves inserting a NOP instruction between memory instructions and
>    Specify the name of the target architecture, optionally suffixed by one or
>    more feature modifiers.  This option has the form
>    @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
> -only permissible value for @var{arch} is @samp{armv8-a}.  The permissible
> -values for @var{feature} are documented in the sub-section below.
> +only permissible value for @var{arch} is @samp{armv8-a}.
> +The permissible values for @var{feature} are documented in the sub-section
> +below.  Additionally on native AArch64 GNU/Linux systems the value
> +@samp{native} is available.  This option causes the compiler to pick the
> +architecture of the host system.  If the compiler is unable to recognize the
> +architecture of the host system this option has no effect.
>
>    Where conflicting feature modifiers are specified, the right-most feat=
ure is
>    used.
> @@ -12343,6 +12347,13 @@ Additionally, this option can specify that GCC should tune the performance
>    of the code for a big.LITTLE system.  Permissible values for this
>    option are: @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}.
>
> +Additionally on native AArch64 GNU/Linux systems the value @samp{native}
> +is available.
> +This option causes the compiler to tune the performance of the code for the
> +processor of the host system.
> +If the compiler is unable to recognize the processor of the host system
> +this option has no effect.
> +
>    Where none of @option{-mtune=}, @option{-mcpu=} or @option{-march=}
>    are specified, the code is tuned to perform well across a range
>    of target processors.
> @@ -12355,7 +12366,11 @@ Specify the name of the target processor, optionally suffixed by one or more
>    feature modifiers.  This option has the form
>    @option{-mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
>    permissible values for @var{cpu} are the same as those available for
> -@option{-mtune}.
> +@option{-mtune}.  Additionally on native AArch64 GNU/Linux systems the
> +value @samp{native} is available.
> +This option causes the compiler to tune the performance of the code for the
> +processor of the host system.  If the compiler is unable to recognize the
> +processor of the host system this option has no effect.
>
>    The permissible values for @var{feature} are documented in the sub-section
>    below.
>
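The documented behaviour of the value native (use the detected core, or silently do nothing if it is unknown) can be modelled as a lookup keyed on the implementer and part numbers. The sketch below is a simplification under stated assumptions, not the driver code:

- `known_cores` and `expand_native` are hypothetical names.
- The table copies only the single-core IMP/PART pairs this patch adds to aarch64-cores.def.
- Values are compared as whole strings rather than searched for in /proc/cpuinfo lines with strstr.
- The "+feature" modifiers that the real driver appends for -mcpu and -march are omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table holding the single-core IMP/PART pairs that the
   patch adds to aarch64-cores.def (big.LITTLE entries omitted).  */
struct core_id
{
  const char *name;
  const char *imp;
  const char *part;
};

static const struct core_id known_cores[] = {
  { "cortex-a53", "0x41", "0xd03" },
  { "cortex-a57", "0x41", "0xd07" },
  { "cortex-a72", "0x41", "0xd08" },
  { "exynos-m1",  "0x53", "0x001" },
  { "thunderx",   "0x43", "0x0a1" },
  { "xgene1",     "0x50", "0x000" },
};

/* Expand -m<KIND>=native (KIND is "cpu" or "tune") into a concrete option
   for the core identified by IMP and PART, writing it to OUT.  If the pair
   is unknown, OUT becomes "", meaning the option is dropped and has no
   effect.  Feature modifiers are not appended in this sketch.  */
static const char *
expand_native (const char *kind, const char *imp, const char *part,
               char *out, size_t out_size)
{
  assert (out != NULL && out_size > 0);
  out[0] = '\0';
  for (size_t i = 0; i < sizeof known_cores / sizeof known_cores[0]; i++)
    if (strcmp (known_cores[i].imp, imp) == 0
        && strcmp (known_cores[i].part, part) == 0)
      {
        snprintf (out, out_size, "-m%s=%s", kind, known_cores[i].name);
        break;
      }
  return out;
}
```

For example, implementer 0x41 with part 0xd07 would expand -mcpu=native to -mcpu=cortex-a57. An unrecognised pair yields an empty string, which matches "this option has no effect".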


--------------010502060205000408090005
Content-Type: text/x-patch; name=aarch64-native.patch
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
 filename="aarch64-native.patch"
Content-length: 21248

commit 5b8c2958530a96facd16630b89023a4c102af85d
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Fri Apr 10 16:39:27 2015 +0100

    [AArch64] Implement -m{tune,cpu,arch}=native on AArch64 GNU/Linux

diff --git a/gcc/config.host b/gcc/config.host
index b0f5940..a8896d1 100644
--- a/gcc/config.host
+++ b/gcc/config.host
@@ -99,6 +99,14 @@ case ${host} in
 esac
 
 case ${host} in
+  aarch64*-*-linux*)
+    case ${target} in
+      aarch64*-*-*)
+	host_extra_gcc_objs="driver-aarch64.o"
+	host_xmake_file="${host_xmake_file} aarch64/x-aarch64"
+	;;
+    esac
+    ;;
   arm*-*-freebsd* | arm*-*-linux*)
     case ${target} in
       arm*-*-*)
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index e46d91b..7c285ba 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -21,7 +21,7 @@
 
   Before using #include to read this file, define a macro:
 
-      AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH, FLAGS, COSTS)
+      AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH, FLAGS, COSTS, IMP, PART)
 
    The CORE_NAME is the name of the core, represented as a string constant.
    The CORE_IDENT is the name of the core, represented as an identifier.
@@ -30,18 +30,23 @@
    ARCH is the architecture revision implemented by the chip.
    FLAGS are the bitwise-or of the traits that apply to that core.
    This need not include flags implied by the architecture.
-   COSTS is the name of the rtx_costs routine to use.  */
+   COSTS is the name of the rtx_costs routine to use.
+   IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it can
+   be found in /proc/cpuinfo.
+   PART is the part number of the CPU.  On a GNU/Linux system it can be found
+   in /proc/cpuinfo.  For big.LITTLE systems this should have the form of
+   "<big core part number>.<LITTLE core part number>".  */
 
 /* V8 Architecture Processors.  */
 
-AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53)
-AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57)
-AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57)
-AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57)
-AARCH64_CORE("thunderx",    thunderx,  thunderx, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx)
-AARCH64_CORE("xgene1",      xgene1,    xgene1,    8,  AARCH64_FL_FOR_ARCH8, xgene1)
+AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")
+AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
+AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08")
+AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x53", "0x001")
+AARCH64_CORE("thunderx",    thunderx,  thunderx,  8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
+AARCH64_CORE("xgene1",      xgene1,    xgene1,    8,  AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000")
 
 /* V8 big.LITTLE implementations.  */
 
-AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57)
-AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57)
+AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03")
+AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08.0xd03")
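IMP and PART are new trailing columns. Every `#define AARCH64_CORE` that includes this file must therefore accept eight parameters, even where it ignores the last two (the aarch64-opts.h, aarch64.c and aarch64.h hunks below do exactly that). The sketch below shows the same X-macro pattern in self-contained form, under some assumptions:

- A list macro, `SAMPLE_CORES` (hypothetical), stands in for the #include'd .def file.
- The FLAGS and COSTS columns are placeholders.
- One list is expanded twice: once into an enum and once into a detection table.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for aarch64-cores.def.  Columns: CORE_NAME,
   CORE_IDENT, SCHED, ARCH, FLAGS (placeholder 0), COSTS, IMP, PART.  */
#define SAMPLE_CORES(X)                                                    \
  X ("cortex-a53", cortexa53, cortexa53, 8, 0, cortexa53, "0x41", "0xd03") \
  X ("cortex-a57", cortexa57, cortexa57, 8, 0, cortexa57, "0x41", "0xd07") \
  X ("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8, 0,         \
     cortexa57, "0x41", "0xd07.0xd03")

/* Expansion 1: enum of core identifiers.  IMP and PART are ignored, but
   the macro must still accept all eight arguments.  */
#define AS_ENUM(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) IDENT,
enum sample_core { SAMPLE_CORES (AS_ENUM) n_sample_cores };
#undef AS_ENUM

/* Expansion 2: the columns a native-detection driver would need.  */
struct sample_core_data
{
  const char *name;
  const char *arch;
  const char *imp;
  const char *part;
};

#define AS_ROW(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) \
  { NAME, #ARCH, IMP, PART },
static const struct sample_core_data sample_cores[] = { SAMPLE_CORES (AS_ROW) };
#undef AS_ROW

/* Both expansions come from one list, so they cannot drift apart.  */
static_assert (sizeof sample_cores / sizeof sample_cores[0] == n_sample_cores,
               "enum and table are out of sync");
```

Because the enum and the table are generated from the same list, an index such as `sample_cores[cortexa57]` always refers to the matching row.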
diff --git a/gcc/config/aarch64/aarch64-elf.h b/gcc/config/aarch64/aarch64-elf.h
index a5ec8cb..1ce6343 100644
--- a/gcc/config/aarch64/aarch64-elf.h
+++ b/gcc/config/aarch64/aarch64-elf.h
@@ -132,7 +132,8 @@
 #undef DRIVER_SELF_SPECS
 #define DRIVER_SELF_SPECS \
   " %{!mbig-endian:%{!mlittle-endian:" ENDIAN_SPEC "}}" \
-  " %{!mabi=*:" ABI_SPEC "}"
+  " %{!mabi=*:" ABI_SPEC "}" \
+  MCPU_MTUNE_NATIVE_SPECS
 
 #ifdef HAVE_AS_MABI_OPTION
 #define ASM_MABI_SPEC	"%{mabi=*:-mabi=%*}"
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 6ec3ed6..f296296 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -21,18 +21,25 @@
 
   Before using #include to read this file, define a macro:
 
-      AARCH64_OPT_EXTENSION(EXT_NAME, FLAGS_ON, FLAGS_OFF)
+      AARCH64_OPT_EXTENSION(EXT_NAME, FLAGS_ON, FLAGS_OFF, FEATURE_STRING)
 
    EXT_NAME is the name of the extension, represented as a string constant.
    FLAGS_ON are the bitwise-or of the features that the extension adds.
-   FLAGS_OFF are the bitwise-or of the features that the extension removes.  */
+   FLAGS_OFF are the bitwise-or of the features that the extension removes.
+   FEATURE_STRING is a string containing the entries in the 'Features' field of
+   /proc/cpuinfo on a GNU/Linux system that correspond to this architecture
+   extension being available.  Sometimes multiple entries are needed to enable
+   the extension (for example, the 'crypto' extension depends on four
+   entries: aes, pmull, sha1, sha2 being present).  In that case this field
+   should contain a space-separated list of the strings in 'Features'
+   that are required.  Their order is not important.  */
 
 /* V8 Architecture Extensions.
    This list currently contains example extensions for CPUs that implement
    AArch64, and therefore serves as a template for adding more CPUs in the
    future.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,	AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO)
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,	AARCH64_FL_SIMD | AARCH64_FL_CRYPTO)
-AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,	AARCH64_FL_CRYPTO)
-AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC,	AARCH64_FL_CRC)
+AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,                          AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
+AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,                      AARCH64_FL_SIMD | AARCH64_FL_CRYPTO,   "asimd")
+AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,                     "aes pmull sha1 sha2")
+AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC,                         AARCH64_FL_CRC,                        "crc32")
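With the new FEATURE_STRING column, an extension counts as present only when every space-separated entry appears on the 'Features' line of /proc/cpuinfo. The driver code in this patch tests each entry with strstr. The sketch below (hypothetical helpers `has_token` and `extension_enabled_p`) applies the same "all entries must be present" rule but requires whole-token matches, so a short entry such as "fp" is not accepted merely because it occurs inside a longer one. The sample Features lines used with it are illustrative, not captured from real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if TOKEN occurs in LINE as a whole word.  Words may be
   preceded by a space, a tab or the ':' of "Features\t:", and followed
   by whitespace or the end of the string.  */
static bool
has_token (const char *line, const char *token)
{
  size_t len = strlen (token);
  assert (len > 0);
  for (const char *p = strstr (line, token); p != NULL;
       p = strstr (p + 1, token))
    {
      bool start_ok = p == line || strchr (" \t:", p[-1]) != NULL;
      bool end_ok = p[len] == '\0' || strchr (" \t\n", p[len]) != NULL;
      if (start_ok && end_ok)
        return true;
    }
  return false;
}

/* Return true if every space-separated entry of FEAT_STRING (for example
   "aes pmull sha1 sha2") is present in FEATURES_LINE.  */
static bool
extension_enabled_p (const char *features_line, const char *feat_string)
{
  char token[32];
  const char *p = feat_string;

  while (*p != '\0')
    {
      while (*p == ' ')
        p++;
      size_t n = strcspn (p, " ");
      if (n == 0)
        break;                  /* Only trailing spaces were left.  */
      if (n >= sizeof token)
        return false;           /* Entry too long for this sketch.  */
      memcpy (token, p, n);
      token[n] = '\0';
      if (!has_token (features_line, token))
        return false;
      p += n;
    }
  return true;
}
```

For each extension, the driver would then emit "+name" when this check succeeds and "+noname" when it fails, as the ext_string loop in driver-aarch64.c does.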
diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index f88ae5b..ea64cf4 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -25,7 +25,7 @@
 /* The various cores that implement AArch64.  */
 enum aarch64_processor
 {
-#define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS) \
+#define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) \
   INTERNAL_IDENT,
 #include "aarch64-cores.def"
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a90993b..5999950 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -441,7 +441,7 @@ struct processor
 /* Processor cores implementing AArch64.  */
 static const struct processor all_cores[] =
 {
-#define AARCH64_CORE(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS) \
+#define AARCH64_CORE(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) \
   {NAME, SCHED, #ARCH, ARCH, FLAGS, &COSTS##_tunings},
 #include "aarch64-cores.def"
 #undef AARCH64_CORE
@@ -478,7 +478,7 @@ struct aarch64_option_extension
 /* ISA extensions in AArch64.  */
 static const struct aarch64_option_extension all_extensions[] =
 {
-#define AARCH64_OPT_EXTENSION(NAME, FLAGS_ON, FLAGS_OFF) \
+#define AARCH64_OPT_EXTENSION(NAME, FLAGS_ON, FLAGS_OFF, FEATURE_STRING) \
   {NAME, FLAGS_ON, FLAGS_OFF},
 #include "aarch64-option-extensions.def"
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bf59e40..1f7187b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -506,7 +506,7 @@ enum reg_class
 
 enum target_cpus
 {
-#define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS) \
+#define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) \
   TARGET_CPU_##INTERNAL_IDENT,
 #include "aarch64-cores.def"
 #undef AARCH64_CORE
@@ -929,11 +929,24 @@ extern const char *aarch64_rewrite_mcpu (int argc, const char **argv);
 #define BIG_LITTLE_CPU_SPEC_FUNCTIONS \
   { "rewrite_mcpu", aarch64_rewrite_mcpu },
 
+#if defined(__aarch64__)
+extern const char *host_detect_local_cpu (int argc, const char **argv);
+# define EXTRA_SPEC_FUNCTIONS						\
+  { "local_cpu_detect", host_detect_local_cpu },			\
+  BIG_LITTLE_CPU_SPEC_FUNCTIONS
+
+# define MCPU_MTUNE_NATIVE_SPECS					\
+   " %{march=native:%<march=native %:local_cpu_detect(arch)}"		\
+   " %{mcpu=native:%<mcpu=native %:local_cpu_detect(cpu)}"		\
+   " %{mtune=native:%<mtune=native %:local_cpu_detect(tune)}"
+#else
+# define MCPU_MTUNE_NATIVE_SPECS ""
+# define EXTRA_SPEC_FUNCTIONS BIG_LITTLE_CPU_SPEC_FUNCTIONS
+#endif
+
 #define ASM_CPU_SPEC \
    BIG_LITTLE_SPEC
 
-#define EXTRA_SPEC_FUNCTIONS BIG_LITTLE_CPU_SPEC_FUNCTIONS
-
 #define EXTRA_SPECS						\
   { "asm_cpu_spec",		ASM_CPU_SPEC }
 
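The MCPU_MTUNE_NATIVE_SPECS rules delete the -m...=native switch (the `%<` part) and splice in whatever local_cpu_detect returns. When that function returns "", nothing replaces the deleted switch. Below is a rough model of that rewrite on an argument vector, written under some assumptions:

- The names `detect_fn`, `rewrite_native_options` and `example_detect` are hypothetical.
- The example detector pretends to run on a Cortex-A57 host whose -march detection fails.
- It ignores where gcc.c actually places the substituted switches.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for host_detect_local_cpu: maps "arch", "cpu" or "tune" to a
   replacement switch, or to "" when detection fails.  */
typedef const char *(*detect_fn) (const char *kind);

/* Copy ARGV (ARGC entries) to OUT, replacing each -march=native,
   -mcpu=native or -mtune=native with DETECT's result for the matching
   kind.  An empty result drops the switch.  Returns the new count.  */
static size_t
rewrite_native_options (const char *const *argv, size_t argc,
                        detect_fn detect, const char **out)
{
  static const char *const kinds[] = { "arch", "cpu", "tune" };
  size_t n = 0;

  for (size_t i = 0; i < argc; i++)
    {
      const char *arg = argv[i];
      for (size_t k = 0; k < sizeof kinds / sizeof kinds[0]; k++)
        {
          char native_opt[16];
          snprintf (native_opt, sizeof native_opt, "-m%s=native", kinds[k]);
          if (strcmp (arg, native_opt) == 0)
            {
              arg = detect (kinds[k]);
              assert (arg != NULL);
              break;
            }
        }
      if (arg[0] != '\0')
        out[n++] = arg;
    }
  return n;
}

/* Hypothetical detector for a Cortex-A57 host.  */
static const char *
example_detect (const char *kind)
{
  if (strcmp (kind, "tune") == 0)
    return "-mtune=cortex-a57";
  if (strcmp (kind, "cpu") == 0)
    return "-mcpu=cortex-a57+fp+simd+crypto+crc";
  return "";                    /* Pretend -march detection failed.  */
}
```

With `{"-O2", "-mcpu=native", "-march=native", "foo.c"}`, this model keeps -O2 and foo.c and substitutes the detected -mcpu string. It drops -march=native entirely, which mirrors "this option has no effect".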
diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
new file mode 100644
index 0000000..da10a4c
--- /dev/null
+++ b/gcc/config/aarch64/driver-aarch64.c
@@ -0,0 +1,307 @@
+/* Native CPU detection for aarch64.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+
+struct arch_extension
+{
+  const char *ext;
+  const char *feat_string;
+};
+
+#define AARCH64_OPT_EXTENSION(EXT_NAME, FLAGS_ON, FLAGS_OFF, FEATURE_STRING) \
+  { EXT_NAME, FEATURE_STRING },
+static struct arch_extension ext_to_feat_string[] =
+{
+#include "aarch64-option-extensions.def"
+};
+#undef AARCH64_OPT_EXTENSION
+
+
+struct aarch64_core_data
+{
+  const char* name;
+  const char* arch;
+  const char* implementer_id;
+  const char* part_no;
+};
+
+#define AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) \
+  { CORE_NAME, #ARCH, IMP, PART },
+
+static struct aarch64_core_data cpu_data [] =
+{
+#include "aarch64-cores.def"
+  { NULL, NULL, NULL, NULL }
+};
+
+#undef AARCH64_CORE
+
+struct aarch64_arch
+{
+  const char* id;
+  const char* name;
+};
+
+#define AARCH64_ARCH(NAME, CORE, ARCH, FLAGS) \
+  { #ARCH, NAME  },
+
+static struct aarch64_arch aarch64_arches [] =
+{
+#include "aarch64-arches.def"
+  {NULL, NULL}
+};
+
+#undef AARCH64_ARCH
+
+/* Return the full architecture name string corresponding to the
+   identifier ID.  */
+
+static const char*
+get_arch_name_from_id (const char* id)
+{
+  unsigned int i = 0;
+
+  for (i = 0; aarch64_arches[i].id != NULL; i++)
+    {
+      if (strcmp (id, aarch64_arches[i].id) == 0)
+        return aarch64_arches[i].name;
+    }
+
+  return NULL;
+}
+
+
+/* Check whether the array CORE contains the same CPU part numbers
+   as BL_STRING.  For example CORE={"0xd03", "0xd07"} and BL_STRING="0xd07.0xd03"
+   should return true.  */
+
+static bool
+valid_bL_string_p (const char** core, const char* bL_string)
+{
+  return strstr (bL_string, core[0]) != NULL
+         && strstr (bL_string, core[1]) != NULL;
+}
+
+/* Return true iff STR is a substring of one of the two elements of ARR.  */
+
+static bool
+contains_string_p (const char** arr, const char* str)
+{
+  bool res = false;
+
+  if (arr[0] != NULL)
+    {
+      res = strstr (arr[0], str) != NULL;
+      if (res)
+        return res;
+
+      if (arr[1] != NULL)
+        return strstr (arr[1], str) != NULL;
+    }
+
+  return false;
+}
+
+/* This will be called by the spec parser in gcc.c when it sees
+   a %:local_cpu_detect(args) construct.  Currently it will be called
+   with either "arch", "cpu" or "tune" as argument depending on whether
+   -march=native, -mcpu=native or -mtune=native is to be substituted.
+
+   It returns a string containing new command line parameters to be
+   put in place of that option, depending on the CPU on which this is
+   executed.  E.g. "-march=armv8-a" followed by the +feature/+nofeature
+   modifiers derived from /proc/cpuinfo on a Cortex-A57 for -march=native.
+   If the routine can't detect a known processor, the option is discarded.
+
+   For -mtune and -mcpu arguments it attempts to detect the CPU or
+   a big.LITTLE system.
+   ARGC and ARGV are set depending on the actual arguments given
+   in the spec.  */
+
+const char *
+host_detect_local_cpu (int argc, const char **argv)
+{
+  const char *arch_id = NULL;
+  const char *res = NULL;
+  static const int num_exts = ARRAY_SIZE (ext_to_feat_string);
+  char buf[128];
+  FILE *f = NULL;
+  bool arch = false;
+  bool tune = false;
+  bool cpu = false;
+  unsigned int i = 0;
+  unsigned int core_idx = 0;
+  const char* imps[2] = { NULL, NULL };
+  const char* cores[2] = { NULL, NULL };
+  unsigned int n_cores = 0;
+  unsigned int n_imps = 0;
+  bool processed_exts = false;
+  const char *ext_string = "";
+
+  gcc_assert (argc);
+
+  if (!argv[0])
+    goto not_found;
+
+  /* Are we processing -march, -mtune or -mcpu?  */
+  arch = strcmp (argv[0], "arch") == 0;
+  if (!arch)
+    tune = strcmp (argv[0], "tune") == 0;
+
+  if (!arch && !tune)
+    cpu = strcmp (argv[0], "cpu") == 0;
+
+  if (!arch && !tune && !cpu)
+    goto not_found;
+
+  f = fopen ("/proc/cpuinfo", "r");
+
+  if (f == NULL)
+    goto not_found;
+
+  /* Look through /proc/cpuinfo to determine the implementer
+     and then the part number that identifies a particular core.  */
+  while (fgets (buf, sizeof (buf), f) != NULL)
+    {
+      if (strstr (buf, "implementer") != NULL)
+	{
+	  for (i = 0; cpu_data[i].name != NULL; i++)
+	    if (strstr (buf, cpu_data[i].implementer_id) != NULL
+                && !contains_string_p (imps, cpu_data[i].implementer_id))
+	      {
+                if (n_imps == 2)
+                  goto not_found;
+
+                imps[n_imps++] = cpu_data[i].implementer_id;
+
+                break;
+	      }
+          continue;
+	}
+
+      if (strstr (buf, "part") != NULL)
+	{
+	  for (i = 0; cpu_data[i].name != NULL; i++)
+	    if (strstr (buf, cpu_data[i].part_no) != NULL
+                && !contains_string_p (cores, cpu_data[i].part_no))
+	      {
+                if (n_cores == 2)
+                  goto not_found;
+
+                cores[n_cores++] = cpu_data[i].part_no;
+	        core_idx = i;
+	        arch_id = cpu_data[i].arch;
+	        break;
+	      }
+          continue;
+        }
+      if (!tune && !processed_exts && strstr (buf, "Features") != NULL)
+        {
+          for (i = 0; i < num_exts; i++)
+            {
+              bool enabled = true;
+              char *p = NULL;
+              char *feat_string = concat (ext_to_feat_string[i].feat_string, NULL);
+
+              p = strtok (feat_string, " ");
+
+              while (p != NULL)
+                {
+                  if (strstr (buf, p) == NULL)
+                    {
+                      enabled = false;
+                      break;
+                    }
+                  p = strtok (NULL, " ");
+                }
+              ext_string = concat (ext_string, "+", enabled ? "" : "no",
+                                   ext_to_feat_string[i].ext, NULL);
+            }
+          processed_exts = true;
+        }
+    }
+
+  fclose (f);
+  f = NULL;
+
+  /* Weird cpuinfo format that we don't know how to handle.  */
+  if (n_cores == 0 || n_cores > 2 || n_imps != 1)
+    goto not_found;
+
+  if (arch && !arch_id)
+    goto not_found;
+
+  if (arch)
+    {
+      const char* arch_name = get_arch_name_from_id (arch_id);
+
+      /* We got some arch identifier that's not in aarch64-arches.def?  */
+      if (!arch_name)
+        goto not_found;
+
+      res = concat ("-march=", arch_name, NULL);
+    }
+  /* We have big.LITTLE.  */
+  else if (n_cores == 2)
+    {
+      for (i = 0; cpu_data[i].name != NULL; i++)
+        {
+          if (strchr (cpu_data[i].part_no, '.') != NULL
+              && strncmp (cpu_data[i].implementer_id, imps[0], strlen (imps[0]) - 1) == 0
+              && valid_bL_string_p (cores, cpu_data[i].part_no))
+            {
+              res = concat ("-m", cpu ? "cpu" : "tune", "=", cpu_data[i].name, NULL);
+              break;
+            }
+        }
+      if (!res)
+        goto not_found;
+    }
+  /* The simple, non-big.LITTLE case.  */
+  else
+    {
+      if (strncmp (cpu_data[core_idx].implementer_id, imps[0],
+                   strlen (imps[0]) - 1) != 0)
+        goto not_found;
+
+      res = concat ("-m", cpu ? "cpu" : "tune", "=",
+                      cpu_data[core_idx].name, NULL);
+    }
+
+  if (tune)
+    return res;
+
+  res = concat (res, ext_string, NULL);
+
+  return res;
+
+not_found:
+  {
+   /* If detection fails we ignore the option.
+      Clean up and return empty string.  */
+
+    if (f)
+      fclose (f);
+
+    return "";
+  }
+}
+
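For a big.LITTLE host, host_detect_local_cpu collects two distinct part numbers and looks for a cores.def entry whose PART has the "<big>.<LITTLE>" form. valid_bL_string_p accepts an entry when each collected part merely occurs somewhere in that string. Because cores[] is filled from cpu_data itself, that is probably adequate today. Below is a stricter variant, offered as a sketch (`bl_part_matches_p` is a hypothetical name): it splits PART at the '.' and requires the two halves to equal the collected numbers, in either order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if BL_PART has the form "<big part>.<LITTLE part>"
   (for example "0xd07.0xd03") and its two halves are exactly the two
   part numbers in CORES, in either order.  A part number that is only
   a prefix of one half (e.g. "0xd0") does not match.  */
static bool
bl_part_matches_p (const char *cores[2], const char *bl_part)
{
  assert (cores[0] != NULL && cores[1] != NULL);

  const char *dot = strchr (bl_part, '.');
  if (dot == NULL)
    return false;               /* Not a big.LITTLE entry.  */

  size_t first_len = (size_t) (dot - bl_part);
  const char *second = dot + 1;

  for (int i = 0; i < 2; i++)
    {
      const char *a = cores[i];
      const char *b = cores[1 - i];
      if (strlen (a) == first_len
          && strncmp (a, bl_part, first_len) == 0
          && strcmp (b, second) == 0)
        return true;
    }
  return false;
}
```

The order-insensitive loop reflects the fact that /proc/cpuinfo may list the LITTLE cores before the big ones.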
diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64
new file mode 100644
index 0000000..8c09e04
--- /dev/null
+++ b/gcc/config/aarch64/x-aarch64
@@ -0,0 +1,3 @@
+driver-aarch64.o: $(srcdir)/config/aarch64/driver-aarch64.c \
+  $(CONFIG_H) $(SYSTEM_H)
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e89e5a8..80dd131 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12323,8 +12323,12 @@ This involves inserting a NOP instruction between memory instructions and
 Specify the name of the target architecture, optionally suffixed by one or
 more feature modifiers.  This option has the form
 @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
-only permissible value for @var{arch} is @samp{armv8-a}.  The permissible
-values for @var{feature} are documented in the sub-section below.
+only permissible value for @var{arch} is @samp{armv8-a}.
+The permissible values for @var{feature} are documented in the sub-section
+below.  Additionally, on native AArch64 GNU/Linux systems the value
+@samp{native} is available.  This option causes the compiler to pick the
+architecture of the host system.  If the compiler is unable to recognize the
+architecture of the host system, this option has no effect.
 
 Where conflicting feature modifiers are specified, the right-most feature is
 used.
@@ -12348,6 +12352,13 @@ Additionally, this option can specify that GCC should tune the performance
 of the code for a big.LITTLE system.  Permissible values for this
 option are: @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}.
 
+Additionally, on native AArch64 GNU/Linux systems the value @samp{native}
+is available.
+This option causes the compiler to tune the performance of the code for the
+processor of the host system.
+If the compiler is unable to recognize the processor of the host system,
+this option has no effect.
+
 Where none of @option{-mtune=}, @option{-mcpu=} or @option{-march=}
 are specified, the code is tuned to perform well across a range
 of target processors.
@@ -12360,7 +12371,11 @@ Specify the name of the target processor, optionally suffixed by one or more
 feature modifiers.  This option has the form
 @option{-mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
 permissible values for @var{cpu} are the same as those available for
-@option{-mtune}.
+@option{-mtune}.  Additionally, on native AArch64 GNU/Linux systems the
+value @samp{native} is available.  This option causes the compiler to pick
+the architecture of the host processor and tune the performance of the code
+for that processor.  If the compiler is unable to recognize the processor of
+the host system, this option has no effect.
 
 The permissible values for @var{feature} are documented in the sub-section
 below.

--------------010502060205000408090005--