From: Martin Liska
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r13-3993] Revert "sphinx: remove texinfo files"
X-Act-Checkin: gcc
X-Git-Author: Martin Liska
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 40a39381063fdd83c4cbf5eacebfc50a2201308b
X-Git-Newrev: d77de738290156fafe079182888e5e03a2f835f1
Message-Id: <20221114083918.C1634382D3BA@sourceware.org>
Date: Mon, 14 Nov 2022 08:39:18 +0000 (GMT)

https://gcc.gnu.org/g:d77de738290156fafe079182888e5e03a2f835f1

commit r13-3993-gd77de738290156fafe079182888e5e03a2f835f1
Author: Martin Liska
Date: Sun Nov 13 21:59:29 2022 +0100

    Revert "sphinx: remove texinfo files"

    This reverts commit 54ca4eef58661a7d7a511e2bbbe309bde1732abf.

Diff: --- gcc/d/gdc.texi | 853 + gcc/doc/analyzer.texi | 569 + gcc/doc/avr-mmcu.texi | 83 + gcc/doc/bugreport.texi | 88 + gcc/doc/cfg.texi | 684 + gcc/doc/collect2.texi | 89 + gcc/doc/compat.texi | 156 + gcc/doc/configfiles.texi | 69 + gcc/doc/configterms.texi | 61 + gcc/doc/contrib.texi | 1776 ++ gcc/doc/contribute.texi | 24 + gcc/doc/cpp.texi | 4600 +++++ gcc/doc/cppdiropts.texi | 154 + gcc/doc/cppenv.texi | 99 + gcc/doc/cppinternals.texi | 1066 ++ gcc/doc/cppopts.texi | 557 + gcc/doc/cppwarnopts.texi | 82 + gcc/doc/extend.texi | 25550 +++++++++++++++++++++++++++ gcc/doc/fragments.texi | 273 + gcc/doc/frontends.texi | 61 + gcc/doc/gcc.texi | 219 + gcc/doc/gccint.texi | 206 + gcc/doc/gcov-dump.texi | 99 + gcc/doc/gcov-tool.texi | 267 + gcc/doc/gcov.texi | 1362 ++ gcc/doc/generic.texi | 3619 ++++ gcc/doc/gimple.texi | 2772 +++ gcc/doc/gnu.texi | 20 + gcc/doc/gty.texi | 735 + gcc/doc/headerdirs.texi | 32 + gcc/doc/hostconfig.texi | 229 + gcc/doc/implement-c.texi | 746 + gcc/doc/implement-cxx.texi | 62 + gcc/doc/include/fdl.texi | 547 + gcc/doc/include/funding.texi | 60 + gcc/doc/include/gcc-common.texi | 73 + gcc/doc/include/gpl_v3.texi | 733 + gcc/doc/install.texi | 5268 ++++++ gcc/doc/interface.texi | 70 + gcc/doc/invoke.texi | 35442 ++++++++++++++++++++++++++++++++++++++ gcc/doc/languages.texi | 36 + gcc/doc/libgcc.texi | 2304 +++ gcc/doc/loop.texi | 626 + gcc/doc/lto-dump.texi | 131 + gcc/doc/lto.texi | 591 + gcc/doc/makefile.texi | 201 + gcc/doc/match-and-simplify.texi | 453 + gcc/doc/md.texi | 11679 +++++++++++++ gcc/doc/objc.texi | 1210 ++ gcc/doc/optinfo.texi | 246 + gcc/doc/options.texi | 590 + gcc/doc/passes.texi | 1196 ++ gcc/doc/plugins.texi | 562 + gcc/doc/poly-int.texi | 1060 ++ gcc/doc/portability.texi | 39 + gcc/doc/rtl.texi | 5258 ++++++ gcc/doc/service.texi | 27 + gcc/doc/sourcebuild.texi | 3987 +++++ gcc/doc/standards.texi | 336 + gcc/doc/tm.texi | 12436 +++++++++++++ gcc/doc/tm.texi.in | 7984 +++++++++ gcc/doc/tree-ssa.texi | 826 + 
gcc/doc/trouble.texi | 1197 ++ gcc/doc/ux.texi | 661 + gcc/fortran/gfc-internals.texi | 968 ++ gcc/fortran/gfortran.texi | 5573 ++++++ gcc/fortran/intrinsic.texi | 15435 +++++++++++++++++ gcc/fortran/invoke.texi | 2133 +++ gcc/go/gccgo.texi | 521 + libgomp/libgomp.texi | 4884 ++++++ libiberty/at-file.texi | 15 + libiberty/copying-lib.texi | 560 + libiberty/functions.texi | 2063 +++ libiberty/libiberty.texi | 313 + libiberty/obstacks.texi | 774 + libitm/libitm.texi | 788 + libquadmath/libquadmath.texi | 392 + 77 files changed, 177510 insertions(+) diff --git a/gcc/d/gdc.texi b/gcc/d/gdc.texi new file mode 100644 index 00000000000..d3bf75ccfa9 --- /dev/null +++ b/gcc/d/gdc.texi @@ -0,0 +1,853 @@ +\input texinfo @c -*-texinfo-*- +@setfilename gdc.info +@settitle The GNU D Compiler + +@c Merge the standard indexes into a single one. +@syncodeindex fn cp +@syncodeindex vr cp +@syncodeindex ky cp +@syncodeindex pg cp +@syncodeindex tp cp + +@include gcc-common.texi + +@c Copyright years for this manual. +@set copyrights-d 2006-2022 + +@copying +@c man begin COPYRIGHT +Copyright @copyright{} @value{copyrights-d} Free Software Foundation, Inc. + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with no +Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. +A copy of the license is included in the +@c man end +section entitled ``GNU Free Documentation License''. +@ignore +@c man begin COPYRIGHT +man page gfdl(7). +@c man end +@end ignore +@end copying + +@ifinfo +@format +@dircategory Software development +@direntry +* gdc: (gdc). 
A GCC-based compiler for the D language +@end direntry +@end format + +@insertcopying +@end ifinfo + +@titlepage +@title The GNU D Compiler +@versionsubtitle +@author David Friedman, Iain Buclaw + +@page +@vskip 0pt plus 1filll +Published by the Free Software Foundation @* +51 Franklin Street, Fifth Floor@* +Boston, MA 02110-1301, USA@* +@sp 1 +@insertcopying +@end titlepage +@contents +@page + +@node Top +@top Introduction + +This manual describes how to use @command{gdc}, the GNU compiler for +the D programming language. This manual is specifically about +@command{gdc}. For more information about the D programming +language in general, including language specifications and standard +package documentation, see @uref{https://dlang.org/}. + +@menu +* Copying:: The GNU General Public License. +* GNU Free Documentation License:: + How you can share and copy this manual. +* Invoking gdc:: How to run gdc. +* Index:: Index. +@end menu + + +@include gpl_v3.texi + +@include fdl.texi + + +@node Invoking gdc +@chapter Invoking gdc + +@c man title gdc A GCC-based compiler for the D language + +@ignore +@c man begin SYNOPSIS gdc +gdc [@option{-c}|@option{-S}] [@option{-g}] [@option{-pg}] + [@option{-O}@var{level}] [@option{-W}@var{warn}@dots{}] + [@option{-I}@var{dir}@dots{}] [@option{-L}@var{dir}@dots{}] + [@option{-f}@var{option}@dots{}] [@option{-m}@var{machine-option}@dots{}] + [@option{-o} @var{outfile}] [@@@var{file}] @var{infile}@dots{} + +Only the most useful options are listed here; see below for the +remainder. +@c man end +@c man begin SEEALSO +gpl(7), gfdl(7), fsf-funding(7), gcc(1) +and the Info entries for @file{gdc} and @file{gcc}. +@c man end +@end ignore + +@c man begin DESCRIPTION gdc + +The @command{gdc} command is the GNU compiler for the D language and +supports many of the same options as @command{gcc}. @xref{Option Summary, , +Option Summary, gcc, Using the GNU Compiler Collection (GCC)}. +This manual only documents the options specific to @command{gdc}. 
+ +@c man end + +@menu +* Input and Output files:: Controlling the kind of output: + an executable, object files, or assembler files +* Runtime Options:: Options controlling runtime behavior +* Directory Options:: Where to find module files +* Code Generation:: Options controlling the output of gdc +* Warnings:: Options controlling warnings specific to gdc +* Linking:: Options influencing the linking step +* Developer Options:: Options useful for developers of gdc +@end menu + +@c man begin OPTIONS + +@node Input and Output files +@section Input and Output files +@cindex suffixes for D source +@cindex D source file suffixes + +For any given input file, the file name suffix determines what kind of +compilation is done. The following kinds of input file names are supported: + +@table @gcctabopt +@item @var{file}.d +D source files. +@item @var{file}.dd +Ddoc source files. +@item @var{file}.di +D interface files. +@end table + +You can specify more than one input file on the @command{gdc} command line, +each being compiled separately in the compilation process. If you specify a +@code{-o @var{file}} option, all the input files are compiled together, +producing a single output file, named @var{file}. This is allowed even +when using @code{-S} or @code{-c}. + +@cindex D interface files +A D interface file contains only what an import of the module needs, +rather than the whole implementation of that module. Such files can be created +by @command{gdc} from a D source file by using the @code{-H} option. +When the compiler resolves an import declaration, it searches for matching +@file{.di} files first, then for @file{.d}. + +@cindex Ddoc source files +A Ddoc source file contains code in the D macro processor language. It is +primarily designed for use in producing user documentation from embedded +comments, with a slight affinity towards HTML generation. 
If a @file{.d} +source file starts with the string @code{Ddoc} then it is treated as general +purpose documentation, not as a D source file. + +@node Runtime Options +@section Runtime Options +@cindex options, runtime + +These options affect the runtime behavior of programs compiled with +@command{gdc}. + +@table @gcctabopt + +@item -fall-instantiations +@cindex @option{-fall-instantiations} +@cindex @option{-fno-all-instantiations} +Generate code for all template instantiations. The default template emission +strategy is to not generate code for declarations that were either +instantiated speculatively, such as from @code{__traits(compiles, ...)}, or +that come from an imported module not being compiled. + +@item -fno-assert +@cindex @option{-fassert} +@cindex @option{-fno-assert} +Turn off code generation for @code{assert} contracts. + +@item -fno-bounds-check +@cindex @option{-fbounds-check} +@cindex @option{-fno-bounds-check} +Turns off array bounds checking for all functions, which can improve +performance for code that uses arrays extensively. Note that this +can result in unpredictable behavior if the code in question actually +does violate array bounds constraints. It is safe to use this option +if you are sure that your code never throws a @code{RangeError}. + +@item -fbounds-check=@var{value} +@cindex @option{-fbounds-check=} +An alternative to @option{-fbounds-check} that allows more control +as to where bounds checking is turned on or off. The following values +are supported: + +@table @samp +@item on +Turns on array bounds checking for all functions. +@item safeonly +Turns on array bounds checking only for @code{@@safe} functions. +@item off +Turns off array bounds checking completely. +@end table + +@item -fno-builtin +@cindex @option{-fbuiltin} +@cindex @option{-fno-builtin} +Don't recognize built-in functions unless they begin with the prefix +@samp{__builtin_}. 
By default, the compiler will recognize when a +function in the @code{core.stdc} package is a built-in function. + +@item -fcheckaction=@var{value} +@cindex @option{-fcheckaction} +This option controls what code is generated on an assertion, bounds check, or +final switch failure. The following values are supported: + +@table @samp +@item context +Throw an @code{AssertError} with extra context information. +@item halt +Halt the program execution. +@item throw +Throw an @code{AssertError} (the default). +@end table + +@item -fdebug +@item -fdebug=@var{value} +@cindex @option{-fdebug} +@cindex @option{-fno-debug} +Turn on compilation of conditional @code{debug} code into the program. +The @option{-fdebug} option itself sets the debug level to @code{1}, +while @option{-fdebug=} enables @code{debug} code that is identified +by any of the following values: + +@table @samp +@item level +Sets the debug level to @var{level}; any @code{debug} code <= @var{level} +is compiled into the program. +@item ident +Turns on compilation of any @code{debug} code identified by @var{ident}. +@end table + +@item -fno-druntime +@cindex @option{-fdruntime} +@cindex @option{-fno-druntime} +Implements @uref{https://dlang.org/spec/betterc.html}. Assumes that +compilation targets an environment without a D runtime library. + +This is equivalent to compiling with the following options: + +@example +gdc -nophoboslib -fno-exceptions -fno-moduleinfo -fno-rtti +@end example + +@item -fextern-std=@var{standard} +@cindex @option{-fextern-std} +Sets the C++ name mangling compatibility to the version identified by +@var{standard}. The following values are supported: + +@table @samp +@item c++98 +@item c++03 +Sets @code{__traits(getTargetInfo, "cppStd")} to @code{199711}. +@item c++11 +Sets @code{__traits(getTargetInfo, "cppStd")} to @code{201103}. +@item c++14 +Sets @code{__traits(getTargetInfo, "cppStd")} to @code{201402}. +@item c++17 +Sets @code{__traits(getTargetInfo, "cppStd")} to @code{201703}. 
+This is the default. +@item c++20 +Sets @code{__traits(getTargetInfo, "cppStd")} to @code{202002}. +@end table + +@item -fno-invariants +@cindex @option{-finvariants} +@cindex @option{-fno-invariants} +Turns off code generation for class @code{invariant} contracts. + +@item -fmain +@cindex @option{-fmain} +Generates a default @code{main()} function when compiling. This is useful when +unittesting a library, as it enables running the unittests in a library without +having to manually define an entry-point function. This option does nothing +when @code{main} is already defined in user code. + +@item -fno-moduleinfo +@cindex @option{-fmoduleinfo} +@cindex @option{-fno-moduleinfo} +Turns off generation of the @code{ModuleInfo} and related functions +that would become unreferenced without it, which may allow linking +to programs not written in D. Functions that are not generated +include module constructors and destructors (@code{static this} and +@code{static ~this}), @code{unittest} code, and @code{DSO} registry +functions for dynamically linked code. + +@item -fonly=@var{filename} +@cindex @option{-fonly} +Tells the compiler to parse and run semantic analysis on all modules +on the command line, but only generate code for the module specified +by @var{filename}. + +@item -fno-postconditions +@cindex @option{-fpostconditions} +@cindex @option{-fno-postconditions} +Turns off code generation for postcondition @code{out} contracts. + +@item -fno-preconditions +@cindex @option{-fpreconditions} +@cindex @option{-fno-preconditions} +Turns off code generation for precondition @code{in} contracts. + +@item -fpreview=@var{id} +@cindex @option{-fpreview} +Turns on an upcoming D language change identified by @var{id}. The following +values are supported: + +@table @samp +@item all +Turns on all upcoming D language features. +@item dip1000 +Implements @uref{https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1000.md} +(Scoped pointers). 
+@item dip1008 +Implements @uref{https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1008.md} +(Allow exceptions in @code{@@nogc} code). +@item dip1021 +Implements @uref{https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md} +(Mutable function arguments). +@item dip25 +Implements @uref{https://github.com/dlang/DIPs/blob/master/DIPs/archive/DIP25.md} +(Sealed references). +@item dtorfields +Turns on generation for destructing fields of partially constructed objects. +@item fieldwise +Turns on generation of struct equality to use field-wise comparisons. +@item fixaliasthis +Implements new lookup rules that check the current scope for @code{alias this} +before searching in upper scopes. +@item fiximmutableconv +Disallows unsound immutable conversions that were formerly incorrectly +permitted. +@item in +Implements @code{in} parameters to mean @code{scope const [ref]} and accepts +rvalues. +@item inclusiveincontracts +Implements @code{in} contracts of overridden methods to be a superset of parent +contract. +@item intpromote +Implements C-style integral promotion for unary @code{+}, @code{-} and @code{~} +expressions. +@item nosharedaccess +Turns off and disallows all access to shared memory objects. +@item rvaluerefparam +Implements rvalue arguments to @code{ref} parameters. +@item systemvariables +Disables access to variables marked @code{@@system} from @code{@@safe} code. +@end table + +@item -frelease +@cindex @option{-frelease} +@cindex @option{-fno-release} +Turns on compiling in release mode, which means not emitting runtime +checks for contracts and asserts. Array bounds checking is not done +for @code{@@system} and @code{@@trusted} functions, and assertion +failures are undefined behavior. 
+ +This is equivalent to compiling with the following options: + +@example +gdc -fno-assert -fbounds-check=safeonly -fno-invariants \ + -fno-postconditions -fno-preconditions -fno-switch-errors +@end example + +@item -frevert=@var{id} +@cindex @option{-frevert} +Turns off a D language feature identified by @var{id}. The following values +are supported: + +@table @samp +@item all +Turns off all revertible D language features. +@item dip25 +Reverts @uref{https://github.com/dlang/DIPs/blob/master/DIPs/archive/DIP25.md} +(Sealed references). +@item dtorfields +Turns off generation for destructing fields of partially constructed objects. +@item markdown +Turns off Markdown replacements in Ddoc comments. +@end table + +@item -fno-rtti +@cindex @option{-frtti} +@cindex @option{-fno-rtti} +Turns off generation of run-time type information for all user-defined types. +Any code that uses features of the language that require access to this +information will result in an error. + +@item -fno-switch-errors +@cindex @option{-fswitch-errors} +@cindex @option{-fno-switch-errors} +This option controls what code is generated when no case is matched +in a @code{final switch} statement. The default run-time behavior +is to throw a @code{SwitchError}. Turning off @option{-fswitch-errors} +means that instead the execution of the program is immediately halted. + +@item -funittest +@cindex @option{-funittest} +@cindex @option{-fno-unittest} +Turns on compilation of @code{unittest} code, and turns on the +@code{version(unittest)} identifier. This implies @option{-fassert}. + +@item -fversion=@var{value} +@cindex @option{-fversion} +Turns on compilation of conditional @code{version} code into the program +identified by any of the following values: + +@table @samp +@item level +Sets the version level to @var{level}, any @code{version} code >= @var{level} +is compiled into the program. +@item ident +Turns on compilation of @code{version} code identified by @var{ident}. 
+@end table + +@item -fno-weak-templates +@cindex @option{-fweak-templates} +@cindex @option{-fno-weak-templates} +Turns off emission of declarations that can be defined in multiple objects as +weak symbols. The default is to emit all public symbols as weak, unless the +target lacks support for weak symbols. Disabling this option means that common +symbols are instead put in COMDAT or become private. + +@end table + +@node Directory Options +@section Options for Directory Search +@cindex directory options +@cindex options, directory search +@cindex search path + +These options specify directories to search for files, libraries, and +other parts of the compiler: + +@table @gcctabopt + +@item -I@var{dir} +@cindex @option{-I} +Specify a directory to use when searching for imported modules at +compile time. Multiple @option{-I} options can be used, and the +paths are searched in the same order. + +@item -J@var{dir} +@cindex @option{-J} +Specify a directory to use when searching for files in string imports +at compile time. This switch is required in order to use +@code{import(file)} expressions. Multiple @option{-J} options can be +used, and the paths are searched in the same order. + +@item -L@var{dir} +@cindex @option{-L} +When linking, specify a library search directory, as with @command{gcc}. + +@item -B@var{dir} +@cindex @option{-B} +This option specifies where to find the executables, libraries, +source files, and data files of the compiler itself, as with @command{gcc}. + +@item -fmodule-file=@var{module}=@var{spec} +@cindex @option{-fmodule-file} +This option manipulates file paths of imported modules, such that if an +imported module matches all or the leftmost part of @var{module}, the file +path in @var{spec} is used as the location to search for D sources. +This is used when the source file path and names are not the same as the +package and module hierarchy. 
Consider the following examples: + +@example +gdc test.d -fmodule-file=A.B=foo.d -fmodule-file=C=bar +@end example + +This will tell the compiler to search in all import paths for the source +file @var{foo.d} when importing @var{A.B}, and the directory @var{bar/} +when importing @var{C}, as annotated in the following D code: + +@example +module test; +import A.B; // Matches A.B, searches for foo.d +import C.D.E; // Matches C, searches for bar/D/E.d +import A.B.C; // No match, searches for A/B/C.d +@end example + +@item -imultilib @var{dir} +@cindex @option{-imultilib} +Use @var{dir} as a subdirectory of the gcc directory containing +target-specific D sources and interfaces. + +@item -iprefix @var{prefix} +@cindex @option{-iprefix} +Specify @var{prefix} as the prefix for the gcc directory containing +target-specific D sources and interfaces. If the @var{prefix} represents +a directory, you should include the final @code{'/'}. + +@item -nostdinc +@cindex @option{-nostdinc} +Do not search the standard system directories for D source and interface +files. Only the directories that have been specified with @option{-I} options +(and the directory of the current file, if appropriate) are searched. + +@end table + +@node Code Generation +@section Code Generation +@cindex options, code generation + +In addition to the many @command{gcc} options controlling code generation, +@command{gdc} has several options specific to itself. + +@table @gcctabopt + +@item -H +@cindex @option{-H} +Generates D interface files for all modules being compiled. The compiler +determines the output file based on the name of the input file, removes +any directory components and suffix, and applies the @file{.di} suffix. + +@item -Hd @var{dir} +@cindex @option{-Hd} +Same as @option{-H}, but writes interface files to directory @var{dir}. +This option can be used with @option{-Hf @var{file}} to independently set the +output file and directory path. 
+ +@item -Hf @var{file} +@cindex @option{-Hf} +Same as @option{-H} but writes interface files to @var{file}. This option can +be used with @option{-Hd @var{dir}} to independently set the output file and +directory path. + +@item -M +@cindex @option{-M} +Output the module dependencies of all source files being compiled in a +format suitable for @command{make}. The compiler outputs one +@command{make} rule containing the object file name for that source file, +a colon, and the names of all imported files. + +@item -MM +@cindex @option{-MM} +Like @option{-M} but does not mention imported modules from the D standard +library package directories. + +@item -MF @var{file} +@cindex @option{-MF} +When used with @option{-M} or @option{-MM}, specifies a @var{file} to write +the dependencies to. When used with the driver options @option{-MD} or +@option{-MMD}, @option{-MF} overrides the default dependency output file. + +@item -MG +@cindex @option{-MG} +This option is for compatibility with @command{gcc}, and is ignored by the +compiler. + +@item -MP +@cindex @option{-MP} +Outputs a phony target for each dependency other than the modules being +compiled, causing each to depend on nothing. + +@item -MT @var{target} +@cindex @option{-MT} +Change the @var{target} of the rule emitted by dependency generation +to be exactly the string you specify. If you want multiple targets, +you can specify them as a single argument to @option{-MT}, or use +multiple @option{-MT} options. + +@item -MQ @var{target} +@cindex @option{-MQ} +Same as @option{-MT}, but it quotes any characters which are special to +@command{make}. + +@item -MD +@cindex @option{-MD} +This option is equivalent to @option{-M -MF @var{file}}. The driver +determines @var{file} by removing any directory components and suffix +from the input file, and then adding a @file{.deps} suffix. + +@item -MMD +@cindex @option{-MMD} +Like @option{-MD} but does not mention imported modules from the D standard +library package directories. 
+ +@item -X +@cindex @option{-X} +Output information describing the contents of all source files being +compiled in JSON format to a file. The driver determines @var{file} by +removing any directory components and suffix from the input file, and then +adding a @file{.json} suffix. + +@item -Xf @var{file} +@cindex @option{-Xf} +Same as @option{-X}, but writes all JSON contents to the specified +@var{file}. + +@item -fdoc +@cindex @option{-fdoc} +Generates @code{Ddoc} documentation and writes it to a file. The compiler +determines @var{file} by removing any directory components and suffix +from the input file, and then adding a @file{.html} suffix. + +@item -fdoc-dir=@var{dir} +@cindex @option{-fdoc-dir} +Same as @option{-fdoc}, but writes documentation to directory @var{dir}. +This option can be used with @option{-fdoc-file=@var{file}} to +independently set the output file and directory path. + +@item -fdoc-file=@var{file} +@cindex @option{-fdoc-file} +Same as @option{-fdoc}, but writes documentation to @var{file}. This +option can be used with @option{-fdoc-dir=@var{dir}} to independently +set the output file and directory path. + +@item -fdoc-inc=@var{file} +@cindex @option{-fdoc-inc} +Specify @var{file} as a @var{Ddoc} macro file to be read. Multiple +@option{-fdoc-inc} options can be used, and files are read and processed +in the same order. + +@item -fdump-c++-spec=@var{file} +For D source files, generate corresponding C++ declarations in @var{file}. + +@item -fdump-c++-spec-verbose +In conjunction with @option{-fdump-c++-spec=} above, add comments for ignored +declarations in the generated C++ header. + +@item -fsave-mixins=@var{file} +@cindex @option{-fsave-mixins} +Generates code expanded from D @code{mixin} statements and writes the +processed sources to @var{file}. This is useful to debug errors in compilation +and provides source for debuggers to show when requested. 
+ +@end table + +@node Warnings +@section Warnings +@cindex options to control warnings +@cindex warning messages +@cindex messages, warning +@cindex suppressing warnings + +Warnings are diagnostic messages that report constructions that +are not inherently erroneous but that are risky or suggest there +is likely to be a bug in the program. Unless @option{-Werror} is +specified, they do not prevent compilation of the program. + +@table @gcctabopt + +@item -Wall +@cindex @option{-Wall} +@cindex @option{-Wno-all} +Turns on all warning messages. Warnings are not a defined part of +the D language, and all constructs for which this may generate a +warning message are valid code. + +@item -Walloca +@cindex @option{-Walloca} +This option warns on all uses of @code{alloca} in the source. + +@item -Walloca-larger-than=@var{n} +@cindex @option{-Walloca-larger-than} +@cindex @option{-Wno-alloca-larger-than} +Warn on unbounded uses of @code{alloca}, and on bounded uses of @code{alloca} +whose bound can be larger than @var{n} bytes. +@option{-Wno-alloca-larger-than} disables the +@option{-Walloca-larger-than} warning and is equivalent to +@option{-Walloca-larger-than=@var{SIZE_MAX}} or larger. + +@item -Wcast-result +@cindex @option{-Wcast-result} +@cindex @option{-Wno-cast-result} +Warn about casts that will produce a null or zero result. Currently +this is only done for casting between an imaginary and non-imaginary +data type, or casting between a D and C++ class. + +@item -Wno-deprecated +@cindex @option{-Wdeprecated} +@cindex @option{-Wno-deprecated} +Do not warn about usage of deprecated features and symbols with +@code{deprecated} attributes. + +@item -Werror +@cindex @option{-Werror} +@cindex @option{-Wno-error} +Turns all warnings into errors. + +@item -Wspeculative +@cindex @option{-Wspeculative} +@cindex @option{-Wno-speculative} +List all error messages from speculative compiles, such as +@code{__traits(compiles, ...)}. 
This option does not report +messages as warnings, and these messages therefore never become +errors when the @option{-Werror} option is also used. + +@item -Wtemplates +@cindex @option{-Wtemplates} +@cindex @option{-Wno-templates} +Warn when a template instantiation is encountered. Some coding +rules disallow templates, and this may be used to enforce that rule. + +@item -Wunknown-pragmas +@cindex @option{-Wunknown-pragmas} +@cindex @option{-Wno-unknown-pragmas} +Warn when a @code{pragma()} is encountered that is not understood by +@command{gdc}. This differs from @option{-fignore-unknown-pragmas} +where a pragma that is part of the D language, but not implemented by +the compiler, won't get reported. + +@item -Wno-varargs +@cindex @option{-Wvarargs} +@cindex @option{-Wno-varargs} +Do not warn upon questionable usage of the macros used to handle variable +arguments like @code{va_start}. + +@item -fignore-unknown-pragmas +@cindex @option{-fignore-unknown-pragmas} +@cindex @option{-fno-ignore-unknown-pragmas} +Turns off errors for unsupported pragmas. + +@item -fmax-errors=@var{n} +@cindex @option{-fmax-errors} +Limits the maximum number of error messages to @var{n}, at which point +@command{gdc} bails out rather than attempting to continue processing the +source code. If @var{n} is 0 (the default), there is no limit on the +number of error messages produced. + +@item -fsyntax-only +@cindex @option{-fsyntax-only} +@cindex @option{-fno-syntax-only} +Check the code for syntax errors, but do not actually compile it. This +can be used in conjunction with @option{-fdoc} or @option{-H} to generate +files for each module present on the command line, but no other output +file. + +@item -ftransition=@var{id} +@cindex @option{-ftransition} +Report additional information about D language changes identified by +@var{id}. The following values are supported: + +@table @samp +@item all +List information on all D language transitions. +@item complex +List all usages of complex or imaginary types. 
+@item field +List all non-mutable fields which occupy an object instance. +@item in +List all usages of @code{in} on parameters. +@item nogc +List all hidden GC allocations. +@item templates +List statistics on template instantiations. +@item tls +List all variables going into thread-local storage. +@item vmarkdown +List instances of Markdown replacements in Ddoc. +@end table + +@end table + +@node Linking +@section Options for Linking +@cindex options, linking +@cindex linking, static + +These options come into play when the compiler links object files into an +executable output file. They are meaningless if the compiler is not doing +a link step. + +@table @gcctabopt + +@item -defaultlib=@var{libname} +@cindex @option{-defaultlib=} +Specify the library to use instead of libphobos when linking. Options +specifying the linkage of libphobos, such as @option{-static-libphobos} +or @option{-shared-libphobos}, are ignored. + +@item -debuglib=@var{libname} +@cindex @option{-debuglib=} +Specify the debug library to use instead of libphobos when linking. +This option has no effect unless the @option{-g} option was also given +on the command line. Options specifying the linkage of libphobos, such +as @option{-static-libphobos} or @option{-shared-libphobos}, are ignored. + +@item -nophoboslib +@cindex @option{-nophoboslib} +Do not use the Phobos or D runtime library when linking. Options specifying +the linkage of libphobos, such as @option{-static-libphobos} or +@option{-shared-libphobos}, are ignored. The standard system libraries are +used normally, unless @option{-nostdlib} or @option{-nodefaultlibs} is used. + +@item -shared-libphobos +@cindex @option{-shared-libphobos} +On systems that provide @file{libgphobos} and @file{libgdruntime} as a +shared and a static library, this option forces the use of the shared +version. If no shared version was built when the compiler was configured, +this option has no effect. 
+ +@item -static-libphobos +@cindex @option{-static-libphobos} +On systems that provide @file{libgphobos} and @file{libgdruntime} as a +shared and a static library, this option forces the use of the static +version. If no static version was built when the compiler was configured, +this option has no effect. + +@end table + +@node Developer Options +@section Developer Options +@cindex developer options +@cindex debug dump options +@cindex dump options + +This section describes command-line options that are primarily of +interest to developers or language tooling. + +@table @gcctabopt + +@item -fdump-d-original +@cindex @option{-fdump-d-original} +Output the internal front-end AST after the @code{semantic3} stage. +This option is only useful for debugging the GNU D compiler itself. + +@item -v +@cindex @option{-v} +Dump information about the compiler language processing stages as the source +program is being compiled. This includes listing all modules that are +processed through the @code{parse}, @code{semantic}, @code{semantic2}, and +@code{semantic3} stages; all @code{import} modules and their file paths; +and all @code{function} bodies that are being compiled. + +@end table + +@c man end + +@node Index +@unnumbered Index + +@printindex cp + +@bye diff --git a/gcc/doc/analyzer.texi b/gcc/doc/analyzer.texi new file mode 100644 index 00000000000..ec49f951435 --- /dev/null +++ b/gcc/doc/analyzer.texi @@ -0,0 +1,569 @@ +@c Copyright (C) 2019-2022 Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. +@c Contributed by David Malcolm . 
+ +@node Static Analyzer +@chapter Static Analyzer +@cindex analyzer +@cindex static analysis +@cindex static analyzer + +@menu +* Analyzer Internals:: Analyzer Internals +* Debugging the Analyzer:: Useful debugging tips +@end menu + +@node Analyzer Internals +@section Analyzer Internals +@cindex analyzer, internals +@cindex static analyzer, internals + +@subsection Overview + +The analyzer implementation works on the gimple-SSA representation. +(I chose this in the hopes of making it easy to work with LTO to +do whole-program analysis). + +The implementation is read-only: it doesn't attempt to change anything, +just emit warnings. + +The gimple representation can be seen using @option{-fdump-ipa-analyzer}. +@quotation Tip +If the analyzer ICEs before this is written out, one workaround is to use +@option{--param=analyzer-bb-explosion-factor=0} to force the analyzer +to bail out after analyzing the first basic block. +@end quotation + +First, we build a @code{supergraph} which combines the callgraph and all +of the CFGs into a single directed graph, with both interprocedural and +intraprocedural edges. The nodes and edges in the supergraph are called +``supernodes'' and ``superedges'', and often referred to in code as +@code{snodes} and @code{sedges}. Basic blocks in the CFGs are split at +interprocedural calls, so there can be more than one supernode per +basic block. Most statements will be in just one supernode, but a call +statement can appear in two supernodes: at the end of one for the call, +and again at the start of another for the return. + +The supergraph can be seen using @option{-fdump-analyzer-supergraph}. + +We then build an @code{analysis_plan} which walks the callgraph to +determine which calls might be suitable for being summarized (rather +than fully explored) and thus in what order to explore the functions. + +Next is the heart of the analyzer: we use a worklist to explore state +within the supergraph, building an "exploded graph". 
+Nodes in the exploded graph correspond to <point, state> pairs, as in + "Precise Interprocedural Dataflow Analysis via Graph Reachability" + (Thomas Reps, Susan Horwitz and Mooly Sagiv). + +We reuse nodes for <point, state> pairs we've already seen, and avoid +tracking state too closely, so that (hopefully) we rapidly converge +on a final exploded graph, and terminate the analysis. We also bail +out if the number of exploded nodes gets +larger than a particular multiple of the total number of basic blocks +(to ensure termination in the face of pathological state-explosion +cases, or bugs). We also stop exploring a point once we hit a limit +of states for that point. + +We can identify problems directly when processing a <point, state> +instance. For example, if we're finding the successors of + +@smallexample + <point: before-stmt: "free (ptr);", + state: @{"ptr": freed@}> +@end smallexample + +then we can detect a double-free of "ptr". We can then emit a path +to reach the problem by finding the simplest route through the graph. + +Program points in the analysis are much more fine-grained than in the +CFG and supergraph, with points (and thus potentially exploded nodes) +for various events, including before individual statements. +By default the exploded graph merges multiple consecutive statements +in a supernode into one exploded edge to minimize the size of the +exploded graph. This can be suppressed via +@option{-fanalyzer-fine-grained}. +The fine-grained approach seems to make things simpler and more debuggable +than other approaches I tried, in that each point is responsible for one +thing. + +Program points in the analysis also have a "call string" identifying the +stack of callsites below them, so that paths in the exploded graph +correspond to interprocedurally valid paths: we always return to the +correct call site, propagating state information accordingly. +We avoid infinite recursion by stopping the analysis if a callsite +appears more than @code{analyzer-max-recursion-depth} times in a call +string (defaulting to 2).
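The worklist exploration described in the Overview can be illustrated with a toy model. The following is a minimal sketch, not the analyzer's implementation: every @code{toy_} name is hypothetical, program points and states are plain integers, and the saturating counter merely stands in for "not tracking state too closely". It shows the two termination mechanisms mentioned above: reusing nodes for pairs already seen, and bailing out once a node limit is reached.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not an analyzer class.
   A program point and an abstract state are both small integers.  */
#define TOY_MAX_ENODES 64

struct toy_enode
{
  int point;
  int state;
};

struct toy_egraph
{
  struct toy_enode nodes[TOY_MAX_ENODES];
  size_t num_nodes;
  size_t limit; /* Bail-out threshold, at most TOY_MAX_ENODES.  */
};

/* Return the index of the enode for <POINT, STATE>, reusing an existing
   node when the pair has been seen before.  Set *IS_NEW if a node was
   created.  Return -1 if creating a node would exceed the limit.  */
static int
toy_get_or_create (struct toy_egraph *g, int point, int state, bool *is_new)
{
  *is_new = false;
  for (size_t i = 0; i < g->num_nodes; i++)
    if (g->nodes[i].point == point && g->nodes[i].state == state)
      return (int) i;
  if (g->num_nodes >= g->limit)
    return -1;
  g->nodes[g->num_nodes].point = point;
  g->nodes[g->num_nodes].state = state;
  *is_new = true;
  return (int) g->num_nodes++;
}

/* Explore a toy loop of NUM_POINTS program points.  Point P flows to
   (P + 1) % NUM_POINTS; each trip around the back edge bumps the state
   until it saturates at CAP.  Return true on convergence, false on
   bail-out.  */
static bool
toy_explore (struct toy_egraph *g, int num_points, int cap, size_t limit)
{
  int worklist[TOY_MAX_ENODES];
  size_t worklist_len = 0;
  bool is_new;

  g->num_nodes = 0;
  g->limit = limit < TOY_MAX_ENODES ? limit : TOY_MAX_ENODES;

  int idx = toy_get_or_create (g, 0, 0, &is_new);
  if (idx < 0)
    return false;
  worklist[worklist_len++] = idx;

  while (worklist_len > 0)
    {
      struct toy_enode n = g->nodes[worklist[--worklist_len]];
      int next_point = (n.point + 1) % num_points;
      int next_state = n.state;
      if (next_point == 0 && next_state < cap)
        next_state++;

      idx = toy_get_or_create (g, next_point, next_state, &is_new);
      if (idx < 0)
        return false;
      /* Only newly created nodes need their successors explored.  */
      if (is_new)
        worklist[worklist_len++] = idx;
    }
  return true;
}
```

Because each pair is enqueued at most once and the state saturates, the loop terminates even though the toy program loops forever; lowering the limit makes the same exploration bail out early instead.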
+ +@subsection Graphs + +Nodes and edges in the exploded graph are called ``exploded nodes'' and +``exploded edges'' and often referred to in the code as +@code{enodes} and @code{eedges} (especially when distinguishing them +from the @code{snodes} and @code{sedges} in the supergraph). + +Each graph numbers its nodes, giving unique identifiers - supernodes +are referred to throughout dumps in the form @samp{SN: @var{index}} and +exploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and +@samp{EN: 29}). + +The supergraph can be seen using @option{-fdump-analyzer-supergraph}. + +The exploded graph can be seen using @option{-fdump-analyzer-exploded-graph} +and other dump options. Exploded nodes are color-coded in the .dot output +based on state-machine states to make it easier to see state changes at +a glance. + +@subsection State Tracking + +There's a tension between: +@itemize @bullet +@item +precision of analysis in the straight-line case, vs +@item +exponential blow-up in the face of control flow. +@end itemize + +For example, in general, given this CFG: + +@smallexample + A + / \ + B C + \ / + D + / \ + E F + \ / + G +@end smallexample + +we want to avoid differences in state-tracking in B and C from +leading to blow-up. If we don't prevent state blowup, we end up +with exponential growth of the exploded graph like this: + +@smallexample + + 1:A + / \ + / \ + / \ + 2:B 3:C + | | + 4:D 5:D (2 exploded nodes for D) + / \ / \ + 6:E 7:F 8:E 9:F + | | | | + 10:G 11:G 12:G 13:G (4 exploded nodes for G) + +@end smallexample + +Similar issues arise with loops. + +To prevent this, we follow various approaches: + +@enumerate a +@item +state pruning, which tries to discard state that won't be relevant +later on within the function. +This can be disabled via @option{-fno-analyzer-state-purge}. + +@item +state merging. We can try to find the commonality between two +program_state instances to make a third, simpler program_state.
+We have two strategies here: + + @enumerate + @item + the worklist keeps new nodes for the same program_point together, + and tries to merge them before processing, and thus before they have + successors. Hence, in the above, the two nodes for D (4 and 5) reach + the front of the worklist together, and we create a node for D with + the merger of the incoming states. + + @item + try merging with the state of existing enodes for the program_point + (which may have already been explored). There will be duplication, + but only one set of duplication; subsequent duplicates are more likely + to hit the cache. In particular, (hopefully) all merger chains are + finite, and so we guarantee termination. + This is intended to help with loops: we ought to explore the first + iteration, and then have a "subsequent iterations" exploration, + which uses a state merged from that of the first, to be more abstract. + @end enumerate + +We avoid merging pairs of states that have state-machine differences, +as these are the kinds of differences that are likely to be most +interesting. So, for example, given: + +@smallexample + if (condition) + ptr = malloc (size); + else + ptr = local_buf; + + .... do things with 'ptr' + + if (condition) + free (ptr); + + ...etc +@end smallexample + +then we end up with an exploded graph that looks like this: + +@smallexample + + if (condition) + / T \ F + --------- ---------- + / \ + ptr = malloc (size) ptr = local_buf + | | + copy of copy of + "do things with 'ptr'" "do things with 'ptr'" + with ptr: heap-allocated with ptr: stack-allocated + | | + if (condition) if (condition) + | known to be T | known to be F + free (ptr); | + \ / + ----------------------------- + | ('ptr' is pruned, so states can be merged) + etc + +@end smallexample + +where some duplication has occurred, but only for the places where the +different paths are worth exploring separately. + +Merging can be disabled via @option{-fno-analyzer-state-merge}.
+@end enumerate + +@subsection Region Model + +Part of the state stored at an @code{exploded_node} is a @code{region_model}. +This is an implementation of the region-based ternary model described in +@url{https://www.researchgate.net/publication/221430855_A_Memory_Model_for_Static_Analysis_of_C_Programs, +"A Memory Model for Static Analysis of C Programs"} +(Zhongxing Xu, Ted Kremenek, and Jian Zhang). + +A @code{region_model} encapsulates a representation of the state of +memory, with a @code{store} recording bindings from @code{region} +instances to @code{svalue} instances. The bindings are organized into +clusters, where regions accessible via well-defined pointer arithmetic +are in the same cluster. The representation is graph-like because values +can be pointers to regions. It also stores a constraint_manager, +capturing relationships between the values. + +Because each node in the @code{exploded_graph} has a @code{region_model}, +and each of the latter is graph-like, the @code{exploded_graph} is in some +ways a graph of graphs. + +Here's an example of printing a @code{program_state}, showing the +@code{region_model} within it, along with state for the @code{malloc} +state machine. + +@smallexample +(gdb) call debug (*this) +rmodel: +stack depth: 1 + frame (index 0): frame: ‘test’@@1 +clusters within frame: ‘test’@@1 + cluster for: ptr_3: &HEAP_ALLOCATED_REGION(12) +m_called_unknown_fn: FALSE +constraint_manager: + equiv classes: + constraints: +malloc: + 0x2e89590: &HEAP_ALLOCATED_REGION(12): unchecked ('ptr_3') +@end smallexample + +This is the state at the point of returning from @code{calls_malloc} back +to @code{test} in the following: + +@smallexample +void * +calls_malloc (void) +@{ + void *result = malloc (1024); + return result; +@} + +void test (void) +@{ + void *ptr = calls_malloc (); + /* etc.
*/ +@} +@end smallexample + +Within the store, there is the cluster for @code{ptr_3} within the frame +for @code{test}, where the whole cluster is bound to a pointer value, +pointing at @code{HEAP_ALLOCATED_REGION(12)}. Additionally, this pointer +has the @code{unchecked} state for the @code{malloc} state machine +indicating it hasn't yet been checked against NULL since the allocation +call. + +@subsection Analyzer Paths + +We need to explain to the user what the problem is, and to persuade them +that there really is a problem. Hence having a @code{diagnostic_path} +isn't just an incidental detail of the analyzer; it's required. + +Paths ought to be: +@itemize @bullet +@item +interprocedurally-valid +@item +feasible +@end itemize + +Without state-merging, all paths in the exploded graph are feasible +(in terms of constraints being satisfied). +With state-merging, paths in the exploded graph can be infeasible. + +We collate warnings and only emit them for the simplest path +e.g. for a bug in a utility function, with lots of routes to calling it, +we only emit the simplest path (which could be intraprocedural, if +it can be reproduced without a caller). + +We thus want to find the shortest feasible path through the exploded +graph from the origin to the exploded node at which the diagnostic was +saved. Unfortunately, if we simply find the shortest such path and +check if it's feasible we might falsely reject the diagnostic, as there +might be a longer path that is feasible. Examples include the cases +where the diagnostic requires us to go at least once around a loop for a +later condition to be satisfied, or where for a later condition to be +satisfied we need to enter a suite of code that the simpler path skips. 
+ +We attempt to find the shortest feasible path to each diagnostic by +first constructing a ``trimmed graph'' from the exploded graph, +containing only those nodes and edges from which there are paths to +the target node, and using Dijkstra's algorithm to order the trimmed +nodes by minimal distance to the target. + +We then use a worklist to iteratively build a ``feasible graph'' +(actually a tree), capturing the pertinent state along each path, in +which every path to a ``feasible node'' is feasible by construction, +restricting ourselves to the trimmed graph to ensure we stay on target, +and ordering the worklist so that the first feasible path we find to the +target node is the shortest possible path. Hence we start by trying the +shortest possible path, but if that fails, we explore progressively +longer paths, eventually trying iterations through loops. The +exploration is captured in the feasible_graph, which can be dumped as a +.dot file via @option{-fdump-analyzer-feasibility} to visualize the +exploration. The indices of the feasible nodes show the order in which +they were created. We effectively explore the tree of feasible paths in +order of shortest path until we either find a feasible path to the +target node, or hit a limit and give up. + +This is something of a brute-force approach, but the trimmed graph +hopefully keeps the complexity manageable. + +This algorithm can be disabled (for debugging purposes) via +@option{-fno-analyzer-feasibility}, which simply uses the shortest path, +and notes if it is infeasible. + +The above gives us a shortest feasible @code{exploded_path} through the +@code{exploded_graph} (a list of @code{exploded_edge *}). We use this +@code{exploded_path} to build a @code{diagnostic_path} (a list of +@strong{events} for the diagnostic subsystem) - specifically a +@code{checker_path}. 
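The idea that the shortest path may be infeasible, so that progressively longer paths have to be tried, can be sketched on a toy graph. This is only an illustration under stated assumptions: all @code{toy_} names are hypothetical, edges have unit length, and a plain breadth-first search over <node, count> pairs is swapped in for the trimmed graph and Dijkstra ordering described above. The count models "how many times we have been around the loop", and an edge is feasible only once the count reaches its threshold.

```c
#include <stdbool.h>

/* Toy model only: all names are hypothetical.  COUNT abstracts "how many
   times have we been around the loop", capped at TOY_MAX_COUNT.  */
#define TOY_NUM_NODES 3
#define TOY_MAX_COUNT 4

struct toy_edge
{
  int src;
  int dst;
  int min_count;    /* Edge is feasible only if count >= min_count.  */
  bool bumps_count; /* Taking the edge increments the count.  */
};

struct toy_item
{
  int node;
  int count;
  int dist;
};

/* Breadth-first search over <node, count> pairs.  Return the number of
   edges on the shortest path from SRC to DST, or -1 if none is found.
   When CHECK_FEASIBILITY is false, edge conditions are ignored.  */
static int
toy_shortest_path_len (const struct toy_edge *edges, int num_edges,
                       int src, int dst, bool check_feasibility)
{
  bool seen[TOY_NUM_NODES][TOY_MAX_COUNT + 1] = { { false } };
  /* Each <node, count> pair is enqueued at most once.  */
  struct toy_item queue[TOY_NUM_NODES * (TOY_MAX_COUNT + 1)];
  int head = 0, tail = 0;

  seen[src][0] = true;
  queue[tail++] = (struct toy_item) { src, 0, 0 };

  while (head < tail)
    {
      struct toy_item cur = queue[head++];
      if (cur.node == dst)
        return cur.dist;
      for (int i = 0; i < num_edges; i++)
        {
          const struct toy_edge *e = &edges[i];
          if (e->src != cur.node)
            continue;
          if (check_feasibility && cur.count < e->min_count)
            continue;
          int count = cur.count;
          if (e->bumps_count && count < TOY_MAX_COUNT)
            count++;
          if (seen[e->dst][count])
            continue;
          seen[e->dst][count] = true;
          queue[tail++] = (struct toy_item) { e->dst, count, cur.dist + 1 };
        }
    }
  return -1;
}
```

With edges 0->1, a self-loop on 1 that bumps the count, and 1->2 requiring a count of at least 1, the unconstrained search finds a two-edge path, whereas the feasibility-aware search has to go once around the loop and finds a three-edge path.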
+ +Having built the @code{checker_path}, we prune it to try to eliminate +events that aren't relevant, to minimize how much the user has to read. + +After pruning, we notify each event in the path of its ID and record the +IDs of interesting events, allowing for events to refer to other events +in their descriptions. The @code{pending_diagnostic} class has various +vfuncs to support emitting more precise descriptions, so that e.g. + +@itemize @bullet +@item +a deref-of-unchecked-malloc diagnostic might use: +@smallexample + returning possibly-NULL pointer to 'make_obj' from 'allocator' +@end smallexample +for a @code{return_event} to make it clearer how the unchecked value moves +from callee back to caller +@item +a double-free diagnostic might use: +@smallexample + second 'free' here; first 'free' was at (3) +@end smallexample +and a use-after-free might use +@smallexample + use after 'free' here; memory was freed at (2) +@end smallexample +@end itemize + +At this point we can emit the diagnostic. + +@subsection Limitations + +@itemize @bullet +@item +Only for C so far +@item +The implementation of call summaries is currently very simplistic. +@item +Lack of function pointer analysis +@item +The constraint-handling code assumes reflexivity in some places +(that values are equal to themselves), which is not the case for NaN. +As a simple workaround, constraints on floating-point values are +currently ignored. +@item +There are various other limitations in the region model (grep for TODO/xfail +in the testsuite). +@item +The constraint_manager's implementation of transitivity is currently too +expensive to enable by default and so must be manually enabled via +@option{-fanalyzer-transitivity}. +@item +The checkers are currently hardcoded and don't allow for user extensibility +(e.g. adding allocate/release pairs). +@item +Although the analyzer's test suite has a proof-of-concept test case for +LTO, LTO support hasn't had extensive testing.
There are various +lang-specific things in the analyzer that assume C rather than LTO. +For example, SSA names are printed to the user in ``raw'' form, rather +than printing the underlying variable name. +@end itemize + +Some ideas for other checkers +@itemize @bullet +@item +File-descriptor-based APIs +@item +Linux kernel internal APIs +@item +Signal handling +@end itemize + +@node Debugging the Analyzer +@section Debugging the Analyzer +@cindex analyzer, debugging +@cindex static analyzer, debugging + +@subsection Special Functions for Debugging the Analyzer + +The analyzer recognizes various special functions by name, for use +in debugging the analyzer. Declarations can be seen in the testsuite +in @file{analyzer-decls.h}. None of these functions are actually +implemented. + +Add: +@smallexample + __analyzer_break (); +@end smallexample +to the source being analyzed to trigger a breakpoint in the analyzer when +that source is reached. By putting a series of these in the source, it's +much easier to effectively step through the program state as it's analyzed. + +The analyzer handles: + +@smallexample +__analyzer_describe (0, expr); +@end smallexample + +by emitting a warning describing the 2nd argument (which can be of any +type), at a verbosity level given by the 1st argument. This is for use when +debugging, and may be of use in DejaGnu tests. + +@smallexample +__analyzer_dump (); +@end smallexample + +will dump the copious information about the analyzer's state each time it +reaches the call in its traversal of the source. + +@smallexample +extern void __analyzer_dump_capacity (const void *ptr); +@end smallexample + +will emit a warning describing the capacity of the base region of +the region pointed to by the 1st argument. 
+ +@smallexample +extern void __analyzer_dump_escaped (void); +@end smallexample + +will emit a warning giving the number of decls that have escaped on this +analysis path, followed by a comma-separated list of their names, +in alphabetical order. + +@smallexample +__analyzer_dump_path (); +@end smallexample + +will emit a placeholder ``note'' diagnostic with a path to that call site, +if the analyzer finds a feasible path to it. + +The builtin @code{__analyzer_dump_exploded_nodes} will emit a warning +after analysis containing information on all of the exploded nodes at that +program point: + +@smallexample + __analyzer_dump_exploded_nodes (0); +@end smallexample + +will output the number of ``processed'' nodes, and the IDs of +both ``processed'' and ``merger'' nodes, such as: + +@smallexample +warning: 2 processed enodes: [EN: 56, EN: 58] merger(s): [EN: 54-55, EN: 57, EN: 59] +@end smallexample + +With a non-zero argument + +@smallexample + __analyzer_dump_exploded_nodes (1); +@end smallexample + +it will also dump all of the states within the ``processed'' nodes. + +@smallexample + __analyzer_dump_region_model (); +@end smallexample +will dump the region_model's state to stderr. + +@smallexample +__analyzer_dump_state ("malloc", ptr); +@end smallexample + +will emit a warning describing the state of the 2nd argument +(which can be of any type) with respect to the state machine with +a name matching the 1st argument (which must be a string literal). +This is for use when debugging, and may be of use in DejaGnu tests. + +@smallexample +__analyzer_eval (expr); +@end smallexample +will emit a warning with text "TRUE", "FALSE" or "UNKNOWN" based on the +truthfulness of the argument. This is useful for writing DejaGnu tests. + +@smallexample +__analyzer_get_unknown_ptr (); +@end smallexample +will obtain an unknown @code{void *}.
+ +@subsection Other Debugging Techniques + +The option @option{-fdump-analyzer-json} will dump both the supergraph +and the exploded graph in compressed JSON form. + +One approach when tracking down where a particular bogus state is +introduced into the @code{exploded_graph} is to add custom code to +@code{program_state::validate}. + +The debug function @code{region::is_named_decl_p} can be used when debugging, +such as for assertions and conditional breakpoints. For example, when +tracking down a bug in handling a decl called @code{yy_buffer_stack}, I +temporarily added a: +@smallexample + gcc_assert (!m_base_region->is_named_decl_p ("yy_buffer_stack")); +@end smallexample +to @code{binding_cluster::mark_as_escaped} to trap a point where +@code{yy_buffer_stack} was mistakenly being treated as having escaped. diff --git a/gcc/doc/avr-mmcu.texi b/gcc/doc/avr-mmcu.texi new file mode 100644 index 00000000000..c3e9817928a --- /dev/null +++ b/gcc/doc/avr-mmcu.texi @@ -0,0 +1,83 @@ +@c Copyright (C) 2012-2022 Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc/doc/include/fdl.texi. + +@c This file is generated automatically using +@c gcc/config/avr/gen-avr-mmcu-texi.cc from: +@c gcc/config/avr/avr-arch.h +@c gcc/config/avr/avr-devices.cc +@c gcc/config/avr/avr-mcus.def + +@c Please do not edit manually. + +@table @code + +@item avr2 +``Classic'' devices with up to 8@tie{}KiB of program memory. +@*@var{mcu}@tie{}= @code{attiny22}, @code{attiny26}, @code{at90s2313}, @code{at90s2323}, @code{at90s2333}, @code{at90s2343}, @code{at90s4414}, @code{at90s4433}, @code{at90s4434}, @code{at90c8534}, @code{at90s8515}, @code{at90s8535}. + +@item avr25 +``Classic'' devices with up to 8@tie{}KiB of program memory and with the @code{MOVW} instruction. 
+@*@var{mcu}@tie{}= @code{attiny13}, @code{attiny13a}, @code{attiny24}, @code{attiny24a}, @code{attiny25}, @code{attiny261}, @code{attiny261a}, @code{attiny2313}, @code{attiny2313a}, @code{attiny43u}, @code{attiny44}, @code{attiny44a}, @code{attiny45}, @code{attiny48}, @code{attiny441}, @code{attiny461}, @code{attiny461a}, @code{attiny4313}, @code{attiny84}, @code{attiny84a}, @code{attiny85}, @code{attiny87}, @code{attiny88}, @code{attiny828}, @code{attiny841}, @code{attiny861}, @code{attiny861a}, @code{ata5272}, @code{ata6616c}, @code{at86rf401}. + +@item avr3 +``Classic'' devices with 16@tie{}KiB up to 64@tie{}KiB of program memory. +@*@var{mcu}@tie{}= @code{at76c711}, @code{at43usb355}. + +@item avr31 +``Classic'' devices with 128@tie{}KiB of program memory. +@*@var{mcu}@tie{}= @code{atmega103}, @code{at43usb320}. + +@item avr35 +``Classic'' devices with 16@tie{}KiB up to 64@tie{}KiB of program memory and with the @code{MOVW} instruction. +@*@var{mcu}@tie{}= @code{attiny167}, @code{attiny1634}, @code{atmega8u2}, @code{atmega16u2}, @code{atmega32u2}, @code{ata5505}, @code{ata6617c}, @code{ata664251}, @code{at90usb82}, @code{at90usb162}. + +@item avr4 +``Enhanced'' devices with up to 8@tie{}KiB of program memory. +@*@var{mcu}@tie{}= @code{atmega48}, @code{atmega48a}, @code{atmega48p}, @code{atmega48pa}, @code{atmega48pb}, @code{atmega8}, @code{atmega8a}, @code{atmega8hva}, @code{atmega88}, @code{atmega88a}, @code{atmega88p}, @code{atmega88pa}, @code{atmega88pb}, @code{atmega8515}, @code{atmega8535}, @code{ata6285}, @code{ata6286}, @code{ata6289}, @code{ata6612c}, @code{at90pwm1}, @code{at90pwm2}, @code{at90pwm2b}, @code{at90pwm3}, @code{at90pwm3b}, @code{at90pwm81}. + +@item avr5 +``Enhanced'' devices with 16@tie{}KiB up to 64@tie{}KiB of program memory. 
+@*@var{mcu}@tie{}= @code{atmega16}, @code{atmega16a}, @code{atmega16hva}, @code{atmega16hva2}, @code{atmega16hvb}, @code{atmega16hvbrevb}, @code{atmega16m1}, @code{atmega16u4}, @code{atmega161}, @code{atmega162}, @code{atmega163}, @code{atmega164a}, @code{atmega164p}, @code{atmega164pa}, @code{atmega165}, @code{atmega165a}, @code{atmega165p}, @code{atmega165pa}, @code{atmega168}, @code{atmega168a}, @code{atmega168p}, @code{atmega168pa}, @code{atmega168pb}, @code{atmega169}, @code{atmega169a}, @code{atmega169p}, @code{atmega169pa}, @code{atmega32}, @code{atmega32a}, @code{atmega32c1}, @code{atmega32hvb}, @code{atmega32hvbrevb}, @code{atmega32m1}, @code{atmega32u4}, @code{atmega32u6}, @code{atmega323}, @code{atmega324a}, @code{atmega324p}, @code{atmega324pa}, @code{atmega324pb}, @code{atmega325}, @code{atmega325a}, @code{atmega325p}, @code{atmega325pa}, @code{atmega328}, @code{atmega328p}, @code{atmega328pb}, @code{atmega329}, @code{atmega329a}, @code{atmega329p}, @code{atmega329pa}, @code{atmega3250}, @code{atmega3250a}, @code{atmega3250p}, @code{atmega3250pa}, @code{atmega3290}, @code{atmega3290a}, @code{atmega3290p}, @code{atmega3290pa}, @code{atmega406}, @code{atmega64}, @code{atmega64a}, @code{atmega64c1}, @code{atmega64hve}, @code{atmega64hve2}, @code{atmega64m1}, @code{atmega64rfr2}, @code{atmega640}, @code{atmega644}, @code{atmega644a}, @code{atmega644p}, @code{atmega644pa}, @code{atmega644rfr2}, @code{atmega645}, @code{atmega645a}, @code{atmega645p}, @code{atmega649}, @code{atmega649a}, @code{atmega649p}, @code{atmega6450}, @code{atmega6450a}, @code{atmega6450p}, @code{atmega6490}, @code{atmega6490a}, @code{atmega6490p}, @code{ata5795}, @code{ata5790}, @code{ata5790n}, @code{ata5791}, @code{ata6613c}, @code{ata6614q}, @code{ata5782}, @code{ata5831}, @code{ata8210}, @code{ata8510}, @code{ata5702m322}, @code{at90pwm161}, @code{at90pwm216}, @code{at90pwm316}, @code{at90can32}, @code{at90can64}, @code{at90scr100}, @code{at90usb646}, @code{at90usb647}, 
@code{at94k}, @code{m3000}. + +@item avr51 +``Enhanced'' devices with 128@tie{}KiB of program memory. +@*@var{mcu}@tie{}= @code{atmega128}, @code{atmega128a}, @code{atmega128rfa1}, @code{atmega128rfr2}, @code{atmega1280}, @code{atmega1281}, @code{atmega1284}, @code{atmega1284p}, @code{atmega1284rfr2}, @code{at90can128}, @code{at90usb1286}, @code{at90usb1287}. + +@item avr6 +``Enhanced'' devices with 3-byte PC, i.e.@: with more than 128@tie{}KiB of program memory. +@*@var{mcu}@tie{}= @code{atmega256rfr2}, @code{atmega2560}, @code{atmega2561}, @code{atmega2564rfr2}. + +@item avrxmega2 +``XMEGA'' devices with more than 8@tie{}KiB and up to 64@tie{}KiB of program memory. +@*@var{mcu}@tie{}= @code{atxmega8e5}, @code{atxmega16a4}, @code{atxmega16a4u}, @code{atxmega16c4}, @code{atxmega16d4}, @code{atxmega16e5}, @code{atxmega32a4}, @code{atxmega32a4u}, @code{atxmega32c3}, @code{atxmega32c4}, @code{atxmega32d3}, @code{atxmega32d4}, @code{atxmega32e5}, @code{avr64da28}, @code{avr64da32}, @code{avr64da48}, @code{avr64da64}, @code{avr64db28}, @code{avr64db32}, @code{avr64db48}, @code{avr64db64}. + +@item avrxmega3 +``XMEGA'' devices with up to 64@tie{}KiB of combined program memory and RAM, and with program memory visible in the RAM address space. 
+@*@var{mcu}@tie{}= @code{attiny202}, @code{attiny204}, @code{attiny212}, @code{attiny214}, @code{attiny402}, @code{attiny404}, @code{attiny406}, @code{attiny412}, @code{attiny414}, @code{attiny416}, @code{attiny417}, @code{attiny804}, @code{attiny806}, @code{attiny807}, @code{attiny814}, @code{attiny816}, @code{attiny817}, @code{attiny1604}, @code{attiny1606}, @code{attiny1607}, @code{attiny1614}, @code{attiny1616}, @code{attiny1617}, @code{attiny3214}, @code{attiny3216}, @code{attiny3217}, @code{atmega808}, @code{atmega809}, @code{atmega1608}, @code{atmega1609}, @code{atmega3208}, @code{atmega3209}, @code{atmega4808}, @code{atmega4809}, @code{avr32da28}, @code{avr32da32}, @code{avr32da48}, @code{avr32db28}, @code{avr32db32}, @code{avr32db48}. + +@item avrxmega4 +``XMEGA'' devices with more than 64@tie{}KiB and up to 128@tie{}KiB of program memory. +@*@var{mcu}@tie{}= @code{atxmega64a3}, @code{atxmega64a3u}, @code{atxmega64a4u}, @code{atxmega64b1}, @code{atxmega64b3}, @code{atxmega64c3}, @code{atxmega64d3}, @code{atxmega64d4}, @code{avr128da28}, @code{avr128da32}, @code{avr128da48}, @code{avr128da64}, @code{avr128db28}, @code{avr128db32}, @code{avr128db48}, @code{avr128db64}. + +@item avrxmega5 +``XMEGA'' devices with more than 64@tie{}KiB and up to 128@tie{}KiB of program memory and more than 64@tie{}KiB of RAM. +@*@var{mcu}@tie{}= @code{atxmega64a1}, @code{atxmega64a1u}. + +@item avrxmega6 +``XMEGA'' devices with more than 128@tie{}KiB of program memory. +@*@var{mcu}@tie{}= @code{atxmega128a3}, @code{atxmega128a3u}, @code{atxmega128b1}, @code{atxmega128b3}, @code{atxmega128c3}, @code{atxmega128d3}, @code{atxmega128d4}, @code{atxmega192a3}, @code{atxmega192a3u}, @code{atxmega192c3}, @code{atxmega192d3}, @code{atxmega256a3}, @code{atxmega256a3b}, @code{atxmega256a3bu}, @code{atxmega256a3u}, @code{atxmega256c3}, @code{atxmega256d3}, @code{atxmega384c3}, @code{atxmega384d3}. 
+ +@item avrxmega7 +``XMEGA'' devices with more than 128@tie{}KiB of program memory and more than 64@tie{}KiB of RAM. +@*@var{mcu}@tie{}= @code{atxmega128a1}, @code{atxmega128a1u}, @code{atxmega128a4u}. + +@item avrtiny +``TINY'' Tiny core devices with 512@tie{}B up to 4@tie{}KiB of program memory. +@*@var{mcu}@tie{}= @code{attiny4}, @code{attiny5}, @code{attiny9}, @code{attiny10}, @code{attiny20}, @code{attiny40}. + +@item avr1 +This ISA is implemented by the minimal AVR core and supported for assembler only. +@*@var{mcu}@tie{}= @code{attiny11}, @code{attiny12}, @code{attiny15}, @code{attiny28}, @code{at90s1200}. + +@end table diff --git a/gcc/doc/bugreport.texi b/gcc/doc/bugreport.texi new file mode 100644 index 00000000000..84246faecee --- /dev/null +++ b/gcc/doc/bugreport.texi @@ -0,0 +1,88 @@ +@c Copyright (C) 1988-2022 Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. + +@node Bugs +@chapter Reporting Bugs +@cindex bugs +@cindex reporting bugs + +Your bug reports play an essential role in making GCC reliable. + +When you encounter a problem, the first thing to do is to see if it is +already known. @xref{Trouble}. If it isn't known, then you should +report the problem. + +@menu +* Criteria: Bug Criteria. Have you really found a bug? +* Reporting: Bug Reporting. How to report a bug effectively. +@end menu + +@node Bug Criteria +@section Have You Found a Bug? +@cindex bug criteria + +If you are not sure whether you have found a bug, here are some guidelines: + +@itemize @bullet +@cindex fatal signal +@cindex core dump +@item +If the compiler gets a fatal signal, for any input whatever, that is a +compiler bug. Reliable compilers never crash. 
+ +@cindex invalid assembly code +@cindex assembly code, invalid +@item +If the compiler produces invalid assembly code, for any input whatever +(except an @code{asm} statement), that is a compiler bug, unless the +compiler reports errors (not just warnings) which would ordinarily +prevent the assembler from being run. + +@cindex undefined behavior +@cindex undefined function value +@cindex increment operators +@item +If the compiler produces valid assembly code that does not correctly +execute the input source code, that is a compiler bug. + +However, you must double-check to make sure, because you may have a +program whose behavior is undefined, which happened by chance to give +the desired results with another C or C++ compiler. + +For example, in many nonoptimizing compilers, you can write @samp{x;} +at the end of a function instead of @samp{return x;}, with the same +results. But the value of the function is undefined if @code{return} +is omitted; it is not a bug when GCC produces different results. + +Problems often result from expressions with two increment operators, +as in @code{f (*p++, *p++)}. Your previous compiler might have +interpreted that expression the way you intended; GCC might +interpret it another way. Neither compiler is wrong. The bug is +in your code. + +After you have localized the error to a single source line, it should +be easy to check for these things. If your program is correct and +well defined, you have found a compiler bug. + +@item +If the compiler produces an error message for valid input, that is a +compiler bug. + +@cindex invalid input +@item +If the compiler does not produce an error message for invalid input, +that is a compiler bug. However, you should note that your idea of +``invalid input'' might be someone else's idea of ``an extension'' or +``support for traditional practice''. 
+ +@item +If you are an experienced user of one of the languages GCC supports, your +suggestions for improvement of GCC are welcome in any case. +@end itemize + +@node Bug Reporting +@section How and Where to Report Bugs +@cindex compiler bugs, reporting + +Bugs should be reported to the bug database at @value{BUGURL}. diff --git a/gcc/doc/cfg.texi b/gcc/doc/cfg.texi new file mode 100644 index 00000000000..32aacdd0aa8 --- /dev/null +++ b/gcc/doc/cfg.texi @@ -0,0 +1,684 @@ +@c -*-texinfo-*- +@c Copyright (C) 2001-2022 Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. + +@c --------------------------------------------------------------------- +@c Control Flow Graph +@c --------------------------------------------------------------------- + +@node Control Flow +@chapter Control Flow Graph +@cindex CFG, Control Flow Graph +@findex basic-block.h + +A control flow graph (CFG) is a data structure built on top of the +intermediate code representation (the RTL or @code{GIMPLE} instruction +stream) abstracting the control flow behavior of a function that is +being compiled. The CFG is a directed graph where the vertices +represent basic blocks and edges represent possible transfer of +control flow from one basic block to another. The data structures +used to represent the control flow graph are defined in +@file{basic-block.h}. + +In GCC, the representation of control flow is maintained throughout +the compilation process, from constructing the CFG early in +@code{pass_build_cfg} to @code{pass_free_cfg} (see @file{passes.def}). +The CFG takes various different modes and may undergo extensive +manipulations, but the graph is always valid between its construction +and its release. This way, transfer of information such as data flow, +a measured profile, or the loop tree, can be propagated through the +passes pipeline, and even from @code{GIMPLE} to @code{RTL}. 
+ +Often the CFG is better viewed as an integral part of the instruction +chain than as a structure built on top of it. Updating the compiler's +intermediate representation for instructions cannot be easily done +without proper maintenance of the CFG simultaneously. + +@menu +* Basic Blocks:: The definition and representation of basic blocks. +* Edges:: Types of edges and their representation. +* Profile information:: Representation of frequencies and probabilities. +* Maintaining the CFG:: Keeping the control flow graph up to date. +* Liveness information:: Using and maintaining liveness information. +@end menu + + +@node Basic Blocks +@section Basic Blocks + +@cindex basic block +@findex basic_block +A basic block is a straight-line sequence of code with only one entry +point and only one exit. In GCC, basic blocks are represented using +the @code{basic_block} data type. + +@findex ENTRY_BLOCK_PTR, EXIT_BLOCK_PTR +Special basic blocks represent possible entry and exit points of a +function. These blocks are called @code{ENTRY_BLOCK_PTR} and +@code{EXIT_BLOCK_PTR}. These blocks do not contain any code. + +@findex BASIC_BLOCK +The @code{BASIC_BLOCK} array contains all basic blocks in an +unspecified order. Each @code{basic_block} structure has a field +that holds a unique integer identifier @code{index} that is the +index of the block in the @code{BASIC_BLOCK} array. +The total number of basic blocks in the function is +@code{n_basic_blocks}. Both the basic block indices and +the total number of basic blocks may vary during the compilation +process, as passes reorder, create, duplicate, and destroy basic +blocks. The index for any block should never be greater than +@code{last_basic_block}. The indices 0 and 1 are special codes +reserved for @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}, the +indices of @code{ENTRY_BLOCK_PTR} and @code{EXIT_BLOCK_PTR}.
+ +@findex next_bb, prev_bb, FOR_EACH_BB, FOR_ALL_BB +Two pointer members of the @code{basic_block} structure are the +pointers @code{next_bb} and @code{prev_bb}. These are used to keep +a doubly linked chain of basic blocks in the same order as the +underlying instruction stream. The chain of basic blocks is updated +transparently by the provided API for manipulating the CFG@. The macro +@code{FOR_EACH_BB} can be used to visit all the basic blocks in +lexicographical order, except @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}. +The macro @code{FOR_ALL_BB} also visits all basic blocks in +lexicographical order, including @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}. + +@findex post_order_compute, inverted_post_order_compute, walk_dominator_tree +The functions @code{post_order_compute} and @code{inverted_post_order_compute} +can be used to compute topological orders of the CFG@. The orders are +stored as vectors of basic block indices. The @code{BASIC_BLOCK} array +can be used to iterate over each basic block by index. +Dominator traversals are also possible using +@code{walk_dominator_tree}. Given two basic blocks A and B, block A +dominates block B if A is @emph{always} executed before B@. + +Each @code{basic_block} also contains pointers to the first +instruction (the @dfn{head}) and the last instruction (the @dfn{tail} +or @dfn{end}) of the instruction stream contained in a basic block. In +fact, since the @code{basic_block} data type is used to represent +blocks in both major intermediate representations of GCC (@code{GIMPLE} +and RTL), there are pointers to the head and end of a basic block for +both representations, stored in intermediate representation specific +data in the @code{il} field of @code{struct basic_block_def}. + +@findex CODE_LABEL +@findex NOTE_INSN_BASIC_BLOCK +For RTL, these pointers are @code{BB_HEAD} and @code{BB_END}.
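The block array and the @code{next_bb}/@code{prev_bb} chain described above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: @code{toy_bb}, @code{toy_cfg}, @code{toy_cfg_init}, @code{toy_add_block}, @code{toy_count_real_blocks} and the @code{TOY_*} constants are hypothetical names that do not exist in GCC, and the fixed-size array is a simplification. Only the reserved indices 0 and 1 for the entry and exit blocks follow the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the reserved entry/exit indices.  */
enum { TOY_ENTRY_BLOCK = 0, TOY_EXIT_BLOCK = 1, TOY_MAX_BLOCKS = 16 };

struct toy_bb
{
  int index;              /* Position of this block in the block array.  */
  struct toy_bb *prev_bb; /* Previous block in instruction-stream order.  */
  struct toy_bb *next_bb; /* Next block in instruction-stream order.  */
};

struct toy_cfg
{
  struct toy_bb blocks[TOY_MAX_BLOCKS];    /* Backing storage.  */
  struct toy_bb *by_index[TOY_MAX_BLOCKS]; /* Models the block array.  */
  int n_basic_blocks;   /* Blocks in use, entry and exit included.  */
  int last_basic_block; /* One past the largest index handed out.  */
};

/* Set up an empty function: the entry block chained directly to exit.  */
void
toy_cfg_init (struct toy_cfg *cfg)
{
  struct toy_bb *entry = &cfg->blocks[TOY_ENTRY_BLOCK];
  struct toy_bb *exit_bb = &cfg->blocks[TOY_EXIT_BLOCK];

  for (int i = 0; i < TOY_MAX_BLOCKS; i++)
    cfg->by_index[i] = NULL;

  entry->index = TOY_ENTRY_BLOCK;
  entry->prev_bb = NULL;
  entry->next_bb = exit_bb;
  exit_bb->index = TOY_EXIT_BLOCK;
  exit_bb->prev_bb = entry;
  exit_bb->next_bb = NULL;

  cfg->by_index[TOY_ENTRY_BLOCK] = entry;
  cfg->by_index[TOY_EXIT_BLOCK] = exit_bb;
  cfg->n_basic_blocks = 2;
  cfg->last_basic_block = 2;
}

/* Create a block and link it into the chain right after AFTER.
   Returns NULL when the toy array is full.  */
struct toy_bb *
toy_add_block (struct toy_cfg *cfg, struct toy_bb *after)
{
  if (cfg->last_basic_block >= TOY_MAX_BLOCKS)
    return NULL;

  int index = cfg->last_basic_block++;
  struct toy_bb *bb = &cfg->blocks[index];

  bb->index = index;
  bb->prev_bb = after;
  bb->next_bb = after->next_bb;
  if (after->next_bb != NULL)
    after->next_bb->prev_bb = bb;
  after->next_bb = bb;

  cfg->by_index[index] = bb;
  cfg->n_basic_blocks++;
  return bb;
}

/* Walk the chain in stream order, skipping the entry and exit blocks,
   and count the blocks visited.  */
int
toy_count_real_blocks (const struct toy_cfg *cfg)
{
  int count = 0;
  const struct toy_bb *bb = cfg->by_index[TOY_ENTRY_BLOCK]->next_bb;

  for (; bb != cfg->by_index[TOY_EXIT_BLOCK]; bb = bb->next_bb)
    count++;
  return count;
}
```

The model keeps two views of the same blocks, as the text does: lookup by index through the array, and traversal in instruction-stream order through the chain. New blocks take the next free index, so their array position says nothing about their place in the chain.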
+ +@cindex insn notes, notes +@findex NOTE_INSN_BASIC_BLOCK +In the RTL representation of a function, the instruction stream +contains not only the ``real'' instructions, but also @dfn{notes} +or @dfn{insn notes} (to distinguish them from @dfn{reg notes}). +Any function that moves or duplicates the basic blocks needs +to take care of updating of these notes. Many of these notes expect +that the instruction stream consists of linear regions, so updating +can sometimes be tedious. All types of insn notes are defined +in @file{insn-notes.def}. + +In the RTL function representation, the instructions contained in a +basic block always follow a @code{NOTE_INSN_BASIC_BLOCK}, but zero +or more @code{CODE_LABEL} nodes can precede the block note. +A basic block ends with a control flow instruction or with the last +instruction before the next @code{CODE_LABEL} or +@code{NOTE_INSN_BASIC_BLOCK}. +By definition, a @code{CODE_LABEL} cannot appear in the middle of +the instruction stream of a basic block. + +@findex can_fallthru +@cindex table jump +In addition to notes, the jump table vectors are also represented as +``pseudo-instructions'' inside the insn stream. These vectors never +appear in the basic block and should always be placed just after the +table jump instructions referencing them. After removing the +table-jump it is often difficult to eliminate the code computing the +address and referencing the vector, so cleaning up these vectors is +postponed until after liveness analysis. Thus the jump table vectors +may appear in the insn stream unreferenced and without any purpose. +Before any edge is made @dfn{fall-thru}, the existence of such +construct in the way needs to be checked by calling +@code{can_fallthru} function. + +@cindex GIMPLE statement iterators +For the @code{GIMPLE} representation, the PHI nodes and statements +contained in a basic block are in a @code{gimple_seq} pointed to by +the basic block intermediate language specific pointers. 
+Abstract containers and iterators are used to access the PHI nodes +and statements in a basic block. These iterators are called +@dfn{GIMPLE statement iterators} (GSIs). Grep for @code{^gsi} +in the various @file{gimple-*} and @file{tree-*} files. +There is a @code{gimple_stmt_iterator} type for iterating over +all kinds of statement, and a @code{gphi_iterator} subclass for +iterating over PHI nodes. +The following snippet will pretty-print all PHI nodes and statements +of the current function in the GIMPLE representation. + +@smallexample +basic_block bb; + +FOR_EACH_BB (bb) + @{ + gphi_iterator pi; + gimple_stmt_iterator si; + + /* Print the PHI nodes of BB. */ + for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi)) + @{ + gphi *phi = pi.phi (); + print_gimple_stmt (dump_file, phi, 0, TDF_SLIM); + @} + /* Print the ordinary statements of BB. */ + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) + @{ + gimple *stmt = gsi_stmt (si); + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + @} + @} +@end smallexample + + +@node Edges +@section Edges + +@cindex edge in the flow graph +@findex edge +Edges represent possible control flow transfers from the end of some +basic block A to the head of another basic block B@. We say that A is +a predecessor of B, and B is a successor of A@. Edges are represented +in GCC with the @code{edge} data type. Each @code{edge} acts as a +link between two basic blocks: The @code{src} member of an edge +points to the predecessor basic block of the @code{dest} basic block. +The members @code{preds} and @code{succs} of the @code{basic_block} data +type point to type-safe vectors of edges to the predecessors and +successors of the block. + +@cindex edge iterators +When walking the edges in an edge vector, @dfn{edge iterators} should +be used.
Edge iterators are constructed using the +@code{edge_iterator} data structure and several methods are available +to operate on them: + +@ftable @code +@item ei_start +This function initializes an @code{edge_iterator} that points to the +first edge in a vector of edges. + +@item ei_last +This function initializes an @code{edge_iterator} that points to the +last edge in a vector of edges. + +@item ei_end_p +This predicate is @code{true} if an @code{edge_iterator} has moved +past the last edge in an edge vector, i.e.@: it is at the end of the +sequence. + +@item ei_one_before_end_p +This predicate is @code{true} if an @code{edge_iterator} is one +position before the end of the sequence, i.e.@: it points to the last +edge in an edge vector. + +@item ei_next +This function takes a pointer to an @code{edge_iterator} and makes it +point to the next edge in the sequence. + +@item ei_prev +This function takes a pointer to an @code{edge_iterator} and makes it +point to the previous edge in the sequence. + +@item ei_edge +This function returns the @code{edge} currently pointed to by an +@code{edge_iterator}. + +@item ei_safe_edge +This function returns the @code{edge} currently pointed to by an +@code{edge_iterator}, but returns @code{NULL} if the iterator is +pointing at the end of the sequence. This function has been provided +for existing code that makes the assumption that a @code{NULL} edge +indicates the end of the sequence. + +@end ftable + +The convenience macro @code{FOR_EACH_EDGE} can be used to visit all of +the edges in a sequence of predecessor or successor edges. It must +not be used when an element might be removed during the traversal, +otherwise elements will be missed. Here is an example of how to use +the macro: + +@smallexample +edge e; +edge_iterator ei; + +FOR_EACH_EDGE (e, ei, bb->succs) + @{ + if (e->flags & EDGE_FALLTHRU) + break; + @} +@end smallexample + +@findex fall-thru +There are various reasons why control flow may transfer from one block +to another.
One possibility is that some instruction, for example a +@code{CODE_LABEL}, in a linearized instruction stream just always +starts a new basic block. In this case a @dfn{fall-thru} edge links +the basic block to the first following basic block. But there are +several other reasons why edges may be created. The @code{flags} +field of the @code{edge} data type is used to store information +about the type of edge we are dealing with. Each edge is of one of +the following types: + +@table @emph +@item jump +No type flags are set for edges corresponding to jump instructions. +These edges are used for unconditional or conditional jumps and in +RTL also for table jumps. They are the easiest to manipulate as they +may be freely redirected when the flow graph is not in SSA form. + +@item fall-thru +@findex EDGE_FALLTHRU, force_nonfallthru +Fall-thru edges are present in case where the basic block may continue +execution to the following one without branching. These edges have +the @code{EDGE_FALLTHRU} flag set. Unlike other types of edges, these +edges must come into the basic block immediately following in the +instruction stream. The function @code{force_nonfallthru} is +available to insert an unconditional jump in the case that redirection +is needed. Note that this may require creation of a new basic block. + +@item exception handling +@cindex exception handling +@findex EDGE_ABNORMAL, EDGE_EH +Exception handling edges represent possible control transfers from a +trapping instruction to an exception handler. The definition of +``trapping'' varies. In C++, only function calls can throw, but for +Ada exceptions like division by zero or segmentation fault are +defined and thus each instruction possibly throwing this kind of +exception needs to be handled as control flow instruction. Exception +edges have the @code{EDGE_ABNORMAL} and @code{EDGE_EH} flags set. 
+ +@findex purge_dead_edges +When updating the instruction stream it is easy to change a possibly +trapping instruction to a non-trapping one, by simply removing the exception +edge. The opposite conversion is difficult, but should not happen +anyway. The edges can be eliminated via a @code{purge_dead_edges} call. + +@findex REG_EH_REGION, EDGE_ABNORMAL_CALL +In the RTL representation, the destination of an exception edge is +specified by a @code{REG_EH_REGION} note attached to the insn. +In case of a trapping call the @code{EDGE_ABNORMAL_CALL} flag is set +too. In the @code{GIMPLE} representation, this extra flag is not set. + +@findex may_trap_p, tree_could_trap_p +In the RTL representation, the predicate @code{may_trap_p} may be used +to check whether an instruction may still trap. For the tree +representation, the @code{tree_could_trap_p} predicate is available, +but this predicate only checks for possible memory traps, as in +dereferencing an invalid pointer location. + + +@item sibling calls +@cindex sibling call +@findex EDGE_ABNORMAL, EDGE_SIBCALL +Sibling calls or tail calls terminate the function in a non-standard +way and thus an edge to the exit must be present. +@code{EDGE_SIBCALL} and @code{EDGE_ABNORMAL} are set in such a case. +These edges only exist in the RTL representation. + +@item computed jumps +@cindex computed jump +@findex EDGE_ABNORMAL +Computed jumps contain edges to all labels in the function referenced +from the code. All those edges have the @code{EDGE_ABNORMAL} flag set. +The edges used to represent computed jumps often cause compile time +performance problems, since functions consisting of many taken labels +and many computed jumps may have @emph{very} dense flow graphs, so +these edges need to be handled with special care. During the earlier +stages of the compilation process, GCC tries to avoid such dense flow +graphs by factoring computed jumps.
For example, given the following +series of jumps, + +@smallexample + goto *x; + [ @dots{} ] + + goto *x; + [ @dots{} ] + + goto *x; + [ @dots{} ] +@end smallexample + +@noindent +factoring the computed jumps results in the following code sequence +which has a much simpler flow graph: + +@smallexample + goto y; + [ @dots{} ] + + goto y; + [ @dots{} ] + + goto y; + [ @dots{} ] + +y: + goto *x; +@end smallexample + +@findex pass_duplicate_computed_gotos +However, the classic problem with this transformation is that it has a +runtime cost in the resulting code: an extra jump. Therefore, the +computed jumps are un-factored in the later passes of the compiler +(in the pass called @code{pass_duplicate_computed_gotos}). +Be aware of that when you work on passes in that area. There have +been numerous examples already where the compile time for code with +unfactored computed jumps caused some serious headaches. + +@item nonlocal goto handlers +@cindex nonlocal goto handler +@findex EDGE_ABNORMAL, EDGE_ABNORMAL_CALL +GCC allows nested functions to return into the caller using a @code{goto} +to a label passed as an argument to the callee. The labels passed +to nested functions contain special code to clean up after the function +call. Such sections of code are referred to as ``nonlocal goto +receivers''. If a function contains such nonlocal goto receivers, an +edge from the call to the label is created with the +@code{EDGE_ABNORMAL} and @code{EDGE_ABNORMAL_CALL} flags set. + +@item function entry points +@cindex function entry point, alternate function entry point +@findex LABEL_ALTERNATE_NAME +By definition, execution of a function starts at basic block 0, so there +is always an edge from the @code{ENTRY_BLOCK_PTR} to basic block 0. +There is no @code{GIMPLE} representation for alternate entry points at +this moment. In RTL, alternate entry points are specified by +@code{CODE_LABEL} with @code{LABEL_ALTERNATE_NAME} defined.
This +feature is currently used for multiple entry point prologues and is +limited to post-reload passes only. This can be used by back-ends to +emit alternate prologues for functions called from different contexts. +In the future, full support for multiple entry functions defined by Fortran +90 needs to be implemented. + +@item function exits +In the pre-reload representation a function terminates after the last +instruction in the insn chain and no explicit return instructions are +used. This corresponds to the fall-thru edge into the exit block. After +reload, optimal RTL epilogues are used that use explicit (conditional) +return instructions that are represented by edges with no flags set. + +@end table + + +@node Profile information +@section Profile information + +@cindex profile representation +In many cases a compiler must make a choice whether to trade speed in +one part of code for speed in another, or to trade code size for code +speed. In such cases it is useful to know information about how often +some given block will be executed. That is the purpose of +maintaining a profile within the flow graph. +GCC can handle profile information obtained through @dfn{profile +feedback}, but it can also estimate branch probabilities based on +statistics and heuristics. + +@cindex profile feedback +The feedback based profile is produced by compiling the program with +instrumentation, executing it on a train run and reading the numbers +of executions of basic blocks and edges back to the compiler while +re-compiling the program to produce the final executable. This method +provides very accurate information about where a program spends most +of its time on the train run. Whether it matches the average run of +course depends on the choice of train data set, but several studies +have shown that the behavior of a program usually changes just +marginally over different data sets.
+ +@cindex Static profile estimation +@cindex branch prediction +@findex predict.def +When profile feedback is not available, the compiler may be asked to +attempt to predict the behavior of each branch in the program using a +set of heuristics (see @file{predict.def} for details) and compute +estimated frequencies of each basic block by propagating the +probabilities over the graph. + +@findex frequency, count, BB_FREQ_BASE +Each @code{basic_block} contains two integer fields to represent +profile information: @code{frequency} and @code{count}. The +@code{frequency} is an estimate of how often the basic block is executed +within a function. It is represented as an integer scaled in the +range from 0 to @code{BB_FREQ_BASE}. The most frequently executed +basic block in the function is initially set to @code{BB_FREQ_BASE} and +the rest of the frequencies are scaled accordingly. During optimization, +the frequency of the most frequent basic block can either decrease (for +instance by loop unrolling) or grow (for instance by cross-jumping +optimization), so scaling sometimes has to be performed multiple +times. + +@findex gcov_type +The @code{count} contains hard-counted numbers of executions measured +during training runs and is nonzero only when profile feedback is +available. This value is represented as the host's widest integer +(typically a 64-bit integer) of the special type @code{gcov_type}. + +Most optimization passes can use only the frequency information of a +basic block, but a few passes may want to know hard execution counts. +The frequencies should always match the counts after scaling; however, +while the profile information is being updated, numerical error may +accumulate into quite large errors. + +@findex REG_BR_PROB_BASE, EDGE_FREQUENCY +Each edge also contains a branch probability field: an integer in the +range from 0 to @code{REG_BR_PROB_BASE}.
It represents the probability of +passing control from the end of the @code{src} basic block to the +@code{dest} basic block, i.e.@: the probability that control will flow +along this edge. The @code{EDGE_FREQUENCY} macro is available to +compute how frequently a given edge is taken. There is a @code{count} +field for each edge as well, representing the same information as for a +basic block. + +The basic block frequencies are not represented in the instruction +stream, but in the RTL representation the edge frequencies are +represented for conditional jumps (via the @code{REG_BR_PROB} +note) since they are used when instructions are output to the +assembly file and the flow graph is no longer maintained. + +@cindex reverse probability +The probability that control flow arrives via a given edge to its +destination basic block is called @dfn{reverse probability} and is not +directly represented, but it may be easily computed from frequencies +of basic blocks. + +@findex redirect_edge_and_branch +Updating profile information is a delicate task that unfortunately +cannot be easily integrated with the CFG manipulation API@. Many of the +functions and hooks to modify the CFG, such as +@code{redirect_edge_and_branch}, do not have enough information to +easily update the profile, so updating it is in the majority of cases +left up to the caller. It is difficult to uncover bugs in the profile +updating code, because they manifest themselves only by producing +worse code, and checking profile consistency is not possible because +of numeric error accumulation. Hence special attention needs to be +given to this issue in each pass that modifies the CFG@.
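The relationship between block frequencies, edge probabilities and the reverse probability described above can be sketched with plain integer arithmetic. This is an illustration only, not GCC's implementation: @code{toy_edge_frequency}, @code{toy_reverse_probability} and @code{TOY_PROB_BASE} are hypothetical names, and the scale of 10000 is an assumed stand-in for @code{REG_BR_PROB_BASE}.

```c
#include <assert.h>

/* Assumed fixed-point scale for probabilities; a stand-in for
   REG_BR_PROB_BASE chosen for this sketch only.  */
enum { TOY_PROB_BASE = 10000 };

/* Frequency of an edge: the source block frequency scaled by the
   branch probability of the edge, rounded to nearest.  */
int
toy_edge_frequency (int src_frequency, int probability)
{
  long long scaled = (long long) src_frequency * probability;
  return (int) ((scaled + TOY_PROB_BASE / 2) / TOY_PROB_BASE);
}

/* Reverse probability: the share of the destination block's executions
   that arrive via this edge, on the same fixed-point scale.  A block
   that is never executed yields 0.  */
int
toy_reverse_probability (int edge_frequency, int dest_frequency)
{
  if (dest_frequency == 0)
    return 0;

  long long scaled = (long long) edge_frequency * TOY_PROB_BASE;
  return (int) ((scaled + dest_frequency / 2) / dest_frequency);
}
```

For instance, an edge leaving a block of frequency 8000 with probability 2500 (25% on this scale) has frequency 2000. If its destination block has frequency 5000, the reverse probability is 4000, i.e.@: 40% of the destination's executions arrive along this edge. Rounding at every step is one way the numeric errors mentioned above can creep in.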
+ +@findex REG_BR_PROB_BASE, BB_FREQ_BASE, count +It is important to point out that @code{REG_BR_PROB_BASE} and +@code{BB_FREQ_BASE} are both set low enough that the square of any +frequency or probability in the flow graph can be computed without +overflow. The @code{count} field, however, cannot safely be squared, +as modern CPUs are fast enough to execute @math{2^{32}} operations quickly. + + +@node Maintaining the CFG +@section Maintaining the CFG +@findex cfghooks.h + +An important task of each compiler pass is to keep both the control +flow graph and all profile information up-to-date. Reconstruction of +the control flow graph after each pass is not an option, since it may be +very expensive and lost profile information cannot be reconstructed at +all. + +GCC has two major intermediate representations, and both use the +@code{basic_block} and @code{edge} data types to represent control +flow. Both representations share as much of the CFG maintenance code +as possible. For each representation, a set of @dfn{hooks} is defined +so that each representation can provide its own implementation of CFG +manipulation routines when necessary. These hooks are defined in +@file{cfghooks.h}. There are hooks for almost all common CFG +manipulations, including block splitting and merging, edge redirection +and creating and deleting basic blocks. These hooks should provide +everything you need to maintain and manipulate the CFG in both the RTL +and @code{GIMPLE} representation. + +At the moment, the basic block boundaries are maintained transparently +when modifying instructions, so there rarely is a need to move them +manually (such as in case someone wants to output an instruction outside +a basic block explicitly). + +@findex BLOCK_FOR_INSN, gimple_bb +In the RTL representation, each instruction has a +@code{BLOCK_FOR_INSN} value that represents a pointer to the basic block +that contains the instruction.
In the @code{GIMPLE} representation, the +function @code{gimple_bb} returns a pointer to the basic block +containing the queried statement. + +@cindex GIMPLE statement iterators +When changes need to be applied to a function in its @code{GIMPLE} +representation, @dfn{GIMPLE statement iterators} should be used. These +iterators provide an integrated abstraction of the flow graph and the +instruction stream. GIMPLE statement iterators are constructed using +the @code{gimple_stmt_iterator} data structure and several modifiers are +available, including the following: + +@ftable @code +@item gsi_start +This function initializes a @code{gimple_stmt_iterator} that points to +the first non-empty statement in a basic block. + +@item gsi_last +This function initializes a @code{gimple_stmt_iterator} that points to +the last statement in a basic block. + +@item gsi_end_p +This predicate is @code{true} if a @code{gimple_stmt_iterator} +represents the end of a basic block. + +@item gsi_next +This function takes a @code{gimple_stmt_iterator} and makes it point to +its successor. + +@item gsi_prev +This function takes a @code{gimple_stmt_iterator} and makes it point to +its predecessor. + +@item gsi_insert_after +This function inserts a statement after the @code{gimple_stmt_iterator} +passed in. The final parameter determines whether the statement +iterator is updated to point to the newly inserted statement, or left +pointing to the original statement. + +@item gsi_insert_before +This function inserts a statement before the @code{gimple_stmt_iterator} +passed in. The final parameter determines whether the statement +iterator is updated to point to the newly inserted statement, or left +pointing to the original statement. + +@item gsi_remove +This function removes the statement pointed to by the +@code{gimple_stmt_iterator} passed in and +rechains the remaining statements in a basic block, if any.
+@end ftable + +@findex BB_HEAD, BB_END +In the RTL representation, the macros @code{BB_HEAD} and @code{BB_END} +may be used to get the head and end @code{rtx} of a basic block. No +abstract iterators are defined for traversing the insn chain, but you +can just use @code{NEXT_INSN} and @code{PREV_INSN} instead. @xref{Insns}. + +@findex purge_dead_edges +Usually a code manipulating pass simplifies the instruction stream and +the flow of control, possibly eliminating some edges. This may for +example happen when a conditional jump is replaced with an +unconditional jump. Updating of edges +is not transparent and each optimization pass is required to do so +manually. However only few cases occur in practice. The pass may +call @code{purge_dead_edges} on a given basic block to remove +superfluous edges, if any. + +@findex redirect_edge_and_branch, redirect_jump +Another common scenario is redirection of branch instructions, but +this is best modeled as redirection of edges in the control flow graph +and thus use of @code{redirect_edge_and_branch} is preferred over more +low level functions, such as @code{redirect_jump} that operate on RTL +chain only. The CFG hooks defined in @file{cfghooks.h} should provide +the complete API required for manipulating and maintaining the CFG@. + +@findex split_block +It is also possible that a pass has to insert control flow instruction +into the middle of a basic block, thus creating an entry point in the +middle of the basic block, which is impossible by definition: The +block must be split to make sure it only has one entry point, i.e.@: the +head of the basic block. The CFG hook @code{split_block} may be used +when an instruction in the middle of a basic block has to become the +target of a jump or branch instruction. 
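The effect of splitting a block, as described above, can be illustrated with a deliberately tiny model in which a block is just a range of instruction numbers. This is a sketch under stated assumptions and not the @code{split_block} hook itself: @code{toy_block} and @code{toy_split_block} are hypothetical names, and the model tracks a single fall-through successor instead of real edges.

```c
#include <assert.h>

/* Hypothetical model: a block covers the half-open range of
   instruction numbers [first_insn, end_insn).  */
struct toy_block
{
  int first_insn;  /* First instruction in the block.  */
  int end_insn;    /* One past the last instruction.  */
  int fallthru_to; /* Index of the fall-through successor, or -1.  */
};

/* Split BLOCKS[WHICH] so that instruction AT becomes the head of a new
   block stored in BLOCKS[NEW_INDEX].  Returns 0 on success, or -1 if AT
   is not strictly inside the block (nothing to split).  */
int
toy_split_block (struct toy_block *blocks, int which, int new_index, int at)
{
  struct toy_block *bb = &blocks[which];

  if (at <= bb->first_insn || at >= bb->end_insn)
    return -1;

  struct toy_block *new_bb = &blocks[new_index];

  /* The new block takes over the tail and the old successor.  */
  new_bb->first_insn = at;
  new_bb->end_insn = bb->end_insn;
  new_bb->fallthru_to = bb->fallthru_to;

  /* The old block now ends before AT and falls through into the new
     block, which therefore has exactly one entry point: its head.  */
  bb->end_insn = at;
  bb->fallthru_to = new_index;
  return 0;
}
```

After the split, a jump can target instruction @code{at} without violating the single-entry rule, because that instruction is now the head of its own block.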
+ +@findex insert_insn_on_edge +@findex commit_edge_insertions +@findex gsi_insert_on_edge +@findex gsi_commit_edge_inserts +@cindex edge splitting +For a global optimizer, a common operation is to split edges in the +flow graph and insert instructions on them. In the RTL +representation, this can be easily done using the +@code{insert_insn_on_edge} function that emits an instruction +``on the edge'', caching it for a later @code{commit_edge_insertions} +call that will take care of moving the inserted instructions off the +edge into the instruction stream contained in a basic block. This +includes the creation of new basic blocks where needed. In the +@code{GIMPLE} representation, the equivalent functions are +@code{gsi_insert_on_edge}, which inserts a statement +on an edge, and @code{gsi_commit_edge_inserts}, which flushes +the pending statements to the actual instruction stream. + +@findex verify_flow_info +@cindex CFG verification +While debugging an optimization pass, the @code{verify_flow_info} +function may be useful to find bugs in the control flow graph updating +code. + + +@node Liveness information +@section Liveness information +@cindex Liveness representation +Liveness information is useful to determine whether some register is +``live'' at a given point of the program, i.e.@: that it contains a value that +may be used at a later point in the program. This information is +used, for instance, during register allocation, as the pseudo +registers only need to be assigned to a unique hard register or to a +stack slot if they are live. The hard registers and stack slots may +be freely reused for other values when a register is dead. + +Liveness information is available in the back end starting with +@code{pass_df_initialize} and ending with @code{pass_df_finish}.
Three +flavors of live analysis are available: With @code{LR}, it is possible +to determine at any point @code{P} in the function if the register may be +used on some path from @code{P} to the end of the function. With +@code{UR}, it is possible to determine if there is a path from the +beginning of the function to @code{P} that defines the variable. +@code{LIVE} is the intersection of the @code{LR} and @code{UR} and a +variable is live at @code{P} if there is both an assignment that reaches +it from the beginning of the function and a use that can be reached on +some path from @code{P} to the end of the function. + +In general @code{LIVE} is the most useful of the three. The macros +@code{DF_[LR,UR,LIVE]_[IN,OUT]} can be used to access this information. +The macros take a basic block number and return a bitmap that is indexed +by the register number. This information is only guaranteed to be up to +date after calls are made to @code{df_analyze}. See the file +@code{df-core.cc} for details on using the dataflow. + + +@findex REG_DEAD, REG_UNUSED +The liveness information is stored partly in the RTL instruction stream +and partly in the flow graph. Local information is stored in the +instruction stream: Each instruction may contain @code{REG_DEAD} notes +representing that the value of a given register is no longer needed, or +@code{REG_UNUSED} notes representing that the value computed by the +instruction is never used. The second is useful for instructions +computing multiple values at once. + diff --git a/gcc/doc/collect2.texi b/gcc/doc/collect2.texi new file mode 100644 index 00000000000..8155b7906c9 --- /dev/null +++ b/gcc/doc/collect2.texi @@ -0,0 +1,89 @@ +@c Copyright (C) 1988-2022 Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. 
+ +@node Collect2 +@chapter @code{collect2} + +GCC uses a utility called @code{collect2} on nearly all systems to arrange +to call various initialization functions at start time. + +The program @code{collect2} works by linking the program once and +looking through the linker output file for symbols with particular names +indicating they are constructor functions. If it finds any, it +creates a new temporary @samp{.c} file containing a table of them, +compiles it, and links the program a second time including that file. + +@findex __main +@cindex constructors, automatic calls +The actual calls to the constructors are carried out by a subroutine +called @code{__main}, which is called (automatically) at the beginning +of the body of @code{main} (provided @code{main} was compiled with GNU +CC)@. Calling @code{__main} is necessary, even when compiling C code, to +allow linking C and C++ object code together. (If you use +@option{-nostdlib}, you get an unresolved reference to @code{__main}, +since it's defined in the standard GCC library. Include @option{-lgcc} at +the end of your compiler command line to resolve this reference.) + +The program @code{collect2} is installed as @code{ld} in the directory +where the passes of the compiler are installed. When @code{collect2} +needs to find the @emph{real} @code{ld}, it tries the following file +names: + +@itemize @bullet +@item +a hard coded linker file name, if GCC was configured with the +@option{--with-ld} option. + +@item +@file{real-ld} in the directories listed in the compiler's search +directories. + +@item +@file{real-ld} in the directories listed in the environment variable +@code{PATH}. + +@item +The file specified in the @code{REAL_LD_FILE_NAME} configuration macro, +if specified. + +@item +@file{ld} in the compiler's search directories, except that +@code{collect2} will not execute itself recursively. + +@item +@file{ld} in @code{PATH}. 
+@end itemize + +``The compiler's search directories'' means all the directories where +@command{gcc} searches for passes of the compiler. This includes +directories that you specify with @option{-B}. + +Cross-compilers search a little differently: + +@itemize @bullet +@item +@file{real-ld} in the compiler's search directories. + +@item +@file{@var{target}-real-ld} in @code{PATH}. + +@item +The file specified in the @code{REAL_LD_FILE_NAME} configuration macro, +if specified. + +@item +@file{ld} in the compiler's search directories. + +@item +@file{@var{target}-ld} in @code{PATH}. +@end itemize + +@code{collect2} explicitly avoids running @code{ld} using the file name +under which @code{collect2} itself was invoked. In fact, it keeps +a list of such names---in case one copy of @code{collect2} finds +another copy (or version) of @code{collect2} installed as @code{ld} in a +second place in the search path. + +@code{collect2} searches for the utilities @code{nm} and @code{strip} +using the same algorithm as above for @code{ld}. diff --git a/gcc/doc/compat.texi b/gcc/doc/compat.texi new file mode 100644 index 00000000000..ae265fa01de --- /dev/null +++ b/gcc/doc/compat.texi @@ -0,0 +1,156 @@ +@c Copyright (C) 2002-2022 Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. + +@node Compatibility +@chapter Binary Compatibility +@cindex binary compatibility +@cindex ABI +@cindex application binary interface + +Binary compatibility encompasses several related concepts: + +@table @dfn +@item application binary interface (ABI) +The set of runtime conventions followed by all of the tools that deal +with binary representations of a program, including compilers, assemblers, +linkers, and language runtime support. +Some ABIs are formal with a written specification, possibly designed +by multiple interested parties. Others are simply the way things are +actually done by a particular set of tools. 
+ +@item ABI conformance +A compiler conforms to an ABI if it generates code that follows all of +the specifications enumerated by that ABI@. +A library conforms to an ABI if it is implemented according to that ABI@. +An application conforms to an ABI if it is built using tools that conform +to that ABI and does not contain source code that specifically changes +behavior specified by the ABI@. + +@item calling conventions +Calling conventions are a subset of an ABI that specify how arguments +are passed and function results are returned. + +@item interoperability +Different sets of tools are interoperable if they generate files that +can be used in the same program. The set of tools includes compilers, +assemblers, linkers, libraries, header files, startup files, and debuggers. +Binaries produced by different sets of tools are not interoperable unless +they implement the same ABI@. This applies to different versions of the +same tools as well as tools from different vendors. + +@item intercallability +Whether a function in a binary built by one set of tools can call a +function in a binary built by a different set of tools is a subset +of interoperability. + +@item implementation-defined features +Language standards include lists of implementation-defined features whose +behavior can vary from one implementation to another. Some of these +features are normally covered by a platform's ABI and others are not. +The features that are not covered by an ABI generally affect how a +program behaves, but not intercallability. + +@item compatibility +Conformance to the same ABI and the same behavior of implementation-defined +features are both relevant for compatibility. 
+@end table + +The application binary interface implemented by a C or C++ compiler +affects code generation and runtime support for: + +@itemize @bullet +@item +size and alignment of data types +@item +layout of structured types +@item +calling conventions +@item +register usage conventions +@item +interfaces for runtime arithmetic support +@item +object file formats +@end itemize + +In addition, the application binary interface implemented by a C++ compiler +affects code generation and runtime support for: +@itemize @bullet +@item +name mangling +@item +exception handling +@item +invoking constructors and destructors +@item +layout, alignment, and padding of classes +@item +layout and alignment of virtual tables +@end itemize + +Some GCC compilation options cause the compiler to generate code that +does not conform to the platform's default ABI@. Other options cause +different program behavior for implementation-defined features that are +not covered by an ABI@. These options are provided for consistency with +other compilers that do not follow the platform's default ABI or the +usual behavior of implementation-defined features for the platform. +Be very careful about using such options. + +Most platforms have a well-defined ABI that covers C code, but ABIs +that cover C++ functionality are not yet common. + +Starting with GCC 3.2, GCC binary conventions for C++ are based on a +written, vendor-neutral C++ ABI that was designed to be specific to +64-bit Itanium but also includes generic specifications that apply to +any platform. +This C++ ABI is also implemented by other compiler vendors on some +platforms, notably GNU/Linux and BSD systems. +We have tried hard to provide a stable ABI that will be compatible with +future GCC releases, but it is possible that we will encounter problems +that make this difficult. 
Such problems could include different +interpretations of the C++ ABI by different vendors, bugs in the ABI, or +bugs in the implementation of the ABI in different compilers. +GCC's @option{-Wabi} switch warns when G++ generates code that is +probably not compatible with the C++ ABI@. + +The C++ library used with a C++ compiler includes the Standard C++ +Library, with functionality defined in the C++ Standard, plus language +runtime support. The runtime support is included in a C++ ABI, but there +is no formal ABI for the Standard C++ Library. Two implementations +of that library are interoperable if one follows the de-facto ABI of the +other and if they are both built with the same compiler, or with compilers +that conform to the same ABI for C++ compiler and runtime support. + +When G++ and another C++ compiler conform to the same C++ ABI, but the +implementations of the Standard C++ Library that they normally use do not +follow the same ABI for the Standard C++ Library, object files built with +those compilers can be used in the same program only if they use the same +C++ library. This requires specifying the location of the C++ library +header files when invoking the compiler whose usual library is not being +used. The location of GCC's C++ header files depends on how the GCC +build was configured, but can be seen by using the G++ @option{-v} option. +With default configuration options for G++ 3.3 the compile line for a +different C++ compiler needs to include + +@smallexample + -I@var{gcc_install_directory}/include/c++/3.3 +@end smallexample + +Similarly, compiling code with G++ that must use a C++ library other +than the GNU C++ library requires specifying the location of the header +files for that other library. + +The most straightforward way to link a program to use a particular +C++ library is to use a C++ driver that specifies that C++ library by +default. 
The @command{g++} driver, for example, tells the linker where +to find GCC's C++ library (@file{libstdc++}) plus the other libraries +and startup files it needs, in the proper order. + +If a program must use a different C++ library and it's not possible +to do the final link using a C++ driver that uses that library by default, +it is necessary to tell @command{g++} the location and name of that +library. It might also be necessary to specify different startup files +and other runtime support libraries, and to suppress the use of GCC's +support libraries with one or more of the options @option{-nostdlib}, +@option{-nostartfiles}, and @option{-nodefaultlibs}. diff --git a/gcc/doc/configfiles.texi b/gcc/doc/configfiles.texi new file mode 100644 index 00000000000..76f69559ec3 --- /dev/null +++ b/gcc/doc/configfiles.texi @@ -0,0 +1,69 @@ +@c Copyright (C) 1988-2022 Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. + +@node Configuration Files +@subsubsection Files Created by @code{configure} + +Here we spell out what files will be set up by @file{configure} in the +@file{gcc} directory. Some other files are created as temporary files +in the configuration process, and are not used in the subsequent +build; these are not documented. + +@itemize @bullet +@item +@file{Makefile} is constructed from @file{Makefile.in}, together with +the host and target fragments (@pxref{Fragments, , Makefile +Fragments}) @file{t-@var{target}} and @file{x-@var{host}} from +@file{config}, if any, and language Makefile fragments +@file{@var{language}/Make-lang.in}. +@item +@file{auto-host.h} contains information about the host machine +determined by @file{configure}. If the host machine is different from +the build machine, then @file{auto-build.h} is also created, +containing such information about the build machine. +@item +@file{config.status} is a script that may be run to recreate the +current configuration. 
+@item +@file{configargs.h} is a header containing details of the arguments +passed to @file{configure} to configure GCC, and of the thread model +used. +@item +@file{cstamp-h} is used as a timestamp. +@item +If a language @file{config-lang.in} file (@pxref{Front End Config, , +The Front End @file{config-lang.in} File}) sets @code{outputs}, then +the files listed in @code{outputs} there are also generated. +@end itemize + +The following configuration headers are created from the Makefile, +using @file{mkconfig.sh}, rather than directly by @file{configure}. +@file{config.h}, @file{bconfig.h} and @file{tconfig.h} all contain the +@file{xm-@var{machine}.h} header, if any, appropriate to the host, +build and target machines respectively, the configuration headers for +the target, and some definitions; for the host and build machines, +these include the autoconfigured headers generated by +@file{configure}. The other configuration headers are determined by +@file{config.gcc}. They also contain the typedefs for @code{rtx}, +@code{rtvec} and @code{tree}. + +@itemize @bullet +@item +@file{config.h}, for use in programs that run on the host machine. +@item +@file{bconfig.h}, for use in programs that run on the build machine. +@item +@file{tconfig.h}, for use in programs and libraries for the target +machine. +@item +@file{tm_p.h}, which includes the header @file{@var{machine}-protos.h} +that contains prototypes for functions in the target +@file{@var{machine}.c} file. The +@file{@var{machine}-protos.h} header is included after the @file{rtl.h} +and/or @file{tree.h} would have been included. +The @file{tm_p.h} also +includes the header @file{tm-preds.h} which is generated by +@file{genpreds} program during the build to define the declarations +and inline functions for the predicate functions. 
+@end itemize diff --git a/gcc/doc/configterms.texi b/gcc/doc/configterms.texi new file mode 100644 index 00000000000..b53655b4316 --- /dev/null +++ b/gcc/doc/configterms.texi @@ -0,0 +1,61 @@ +@c Copyright (C) 2001-2022 Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. + +@node Configure Terms +@section Configure Terms and History +@cindex configure terms +@cindex canadian + +The configure and build process has a long and colorful history, and can +be confusing to anyone who doesn't know why things are the way they are. +While there are other documents which describe the configuration process +in detail, here are a few things that everyone working on GCC should +know. + +There are three system names that the build knows about: the machine you +are building on (@dfn{build}), the machine that you are building for +(@dfn{host}), and the machine that GCC will produce code for +(@dfn{target}). When you configure GCC, you specify these with +@option{--build=}, @option{--host=}, and @option{--target=}. + +Specifying the host without specifying the build should be avoided, as +@command{configure} may (and once did) assume that the host you specify +is also the build, which may not be true. + +If build, host, and target are all the same, this is called a +@dfn{native}. If build and host are the same but target is different, +this is called a @dfn{cross}. If build, host, and target are all +different this is called a @dfn{canadian} (for obscure reasons dealing +with Canada's political party and the background of the person working +on the build at that time). If host and target are the same, but build +is different, you are using a cross-compiler to build a native for a +different system. Some people call this a @dfn{host-x-host}, +@dfn{crossed native}, or @dfn{cross-built native}. 
If build and target +are the same, but host is different, you are using a cross compiler to +build a cross compiler that produces code for the machine you're +building on. This is rare, so there is no common way of describing it. +There is a proposal to call this a @dfn{crossback}. + +If build and host are the same, the GCC you are building will also be +used to build the target libraries (like @code{libstdc++}). If build and host +are different, you must have already built and installed a cross +compiler that will be used to build the target libraries (if you +configured with @option{--target=foo-bar}, this compiler will be called +@command{foo-bar-gcc}). + +In the case of target libraries, the machine you're building for is the +machine you specified with @option{--target}. So, build is the machine +you're building on (no change there), host is the machine you're +building for (the target libraries are built for the target, so host is +the target you specified), and target doesn't apply (because you're not +building a compiler, you're building libraries). The configure/make +process will adjust these variables as needed. It also sets +@code{$with_cross_host} to the original @option{--host} value in case you +need it. + +The @code{libiberty} support library is built up to three times: once +for the host, once for the target (even if they are the same), and once +for the build if build and host are different. This allows it to be +used by all programs which are generated in the course of the build +process. diff --git a/gcc/doc/contrib.texi b/gcc/doc/contrib.texi new file mode 100644 index 00000000000..e14cf5e4751 --- /dev/null +++ b/gcc/doc/contrib.texi @@ -0,0 +1,1776 @@ +@c Copyright (C) 1988-2022 Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. + +@node Contributors +@unnumbered Contributors to GCC +@cindex contributors + +The GCC project would like to thank its many contributors. 
Without them the +project would not have been nearly as successful as it has been. Any omissions +in this list are accidental. Feel free to contact +@email{law@@redhat.com} or @email{gerald@@pfeifer.com} if you have been left +out or some of your contributions are not listed. Please keep this list in +alphabetical order. + +@itemize @bullet + +@item +Analog Devices helped implement the support for complex data types +and iterators. + +@item +John David Anglin for threading-related fixes and improvements to +libstdc++-v3, and the HP-UX port. + +@item +James van Artsdalen wrote the code that makes efficient use of +the Intel 80387 register stack. + +@item +Abramo and Roberto Bagnara for the SysV68 Motorola 3300 Delta Series +port. + +@item +Alasdair Baird for various bug fixes. + +@item +Giovanni Bajo for analyzing lots of complicated C++ problem reports. + +@item +Peter Barada for his work to improve code generation for new +ColdFire cores. + +@item +Gerald Baumgartner added the signature extension to the C++ front end. + +@item +Godmar Back for his Java improvements and encouragement. + +@item +Scott Bambrough for help porting the Java compiler. + +@item +Wolfgang Bangerth for processing tons of bug reports. + +@item +Jon Beniston for his Microsoft Windows port of Java and port to Lattice Mico32. + +@item +Daniel Berlin for better DWARF 2 support, faster/better optimizations, +improved alias analysis, plus migrating GCC to Bugzilla. + +@item +Geoff Berry for his Java object serialization work and various patches. + +@item +David Binderman tests weekly snapshots of GCC trunk against Fedora Rawhide +for several architectures. + +@item +Laurynas Biveinis for memory management work and DJGPP port fixes. + +@item +Uros Bizjak for the implementation of x87 math built-in functions and +for various middle end and i386 back end improvements and bug fixes. + +@item +Eric Blake for helping to make GCJ and libgcj conform to the +specifications. 
+ +@item +Janne Blomqvist for contributions to GNU Fortran. + +@item +Hans-J. Boehm for his garbage collector, IA-64 libffi port, and other +Java work. + +@item +Segher Boessenkool for helping maintain the PowerPC port and the +instruction combiner plus various contributions to the middle end. + +@item +Neil Booth for work on cpplib, lang hooks, debug hooks and other +miscellaneous clean-ups. + +@item +Steven Bosscher for integrating the GNU Fortran front end into GCC and for +contributing to the tree-ssa branch. + +@item +Eric Botcazou for fixing middle- and backend bugs left and right. + +@item +Per Bothner for his direction via the steering committee and various +improvements to the infrastructure for supporting new languages. Chill +front end implementation. Initial implementations of +cpplib, fix-header, config.guess, libio, and past C++ library (libg++) +maintainer. Dreaming up, designing and implementing much of GCJ@. + +@item +Devon Bowen helped port GCC to the Tahoe. + +@item +Don Bowman for mips-vxworks contributions. + +@item +James Bowman for the FT32 port. + +@item +Dave Brolley for work on cpplib and Chill. + +@item +Paul Brook for work on the ARM architecture and maintaining GNU Fortran. + +@item +Robert Brown implemented the support for Encore 32000 systems. + +@item +Christian Bruel for improvements to local store elimination. + +@item +Herman A.J. ten Brugge for various fixes. + +@item +Joerg Brunsmann for Java compiler hacking and help with the GCJ FAQ@. + +@item +Joe Buck for his direction via the steering committee from its creation +to 2013. + +@item +Iain Buclaw for the D frontend. + +@item +Craig Burley for leadership of the G77 Fortran effort. + +@item +Tobias Burnus for contributions to GNU Fortran. + +@item +Stephan Buys for contributing Doxygen notes for libstdc++. 
+ +@item +Paolo Carlini for libstdc++ work: lots of efficiency improvements to +the C++ strings, streambufs and formatted I/O, hard detective work on +the frustrating localization issues, and keeping up with the problem reports. + +@item +John Carr for his alias work, SPARC hacking, infrastructure improvements, +previous contributions to the steering committee, loop optimizations, etc. + +@item +Stephane Carrez for 68HC11 and 68HC12 ports. + +@item +Steve Chamberlain for support for the Renesas SH and H8 processors +and the PicoJava processor, and for GCJ config fixes. + +@item +Glenn Chambers for help with the GCJ FAQ@. + +@item +John-Marc Chandonia for various libgcj patches. + +@item +Denis Chertykov for contributing and maintaining the AVR port, the first GCC port +for an 8-bit architecture. + +@item +Kito Cheng for his work on the RISC-V port, including bringing up the test +suite and maintenance. + +@item +Scott Christley for his Objective-C contributions. + +@item +Eric Christopher for his Java porting help and clean-ups. + +@item +Branko Cibej for more warning contributions. + +@item +The @uref{https://www.gnu.org/software/classpath/,,GNU Classpath project} +for all of their merged runtime code. + +@item +Nick Clifton for arm, mcore, fr30, v850, m32r, msp430 rx work, +@option{--help}, and other random hacking. + +@item +Michael Cook for libstdc++ cleanup patches to reduce warnings. + +@item +R. Kelley Cook for making GCC buildable from a read-only directory as +well as other miscellaneous build process and documentation clean-ups. + +@item +Ralf Corsepius for SH testing and minor bug fixing. + +@item +Fran@,{c}ois-Xavier Coudert for contributions to GNU Fortran. + +@item +Stan Cox for care and feeding of the x86 port and lots of behind +the scenes hacking. + +@item +Alex Crain provided changes for the 3b1. + +@item +Ian Dall for major improvements to the NS32k port. + +@item +Paul Dale for his work to add uClinux platform support to the +m68k backend. 
+ +@item +Palmer Dabbelt for his work maintaining the RISC-V port. + +@item +Dario Dariol contributed the four varieties of sample programs +that print a copy of their source. + +@item +Russell Davidson for fstream and stringstream fixes in libstdc++. + +@item +Bud Davis for work on the G77 and GNU Fortran compilers. + +@item +Mo DeJong for GCJ and libgcj bug fixes. + +@item +Jerry DeLisle for contributions to GNU Fortran. + +@item +DJ Delorie for the DJGPP port, build and libiberty maintenance, +various bug fixes, and the M32C, MeP, MSP430, and RL78 ports. + +@item +Arnaud Desitter for helping to debug GNU Fortran. + +@item +Gabriel Dos Reis for contributions to G++, contributions and +maintenance of GCC diagnostics infrastructure, libstdc++-v3, +including @code{valarray<>}, @code{complex<>}, maintaining the numerics library +(including that pesky @code{<limits>} :-) and keeping up-to-date anything +to do with numbers. + +@item +Ulrich Drepper for his work on glibc, testing of GCC using glibc, ISO C99 +support, CFG dumping support, etc., plus support of the C++ runtime +libraries including for all kinds of C interface issues, contributing and +maintaining @code{complex<>}, sanity checking and disbursement, configuration +architecture, libio maintenance, and early math work. + +@item +Fran@,{c}ois Dumont for his work on libstdc++-v3, especially maintaining and +improving @code{debug-mode} and associative and unordered containers. + +@item +Zdenek Dvorak for a new loop unroller and various fixes. + +@item +Michael Eager for his work on the Xilinx MicroBlaze port. + +@item +Richard Earnshaw for his ongoing work with the ARM@. + +@item +David Edelsohn for his direction via the steering committee, ongoing work +with the RS6000/PowerPC port, help cleaning up Haifa loop changes, +doing the entire AIX port of libstdc++ with his bare hands, and for +ensuring GCC properly keeps working on AIX@. 
+ +@item +Kevin Ediger for the floating point formatting of num_put::do_put in +libstdc++. + +@item +Phil Edwards for libstdc++ work including configuration hackery, +documentation maintainer, chief breaker of the web pages, the occasional +iostream bug fix, and work on shared library symbol versioning. + +@item +Paul Eggert for random hacking all over GCC@. + +@item +Mark Elbrecht for various DJGPP improvements, and for libstdc++ +configuration support for locales and fstream-related fixes. + +@item +Vadim Egorov for libstdc++ fixes in strings, streambufs, and iostreams. + +@item +Christian Ehrhardt for dealing with bug reports. + +@item +Ben Elliston for his work to move the Objective-C runtime into its +own subdirectory and for his work on autoconf. + +@item +Revital Eres for work on the PowerPC 750CL port. + +@item +Marc Espie for OpenBSD support. + +@item +Doug Evans for much of the global optimization framework, arc, m32r, +and SPARC work. + +@item +Christopher Faylor for his work on the Cygwin port and for caring and +feeding the gcc.gnu.org box and saving its users tons of spam. + +@item +Fred Fish for BeOS support and Ada fixes. + +@item +Ivan Fontes Garcia for the Portuguese translation of the GCJ FAQ@. + +@item +Peter Gerwinski for various bug fixes and the Pascal front end. + +@item +Kaveh R.@: Ghazi for his direction via the steering committee, amazing +work to make @samp{-W -Wall -W* -Werror} useful, and +testing GCC on a plethora of platforms. Kaveh extends his gratitude to +the CAIP Center at Rutgers University for providing him with computing +resources to work on Free Software from the late 1980s to 2010. + +@item +John Gilmore for a donation to the FSF earmarked improving GNU Java. + +@item +Judy Goldberg for c++ contributions. 
+ +@item +Torbjorn Granlund for various fixes and the c-torture testsuite, +multiply- and divide-by-constant optimization, improved long long +support, improved leaf function register allocation, and his direction +via the steering committee. + +@item +Jonny Grant for improvements to @code{collect2's} @option{--help} documentation. + +@item +Anthony Green for his @option{-Os} contributions, the moxie port, and +Java front end work. + +@item +Stu Grossman for gdb hacking, allowing GCJ developers to debug Java code. + +@item +Michael K. Gschwind contributed the port to the PDP-11. + +@item +Richard Biener for his ongoing middle-end contributions and bug fixes +and for release management. + +@item +Ron Guilmette implemented the @command{protoize} and @command{unprotoize} +tools, the support for DWARF 1 symbolic debugging information, and much of +the support for System V Release 4. He has also worked heavily on the +Intel 386 and 860 support. + +@item +Sumanth Gundapaneni for contributing the CR16 port. + +@item +Mostafa Hagog for Swing Modulo Scheduling (SMS) and post reload GCSE@. + +@item +Bruno Haible for improvements in the runtime overhead for EH, new +warnings and assorted bug fixes. + +@item +Andrew Haley for his amazing Java compiler and library efforts. + +@item +Chris Hanson assisted in making GCC work on HP-UX for the 9000 series 300. + +@item +Michael Hayes for various thankless work he's done trying to get +the c30/c40 ports functional. Lots of loop and unroll improvements and +fixes. + +@item +Dara Hazeghi for wading through myriads of target-specific bug reports. + +@item +Kate Hedstrom for staking the G77 folks with an initial testsuite. + +@item +Richard Henderson for his ongoing SPARC, alpha, ia32, and ia64 work, loop +opts, and generally fixing lots of old problems we've ignored for +years, flow rewrite and lots of further stuff, including reviewing +tons of patches. 
+ +@item +Aldy Hernandez for working on the PowerPC port, SIMD support, and +various fixes. + +@item +Nobuyuki Hikichi of Software Research Associates, Tokyo, contributed +the support for the Sony NEWS machine. + +@item +Kazu Hirata for caring and feeding the Renesas H8/300 port and various fixes. + +@item +Katherine Holcomb for work on GNU Fortran. + +@item +Manfred Hollstein for his ongoing work to keep the m88k alive, lots +of testing and bug fixing, particularly of GCC configury code. + +@item +Steve Holmgren for MachTen patches. + +@item +Mat Hostetter for work on the TILE-Gx and TILEPro ports. + +@item +Jan Hubicka for his x86 port improvements. + +@item +Falk Hueffner for working on C and optimization bug reports. + +@item +Bernardo Innocenti for his m68k work, including merging of +ColdFire improvements and uClinux support. + +@item +Christian Iseli for various bug fixes. + +@item +Kamil Iskra for general m68k hacking. + +@item +Lee Iverson for random fixes and MIPS testing. + +@item +Balaji V. Iyer for Cilk+ development and merging. + +@item +Andreas Jaeger for testing and benchmarking of GCC and various bug fixes. + +@item +Martin Jambor for his work on inter-procedural optimizations, the +switch conversion pass, and scalar replacement of aggregates. + +@item +Jakub Jelinek for his SPARC work and sibling call optimizations as well +as lots of bug fixes and test cases, and for improving the Java build +system. + +@item +Janis Johnson for ia64 testing and fixes, her quality improvement +sidetracks, and web page maintenance. + +@item +Kean Johnston for SCO OpenServer support and various fixes. + +@item +Tim Josling for the sample language treelang based originally on Richard +Kenner's ``toy'' language. + +@item +Nicolai Josuttis for additional libstdc++ documentation. + +@item +Klaus Kaempf for his ongoing work to make alpha-vms a viable target. + +@item +Steven G. Kargl for work on GNU Fortran. + +@item +David Kashtan of SRI adapted GCC to VMS@. 
+ +@item +Ryszard Kabatek for many, many libstdc++ bug fixes and optimizations of +strings, especially member functions, and for auto_ptr fixes. + +@item +Geoffrey Keating for his ongoing work to make the PPC work for GNU/Linux +and his automatic regression tester. + +@item +Brendan Kehoe for his ongoing work with G++ and for a lot of early work +in just about every part of libstdc++. + +@item +Oliver M. Kellogg of Deutsche Aerospace contributed the port to the +MIL-STD-1750A@. + +@item +Richard Kenner of the New York University Ultracomputer Research +Laboratory wrote the machine descriptions for the AMD 29000, the DEC +Alpha, the IBM RT PC, and the IBM RS/6000 as well as the support for +instruction attributes. He also made changes to better support RISC +processors including changes to common subexpression elimination, +strength reduction, function calling sequence handling, and condition +code support, in addition to generalizing the code for frame pointer +elimination and delay slot scheduling. Richard Kenner was also the +head maintainer of GCC for several years. + +@item +Mumit Khan for various contributions to the Cygwin and Mingw32 ports and +maintaining binary releases for Microsoft Windows hosts, and for massive libstdc++ +porting work to Cygwin/Mingw32. + +@item +Robin Kirkham for cpu32 support. + +@item +Mark Klein for PA improvements. + +@item +Thomas Koenig for various bug fixes. + +@item +Bruce Korb for the new and improved fixincludes code. + +@item +Benjamin Kosnik for his G++ work and for leading the libstdc++-v3 effort. + +@item +Maxim Kuvyrkov for contributions to the instruction scheduler, the Android +and m68k/Coldfire ports, and optimizations. + +@item +Charles LaBrec contributed the support for the Integrated Solutions +68020 system. + +@item +Asher Langton and Mike Kumbera for contributing Cray pointer support +to GNU Fortran, and for other GNU Fortran improvements. 
+ +@item +Jeff Law for his direction via the steering committee, coordinating the +entire egcs project and GCC 2.95, rolling out snapshots and releases, +handling merges from GCC2, reviewing tons of patches that might have +fallen through the cracks else, and random but extensive hacking. + +@item +Walter Lee for work on the TILE-Gx and TILEPro ports. + +@item +Marc Lehmann for his direction via the steering committee and helping +with analysis and improvements of x86 performance. + +@item +Victor Leikehman for work on GNU Fortran. + +@item +Ted Lemon wrote parts of the RTL reader and printer. + +@item +Kriang Lerdsuwanakij for C++ improvements including template as template +parameter support, and many C++ fixes. + +@item +Warren Levy for tremendous work on libgcj (Java Runtime Library) and +random work on the Java front end. + +@item +Alain Lichnewsky ported GCC to the MIPS CPU@. + +@item +Oskar Liljeblad for hacking on AWT and his many Java bug reports and +patches. + +@item +Robert Lipe for OpenServer support, new testsuites, testing, etc. + +@item +Chen Liqin for various S+core related fixes/improvement, and for +maintaining the S+core port. + +@item +Martin Liska for his work on identical code folding, the sanitizers, +HSA, general bug fixing and for running automated regression testing of GCC +and reporting numerous bugs. + +@item +Weiwen Liu for testing and various bug fixes. + +@item +Manuel L@'opez-Ib@'a@~nez for improving @option{-Wconversion} and +many other diagnostics fixes and improvements. + +@item +Dave Love for his ongoing work with the Fortran front end and +runtime libraries. + +@item +Martin von L@"owis for internal consistency checking infrastructure, +various C++ improvements including namespace support, and tons of +assistance with libstdc++/compiler merges. + +@item +H.J. Lu for his previous contributions to the steering committee, many x86 +bug reports, prototype patches, and keeping the GNU/Linux ports working. 
+ +@item +Greg McGary for random fixes and (someday) bounded pointers. + +@item +Andrew MacLeod for his ongoing work in building a real EH system, +various code generation improvements, work on the global optimizer, etc. + +@item +Vladimir Makarov for hacking some ugly i960 problems, PowerPC hacking +improvements to compile-time performance, overall knowledge and +direction in the area of instruction scheduling, design and +implementation of the automaton based instruction scheduler and +design and implementation of the integrated and local register allocators. + +@item +David Malcolm for his work on improving GCC diagnostics, JIT, self-tests +and unit testing. + +@item +Bob Manson for his behind the scenes work on dejagnu. + +@item +John Marino for contributing the DragonFly BSD port. + +@item +Philip Martin for lots of libstdc++ string and vector iterator fixes and +improvements, and string clean up and testsuites. + +@item +Michael Matz for his work on dominance tree discovery, the x86-64 port, +link-time optimization framework and general optimization improvements. + +@item +All of the Mauve project contributors for Java test code. + +@item +Bryce McKinlay for numerous GCJ and libgcj fixes and improvements. + +@item +Adam Megacz for his work on the Microsoft Windows port of GCJ@. + +@item +Michael Meissner for LRS framework, ia32, m32r, v850, m88k, MIPS, +powerpc, haifa, ECOFF debug support, and other assorted hacking. + +@item +Jason Merrill for his direction via the steering committee and leading +the G++ effort. + +@item +Martin Michlmayr for testing GCC on several architectures using the +entire Debian archive. + +@item +David Miller for his direction via the steering committee, lots of +SPARC work, improvements in jump.cc and interfacing with the Linux kernel +developers. + +@item +Gary Miller ported GCC to Charles River Data Systems machines. 
+ +@item +Alfred Minarik for libstdc++ string and ios bug fixes, and turning the +entire libstdc++ testsuite namespace-compatible. + +@item +Mark Mitchell for his direction via the steering committee, mountains of +C++ work, load/store hoisting out of loops, alias analysis improvements, +ISO C @code{restrict} support, and serving as release manager from 2000 +to 2011. + +@item +Alan Modra for various GNU/Linux bits and testing. + +@item +Toon Moene for his direction via the steering committee, Fortran +maintenance, and his ongoing work to make us make Fortran run fast. + +@item +Jason Molenda for major help in the care and feeding of all the services +on the gcc.gnu.org (formerly egcs.cygnus.com) machine---mail, web +services, ftp services, etc etc. Doing all this work on scrap paper and +the backs of envelopes would have been@dots{} difficult. + +@item +Catherine Moore for fixing various ugly problems we have sent her +way, including the haifa bug which was killing the Alpha & PowerPC +Linux kernels. + +@item +Mike Moreton for his various Java patches. + +@item +David Mosberger-Tang for various Alpha improvements, and for the initial +IA-64 port. + +@item +Stephen Moshier contributed the floating point emulator that assists in +cross-compilation and permits support for floating point numbers wider +than 64 bits and for ISO C99 support. + +@item +Bill Moyer for his behind the scenes work on various issues. + +@item +Philippe De Muyter for his work on the m68k port. + +@item +Joseph S. Myers for his work on the PDP-11 port, format checking and ISO +C99 support, and continuous emphasis on (and contributions to) documentation. + +@item +Nathan Myers for his work on libstdc++-v3: architecture and authorship +through the first three snapshots, including implementation of locale +infrastructure, string, shadow C headers, and the initial project +documentation (DESIGN, CHECKLIST, and so forth). Later, more work on +MT-safe string and shadow headers. 
+ +@item +Felix Natter for documentation on porting libstdc++. + +@item +Nathanael Nerode for cleaning up the configuration/build process. + +@item +NeXT, Inc.@: donated the front end that supports the Objective-C +language. + +@item +Hans-Peter Nilsson for the CRIS and MMIX ports, improvements to the search +engine setup, various documentation fixes and other small fixes. + +@item +Geoff Noer for his work on getting cygwin native builds working. + +@item +Vegard Nossum for running automated regression testing of GCC and reporting +numerous bugs. + +@item +Diego Novillo for his work on Tree SSA, OpenMP, SPEC performance +tracking web pages, GIMPLE tuples, and assorted fixes. + +@item +David O'Brien for the FreeBSD/alpha, FreeBSD/AMD x86-64, FreeBSD/ARM, +FreeBSD/PowerPC, and FreeBSD/SPARC64 ports and related infrastructure +improvements. + +@item +Alexandre Oliva for various build infrastructure improvements, scripts and +amazing testing work, including keeping libtool issues sane and happy. + +@item +Stefan Olsson for work on mt_alloc. + +@item +Melissa O'Neill for various NeXT fixes. + +@item +Rainer Orth for random MIPS work, including improvements to GCC's o32 +ABI support, improvements to dejagnu's MIPS support, Java configuration +clean-ups and porting work, and maintaining the IRIX, Solaris 2, and +Tru64 UNIX ports. + +@item +Steven Pemberton for his contribution of @file{enquire} which allowed GCC to +determine various properties of the floating point unit and generate +@file{float.h} in older versions of GCC. + +@item +Hartmut Penner for work on the s390 port. + +@item +Paul Petersen wrote the machine description for the Alliant FX/8. + +@item +Alexandre Petit-Bianco for implementing much of the Java compiler and +continued Java maintainership. + +@item +Matthias Pfaller for major improvements to the NS32k port. 
+ +@item +Gerald Pfeifer for his direction via the steering committee, pointing +out lots of problems we need to solve, maintenance of the web pages, and +taking care of documentation maintenance in general. + +@item +Marek Polacek for his work on the C front end, the sanitizers and general +bug fixing. + +@item +Andrew Pinski for processing bug reports by the dozen. + +@item +Ovidiu Predescu for his work on the Objective-C front end and runtime +libraries. + +@item +Jerry Quinn for major performance improvements in C++ formatted I/O@. + +@item +Ken Raeburn for various improvements to checker, MIPS ports and various +cleanups in the compiler. + +@item +Rolf W. Rasmussen for hacking on AWT@. + +@item +David Reese of Sun Microsystems contributed to the Solaris on PowerPC +port. + +@item +John Regehr for running automated regression testing of GCC and reporting +numerous bugs. + +@item +Volker Reichelt for running automated regression testing of GCC and reporting +numerous bugs and for keeping up with the problem reports. + +@item +Joern Rennecke for maintaining the sh port, loop, regmove & reload +hacking and developing and maintaining the Epiphany port. + +@item +Loren J. Rittle for improvements to libstdc++-v3 including the FreeBSD +port, threading fixes, thread-related configury changes, critical +threading documentation, and solutions to really tricky I/O problems, +as well as keeping GCC properly working on FreeBSD and continuous testing. + +@item +Craig Rodrigues for processing tons of bug reports. + +@item +Ola R@"onnerup for work on mt_alloc. + +@item +Gavin Romig-Koch for lots of behind the scenes MIPS work. + +@item +David Ronis inspired and encouraged Craig to rewrite the G77 +documentation in texinfo format by contributing a first pass at a +translation of the old @file{g77-0.5.16/f/DOC} file. + +@item +Ken Rose for fixes to GCC's delay slot filling code. + +@item +Ira Rosen for her contributions to the auto-vectorizer. 
+ +@item +Paul Rubin wrote most of the preprocessor. + +@item +P@'etur Run@'olfsson for major performance improvements in C++ formatted I/O and +large file support in C++ filebuf. + +@item +Chip Salzenberg for libstdc++ patches and improvements to locales, traits, +Makefiles, libio, libtool hackery, and ``long long'' support. + +@item +Juha Sarlin for improvements to the H8 code generator. + +@item +Greg Satz assisted in making GCC work on HP-UX for the 9000 series 300. + +@item +Roger Sayle for improvements to constant folding and GCC's RTL optimizers +as well as for fixing numerous bugs. + +@item +Bradley Schatz for his work on the GCJ FAQ@. + +@item +Peter Schauer wrote the code to allow debugging to work on the Alpha. + +@item +William Schelter did most of the work on the Intel 80386 support. + +@item +Tobias Schl@"uter for work on GNU Fortran. + +@item +Bernd Schmidt for various code generation improvements and major +work in the reload pass, serving as release manager for +GCC 2.95.3, and work on the Blackfin and C6X ports. + +@item +Peter Schmid for constant testing of libstdc++---especially application +testing, going above and beyond what was requested for the release +criteria---and libstdc++ header file tweaks. + +@item +Jason Schroeder for jcf-dump patches. + +@item +Andreas Schwab for his work on the m68k port. + +@item +Lars Segerlund for work on GNU Fortran. + +@item +Dodji Seketeli for numerous C++ bug fixes and debug info improvements. + +@item +Tim Shen for major work on @code{}. + +@item +Joel Sherrill for his direction via the steering committee, RTEMS +contributions and RTEMS testing. + +@item +Nathan Sidwell for many C++ fixes/improvements. + +@item +Jeffrey Siegal for helping RMS with the original design of GCC, some +code which handles the parse tree and RTL data structures, constant +folding and help with the original VAX & m68k ports. 
+ +@item +Kenny Simpson for prompting libstdc++ fixes due to defect reports from +the LWG (thereby keeping GCC in line with updates from the ISO)@. + +@item +Franz Sirl for his ongoing work with making the PPC port stable +for GNU/Linux. + +@item +Andrey Slepuhin for assorted AIX hacking. + +@item +Trevor Smigiel for contributing the SPU port. + +@item +Christopher Smith did the port for Convex machines. + +@item +Danny Smith for his major efforts on the Mingw (and Cygwin) ports. +Retired from GCC maintainership August 2010, having mentored two +new maintainers into the role. + +@item +Randy Smith finished the Sun FPA support. + +@item +Ed Smith-Rowland for his continuous work on libstdc++-v3, special functions, +@code{}, and various improvements to C++11 features. + +@item +Scott Snyder for queue, iterator, istream, and string fixes and libstdc++ +testsuite entries. Also for providing the patch to G77 to add +rudimentary support for @code{INTEGER*1}, @code{INTEGER*2}, and +@code{LOGICAL*1}. + +@item +Zdenek Sojka for running automated regression testing of GCC and reporting +numerous bugs. + +@item +Arseny Solokha for running automated regression testing of GCC and reporting +numerous bugs. + +@item +Jayant Sonar for contributing the CR16 port. + +@item +Brad Spencer for contributions to the GLIBCPP_FORCE_NEW technique. + +@item +Richard Stallman, for writing the original GCC and launching the GNU project. + +@item +Jan Stein of the Chalmers Computer Society provided support for +Genix, as well as part of the 32000 machine description. + +@item +Gerhard Steinmetz for running automated regression testing of GCC and reporting +numerous bugs. + +@item +Nigel Stephens for various mips16 related fixes/improvements. + +@item +Jonathan Stone wrote the machine description for the Pyramid computer. + +@item +Graham Stott for various infrastructure improvements. + +@item +John Stracke for his Java HTTP protocol fixes. 
+ +@item +Mike Stump for his Elxsi port, G++ contributions over the years and more +recently his VxWorks contributions. + +@item +Jeff Sturm for Java porting help, bug fixes, and encouragement. + +@item +Zhendong Su for running automated regression testing of GCC and reporting +numerous bugs. + +@item +Chengnian Sun for running automated regression testing of GCC and reporting +numerous bugs. + +@item +Shigeya Suzuki for his fixes for the bsdi platforms. + +@item +Ian Lance Taylor for the Go frontend, the initial mips16 and mips64 +support, general configury hacking, fixincludes, etc. + +@item +Holger Teutsch provided the support for the Clipper CPU@. + +@item +Gary Thomas for his ongoing work to make the PPC work for GNU/Linux. + +@item +Paul Thomas for contributions to GNU Fortran. + +@item +Philipp Thomas for random bug fixes throughout the compiler. + +@item +Jason Thorpe for thread support in libstdc++ on NetBSD@. + +@item +Kresten Krab Thorup wrote the run time support for the Objective-C +language and the fantastic Java bytecode interpreter. + +@item +Michael Tiemann for random bug fixes, the first instruction scheduler, +initial C++ support, function integration, NS32k, SPARC and M88k +machine description work, delay slot scheduling. + +@item +Andreas Tobler for his work porting libgcj to Darwin. + +@item +Teemu Torma for thread safe exception handling support. + +@item +Leonard Tower wrote parts of the parser, RTL generator, and RTL +definitions, and of the VAX machine description. + +@item +Daniel Towner and Hariharan Sandanagobalane contributed and +maintain the picoChip port. + +@item +Tom Tromey for internationalization support and for his many Java +contributions and libgcj maintainership. + +@item +Lassi Tuura for improvements to config.guess to determine HP processor +types. + +@item +Petter Urkedal for libstdc++ CXXFLAGS, math, and algorithms fixes. + +@item +Andy Vaught for the design and initial implementation of the GNU Fortran +front end.
+ +@item +Brent Verner for work with the libstdc++ cshadow files and their +associated configure steps. + +@item +Todd Vierling for contributions for NetBSD ports. + +@item +Andrew Waterman for contributing the RISC-V port, as well as maintaining it. + +@item +Jonathan Wakely for contributing libstdc++ Doxygen notes and XHTML +guidance and maintaining libstdc++. + +@item +Dean Wakerley for converting the install documentation from HTML to texinfo +in time for GCC 3.0. + +@item +Krister Walfridsson for random bug fixes. + +@item +Feng Wang for contributions to GNU Fortran. + +@item +Stephen M. Webb for time and effort on making libstdc++ shadow files +work with the tricky Solaris 8+ headers, and for pushing the build-time +header tree. Also, for starting and driving the @code{} effort. + +@item +John Wehle for various improvements for the x86 code generator, +related infrastructure improvements to help x86 code generation, +value range propagation and other work, WE32k port. + +@item +Ulrich Weigand for work on the s390 port. + +@item +Janus Weil for contributions to GNU Fortran. + +@item +Zack Weinberg for major work on cpplib and various other bug fixes. + +@item +Matt Welsh for help with Linux Threads support in GCJ@. + +@item +Urban Widmark for help fixing java.io. + +@item +Mark Wielaard for new Java library code and his work integrating with +Classpath. + +@item +Dale Wiles helped port GCC to the Tahoe. + +@item +Bob Wilson from Tensilica, Inc.@: for the Xtensa port. + +@item +Jim Wilson for his direction via the steering committee, tackling hard +problems in various places that nobody else wanted to work on, strength +reduction and other loop optimizations. + +@item +Paul Woegerer and Tal Agmon for the CRX port. + +@item +Carlo Wood for various fixes. + +@item +Tom Wood for work on the m88k port. + +@item +Chung-Ju Wu for his work on the Andes NDS32 port. + +@item +Canqun Yang for work on GNU Fortran. 
+ +@item +Masanobu Yuhara of Fujitsu Laboratories implemented the machine +description for the Tron architecture (specifically, the Gmicro). + +@item +Kevin Zachmann helped port GCC to the Tahoe. + +@item +Ayal Zaks for Swing Modulo Scheduling (SMS). + +@item +Qirun Zhang for running automated regression testing of GCC and reporting +numerous bugs. + +@item +Xiaoqiang Zhang for work on GNU Fortran. + +@item +Gilles Zunino for help porting Java to Irix. + +@end itemize + +The following people are recognized for their contributions to GNAT, +the Ada front end of GCC: +@itemize @bullet +@item +Bernard Banner + +@item +Romain Berrendonner + +@item +Geert Bosch + +@item +Emmanuel Briot + +@item +Joel Brobecker + +@item +Ben Brosgol + +@item +Vincent Celier + +@item +Arnaud Charlet + +@item +Chien Chieng + +@item +Cyrille Comar + +@item +Cyrille Crozes + +@item +Robert Dewar + +@item +Gary Dismukes + +@item +Robert Duff + +@item +Ed Falis + +@item +Ramon Fernandez + +@item +Sam Figueroa + +@item +Vasiliy Fofanov + +@item +Michael Friess + +@item +Franco Gasperoni + +@item +Ted Giering + +@item +Matthew Gingell + +@item +Laurent Guerby + +@item +Jerome Guitton + +@item +Olivier Hainque + +@item +Jerome Hugues + +@item +Hristian Kirtchev + +@item +Jerome Lambourg + +@item +Bruno Leclerc + +@item +Albert Lee + +@item +Sean McNeil + +@item +Javier Miranda + +@item +Laurent Nana + +@item +Pascal Obry + +@item +Dong-Ik Oh + +@item +Laurent Pautet + +@item +Brett Porter + +@item +Thomas Quinot + +@item +Nicolas Roche + +@item +Pat Rogers + +@item +Jose Ruiz + +@item +Douglas Rupp + +@item +Sergey Rybin + +@item +Gail Schenker + +@item +Ed Schonberg + +@item +Nicolas Setton + +@item +Samuel Tardieu + +@end itemize + + +The following people are recognized for their contributions of new +features, bug reports, testing and integration of classpath/libgcj for +GCC version 4.1: +@itemize @bullet +@item +Lillian Angel for @code{JTree} implementation and lots Free Swing +additions and 
bug fixes. + +@item +Wolfgang Baer for @code{GapContent} bug fixes. + +@item +Anthony Balkissoon for @code{JList}, Free Swing 1.5 updates and mouse event +fixes, lots of Free Swing work including @code{JTable} editing. + +@item +Stuart Ballard for RMI constant fixes. + +@item +Goffredo Baroncelli for @code{HTTPURLConnection} fixes. + +@item +Gary Benson for @code{MessageFormat} fixes. + +@item +Daniel Bonniot for @code{Serialization} fixes. + +@item +Chris Burdess for lots of gnu.xml and http protocol fixes, @code{StAX} +and @code{DOM xml:id} support. + +@item +Ka-Hing Cheung for @code{TreePath} and @code{TreeSelection} fixes. + +@item +Archie Cobbs for build fixes, VM interface updates, +@code{URLClassLoader} updates. + +@item +Kelley Cook for build fixes. + +@item +Martin Cordova for Suggestions for better @code{SocketTimeoutException}. + +@item +David Daney for @code{BitSet} bug fixes, @code{HttpURLConnection} +rewrite and improvements. + +@item +Thomas Fitzsimmons for lots of upgrades to the gtk+ AWT and Cairo 2D +support. Lots of imageio framework additions, lots of AWT and Free +Swing bug fixes. + +@item +Jeroen Frijters for @code{ClassLoader} and nio cleanups, serialization fixes, +better @code{Proxy} support, bug fixes and IKVM integration. + +@item +Santiago Gala for @code{AccessControlContext} fixes. + +@item +Nicolas Geoffray for @code{VMClassLoader} and @code{AccessController} +improvements. + +@item +David Gilbert for @code{basic} and @code{metal} icon and plaf support +and lots of documenting, Lots of Free Swing and metal theme +additions. @code{MetalIconFactory} implementation. + +@item +Anthony Green for @code{MIDI} framework, @code{ALSA} and @code{DSSI} +providers. + +@item +Andrew Haley for @code{Serialization} and @code{URLClassLoader} fixes, +gcj build speedups. + +@item +Kim Ho for @code{JFileChooser} implementation. 
+ +@item +Andrew John Hughes for @code{Locale} and net fixes, URI RFC2986 +updates, @code{Serialization} fixes, @code{Properties} XML support and +generic branch work, VMIntegration guide update. + +@item +Bastiaan Huisman for @code{TimeZone} bug fixing. + +@item +Andreas Jaeger for mprec updates. + +@item +Paul Jenner for better @option{-Werror} support. + +@item +Ito Kazumitsu for @code{NetworkInterface} implementation and updates. + +@item +Roman Kennke for @code{BoxLayout}, @code{GrayFilter} and +@code{SplitPane}, plus bug fixes all over. Lots of Free Swing work +including styled text. + +@item +Simon Kitching for @code{String} cleanups and optimization suggestions. + +@item +Michael Koch for configuration fixes, @code{Locale} updates, bug and +build fixes. + +@item +Guilhem Lavaux for configuration, thread and channel fixes and Kaffe +integration. JCL native @code{Pointer} updates. Logger bug fixes. + +@item +David Lichteblau for JCL support library global/local reference +cleanups. + +@item +Aaron Luchko for JDWP updates and documentation fixes. + +@item +Ziga Mahkovec for @code{Graphics2D} upgraded to Cairo 0.5 and new regex +features. + +@item +Sven de Marothy for BMP imageio support, CSS and @code{TextLayout} +fixes. @code{GtkImage} rewrite, 2D, awt, free swing and date/time fixes and +implementing the Qt4 peers. + +@item +Casey Marshall for crypto algorithm fixes, @code{FileChannel} lock, +@code{SystemLogger} and @code{FileHandler} rotate implementations, NIO +@code{FileChannel.map} support, security and policy updates. + +@item +Bryce McKinlay for RMI work. + +@item +Audrius Meskauskas for lots of Free Corba, RMI and HTML work plus +testing and documenting. + +@item +Kalle Olavi Niemitalo for build fixes. + +@item +Rainer Orth for build fixes. + +@item +Andrew Overholt for @code{File} locking fixes. + +@item +Ingo Proetel for @code{Image}, @code{Logger} and @code{URLClassLoader} +updates. 
+ +@item +Olga Rodimina for @code{MenuSelectionManager} implementation. + +@item +Jan Roehrich for @code{BasicTreeUI} and @code{JTree} fixes. + +@item +Julian Scheid for documentation updates and gjdoc support. + +@item +Christian Schlichtherle for zip fixes and cleanups. + +@item +Robert Schuster for documentation updates and beans fixes, +@code{TreeNode} enumerations and @code{ActionCommand} and various +fixes, XML and URL, AWT and Free Swing bug fixes. + +@item +Keith Seitz for lots of JDWP work. + +@item +Christian Thalinger for 64-bit cleanups, configuration and VM +interface fixes and @code{CACAO} integration, @code{fdlibm} updates. + +@item +Gael Thomas for @code{VMClassLoader} boot packages support suggestions. + +@item +Andreas Tobler for Darwin and Solaris testing and fixing, @code{Qt4} +support for Darwin/OS X, @code{Graphics2D} support, @code{gtk+} +updates. + +@item +Dalibor Topic for better @code{DEBUG} support, build cleanups and +Kaffe integration. @code{Qt4} build infrastructure, @code{SHA1PRNG} +and @code{GdkPixbufDecoder} updates. + +@item +Tom Tromey for Eclipse integration, generics work, lots of bug fixes +and gcj integration including coordinating The Big Merge. + +@item +Mark Wielaard for bug fixes, packaging and release management, +@code{Clipboard} implementation, system call interrupts and network +timeouts and @code{GdkPixbufDecoder} fixes.
+ +@end itemize + + +In addition to the above, all of which also contributed time and energy in +testing GCC, we would like to thank the following for their contributions +to testing: + +@itemize @bullet +@item +Michael Abd-El-Malek + +@item +Thomas Arend + +@item +Bonzo Armstrong + +@item +Steven Ashe + +@item +Chris Baldwin + +@item +David Billinghurst + +@item +Jim Blandy + +@item +Stephane Bortzmeyer + +@item +Horst von Brand + +@item +Frank Braun + +@item +Rodney Brown + +@item +Sidney Cadot + +@item +Bradford Castalia + +@item +Robert Clark + +@item +Jonathan Corbet + +@item +Ralph Doncaster + +@item +Richard Emberson + +@item +Levente Farkas + +@item +Graham Fawcett + +@item +Mark Fernyhough + +@item +Robert A. French + +@item +J@"orgen Freyh + +@item +Mark K. Gardner + +@item +Charles-Antoine Gauthier + +@item +Yung Shing Gene + +@item +David Gilbert + +@item +Simon Gornall + +@item +Fred Gray + +@item +John Griffin + +@item +Patrik Hagglund + +@item +Phil Hargett + +@item +Amancio Hasty + +@item +Takafumi Hayashi + +@item +Bryan W. Headley + +@item +Kevin B. Hendricks + +@item +Joep Jansen + +@item +Christian Joensson + +@item +Michel Kern + +@item +David Kidd + +@item +Tobias Kuipers + +@item +Anand Krishnaswamy + +@item +A. O. V. Le Blanc + +@item +llewelly + +@item +Damon Love + +@item +Brad Lucier + +@item +Matthias Klose + +@item +Martin Knoblauch + +@item +Rick Lutowski + +@item +Jesse Macnish + +@item +Stefan Morrell + +@item +Anon A. Mous + +@item +Matthias Mueller + +@item +Pekka Nikander + +@item +Rick Niles + +@item +Jon Olson + +@item +Magnus Persson + +@item +Chris Pollard + +@item +Richard Polton + +@item +Derk Reefman + +@item +David Rees + +@item +Paul Reilly + +@item +Tom Reilly + +@item +Torsten Rueger + +@item +Danny Sadinoff + +@item +Marc Schifer + +@item +Erik Schnetter + +@item +Wayne K. 
Schroll + +@item +David Schuler + +@item +Vin Shelton + +@item +Tim Souder + +@item +Adam Sulmicki + +@item +Bill Thorson + +@item +George Talbot + +@item +Pedro A. M. Vazquez + +@item +Gregory Warnes + +@item +Ian Watson + +@item +David E. Young + +@item +And many others +@end itemize + +And finally we'd like to thank everyone who uses the compiler, provides +feedback and generally reminds us why we're doing this work in the first +place. diff --git a/gcc/doc/contribute.texi b/gcc/doc/contribute.texi new file mode 100644 index 00000000000..74d8670348b --- /dev/null +++ b/gcc/doc/contribute.texi @@ -0,0 +1,24 @@ +@c Copyright (C) 1988-2022 Free Software Foundation, Inc. +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. + +@node Contributing +@chapter Contributing to GCC Development + +If you would like to help pretest GCC releases to assure they work well, +current development sources are available via Git (see +@uref{https://gcc.gnu.org/git.html}). Source and binary snapshots are +also available for FTP; see @uref{https://gcc.gnu.org/snapshots.html}. + +If you would like to work on improvements to GCC, please read the +advice at these URLs: + +@smallexample +@uref{https://gcc.gnu.org/contribute.html} +@uref{https://gcc.gnu.org/contributewhy.html} +@end smallexample + +@noindent +for information on how to make useful contributions and avoid +duplication of effort. Suggested projects are listed at +@uref{https://gcc.gnu.org/projects/}. diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi new file mode 100644 index 00000000000..90b2767e39a --- /dev/null +++ b/gcc/doc/cpp.texi @@ -0,0 +1,4600 @@ +\input texinfo +@setfilename cpp.info +@settitle The C Preprocessor +@setchapternewpage off +@c @smallbook +@c @cropmarks +@c @finalout + +@include gcc-common.texi + +@copying +@c man begin COPYRIGHT +Copyright @copyright{} 1987-2022 Free Software Foundation, Inc. 
+ +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation. A copy of +the license is included in the +@c man end +section entitled ``GNU Free Documentation License''. +@ignore +@c man begin COPYRIGHT +man page gfdl(7). +@c man end +@end ignore + +@c man begin COPYRIGHT +This manual contains no Invariant Sections. The Front-Cover Texts are +(a) (see below), and the Back-Cover Texts are (b) (see below). + +(a) The FSF's Front-Cover Text is: + + A GNU Manual + +(b) The FSF's Back-Cover Text is: + + You have freedom to copy and modify this GNU Manual, like GNU + software. Copies published by the Free Software Foundation raise + funds for GNU development. +@c man end +@end copying + +@c Create a separate index for command line options. +@defcodeindex op +@syncodeindex vr op + +@c Used in cppopts.texi and cppenv.texi. +@set cppmanual + +@ifinfo +@dircategory Software development +@direntry +* Cpp: (cpp). The GNU C preprocessor. +@end direntry +@end ifinfo + +@titlepage +@title The C Preprocessor +@versionsubtitle +@author Richard M. Stallman, Zachary Weinberg +@page +@c There is a fill at the bottom of the page, so we need a filll to +@c override it. +@vskip 0pt plus 1filll +@insertcopying +@end titlepage +@contents +@page + +@ifnottex +@node Top +@top +The C preprocessor implements the macro language used to transform C, +C++, and Objective-C programs before they are compiled. It can also be +useful on its own. 
+ +@menu +* Overview:: +* Header Files:: +* Macros:: +* Conditionals:: +* Diagnostics:: +* Line Control:: +* Pragmas:: +* Other Directives:: +* Preprocessor Output:: +* Traditional Mode:: +* Implementation Details:: +* Invocation:: +* Environment Variables:: +* GNU Free Documentation License:: +* Index of Directives:: +* Option Index:: +* Concept Index:: + +@detailmenu + --- The Detailed Node Listing --- + +Overview + +* Character sets:: +* Initial processing:: +* Tokenization:: +* The preprocessing language:: + +Header Files + +* Include Syntax:: +* Include Operation:: +* Search Path:: +* Once-Only Headers:: +* Alternatives to Wrapper #ifndef:: +* Computed Includes:: +* Wrapper Headers:: +* System Headers:: + +Macros + +* Object-like Macros:: +* Function-like Macros:: +* Macro Arguments:: +* Stringizing:: +* Concatenation:: +* Variadic Macros:: +* Predefined Macros:: +* Undefining and Redefining Macros:: +* Directives Within Macro Arguments:: +* Macro Pitfalls:: + +Predefined Macros + +* Standard Predefined Macros:: +* Common Predefined Macros:: +* System-specific Predefined Macros:: +* C++ Named Operators:: + +Macro Pitfalls + +* Misnesting:: +* Operator Precedence Problems:: +* Swallowing the Semicolon:: +* Duplication of Side Effects:: +* Self-Referential Macros:: +* Argument Prescan:: +* Newlines in Arguments:: + +Conditionals + +* Conditional Uses:: +* Conditional Syntax:: +* Deleted Code:: + +Conditional Syntax + +* Ifdef:: +* If:: +* Defined:: +* Else:: +* Elif:: + +Implementation Details + +* Implementation-defined behavior:: +* Implementation limits:: +* Obsolete Features:: + +Obsolete Features + +* Obsolete Features:: + +@end detailmenu +@end menu + +@insertcopying +@end ifnottex + +@node Overview +@chapter Overview +@c man begin DESCRIPTION +The C preprocessor, often known as @dfn{cpp}, is a @dfn{macro processor} +that is used automatically by the C compiler to transform your program +before compilation. 
It is called a macro processor because it allows +you to define @dfn{macros}, which are brief abbreviations for longer +constructs. + +The C preprocessor is intended to be used only with C, C++, and +Objective-C source code. In the past, it has been abused as a general +text processor. It will choke on input which does not obey C's lexical +rules. For example, apostrophes will be interpreted as the beginning of +character constants, and cause errors. Also, you cannot rely on it +preserving characteristics of the input which are not significant to +C-family languages. If a Makefile is preprocessed, all the hard tabs +will be removed, and the Makefile will not work. + +Having said that, you can often get away with using cpp on things which +are not C@. Other Algol-ish programming languages are often safe +(Ada, etc.) So is assembly, with caution. @option{-traditional-cpp} +mode preserves more white space, and is otherwise more permissive. Many +of the problems can be avoided by writing C or C++ style comments +instead of native language comments, and keeping macros simple. + +Wherever possible, you should use a preprocessor geared to the language +you are writing in. Modern versions of the GNU assembler have macro +facilities. Most high level programming languages have their own +conditional compilation and inclusion mechanism. If all else fails, +try a true general text processor, such as GNU M4. + +C preprocessors vary in some details. This manual discusses the GNU C +preprocessor, which provides a small superset of the features of ISO +Standard C@. In its default mode, the GNU C preprocessor does not do a +few things required by the standard. These are features which are +rarely, if ever, used, and may cause surprising changes to the meaning +of a program which does not expect them. 
To get strict ISO Standard C, +you should use the @option{-std=c90}, @option{-std=c99}, +@option{-std=c11} or @option{-std=c17} options, depending +on which version of the standard you want. To get all the mandatory +diagnostics, you must also use @option{-pedantic}. @xref{Invocation}. + +This manual describes the behavior of the ISO preprocessor. To +minimize gratuitous differences, where the ISO preprocessor's +behavior does not conflict with traditional semantics, the +traditional preprocessor should behave the same way. The various +differences that do exist are detailed in the section @ref{Traditional +Mode}. + +For clarity, unless noted otherwise, references to @samp{CPP} in this +manual refer to GNU CPP@. +@c man end + +@menu +* Character sets:: +* Initial processing:: +* Tokenization:: +* The preprocessing language:: +@end menu + +@node Character sets +@section Character sets + +Source code character set processing in C and related languages is +rather complicated. The C standard discusses two character sets, but +there are really at least four. + +The files input to CPP might be in any character set at all. CPP's +very first action, before it even looks for line boundaries, is to +convert the file into the character set it uses for internal +processing. That set is what the C standard calls the @dfn{source} +character set. It must be isomorphic with ISO 10646, also known as +Unicode. CPP uses the UTF-8 encoding of Unicode. + +The character sets of the input files are specified using the +@option{-finput-charset=} option. + +All preprocessing work (the subject of the rest of this manual) is +carried out in the source character set. If you request textual +output from the preprocessor with the @option{-E} option, it will be +in UTF-8. + +After preprocessing is complete, string and character constants are +converted again, into the @dfn{execution} character set. 
This +character set is under control of the user; the default is UTF-8, +matching the source character set. Wide string and character +constants have their own character set, which is not called out +specifically in the standard. Again, it is under control of the user. +The default is UTF-16 or UTF-32, whichever fits in the target's +@code{wchar_t} type, in the target machine's byte +order.@footnote{UTF-16 does not meet the requirements of the C +standard for a wide character set, but the choice of 16-bit +@code{wchar_t} is enshrined in some system ABIs so we cannot fix +this.} Octal and hexadecimal escape sequences do not undergo +conversion; @t{'\x12'} has the value 0x12 regardless of the currently +selected execution character set. All other escapes are replaced by +the character in the source character set that they represent, then +converted to the execution character set, just like unescaped +characters. + +In identifiers, characters outside the ASCII range can be specified +with the @samp{\u} and @samp{\U} escapes or used directly in the input +encoding. If strict ISO C90 conformance is specified with an option +such as @option{-std=c90}, or @option{-fno-extended-identifiers} is +used, then those constructs are not permitted in identifiers. + +@node Initial processing +@section Initial processing + +The preprocessor performs a series of textual transformations on its +input. These happen before all other processing. Conceptually, they +happen in a rigid order, and the entire file is run through each +transformation before the next one begins. CPP actually does them +all at once, for performance reasons. These transformations correspond +roughly to the first three ``phases of translation'' described in the C +standard. + +@enumerate +@item +@cindex line endings +The input file is read into memory and broken into lines. + +Different systems use different conventions to indicate the end of a +line. 
GCC accepts the ASCII control sequences @kbd{LF}, @kbd{@w{CR +LF}} and @kbd{CR} as end-of-line markers. These are the canonical +sequences used by Unix, DOS and VMS, and the classic Mac OS (before +OSX) respectively. You may therefore safely copy source code written +on any of those systems to a different one and use it without +conversion. (GCC may lose track of the current line number if a file +doesn't consistently use one convention, as sometimes happens when it +is edited on computers with different conventions that share a network +file system.) + +If the last line of any input file lacks an end-of-line marker, the end +of the file is considered to implicitly supply one. The C standard says +that this condition provokes undefined behavior, so GCC will emit a +warning message. + +@item +@cindex trigraphs +@anchor{trigraphs}If trigraphs are enabled, they are replaced by their +corresponding single characters. By default GCC ignores trigraphs, +but if you request a strictly conforming mode with the @option{-std} +option, or you specify the @option{-trigraphs} option, then it +converts them. + +These are nine three-character sequences, all starting with @samp{??}, +that are defined by ISO C to stand for single characters. They permit +obsolete systems that lack some of C's punctuation to use C@. For +example, @samp{??/} stands for @samp{\}, so @t{'??/n'} is a character +constant for a newline. + +Trigraphs are not popular and many compilers implement them +incorrectly. Portable code should not rely on trigraphs being either +converted or ignored. With @option{-Wtrigraphs} GCC will warn you +when a trigraph may change the meaning of your program if it were +converted. @xref{Wtrigraphs}. + +In a string constant, you can prevent a sequence of question marks +from being confused with a trigraph by inserting a backslash between +the question marks, or by separating the string literal at the +trigraph and making use of string literal concatenation. 
@t{"(??\?)"} +is the string @samp{(???)}, not @samp{(?]}. Traditional C compilers +do not recognize these idioms. + +The nine trigraphs and their replacements are + +@smallexample +Trigraph: ??( ??) ??< ??> ??= ??/ ??' ??! ??- +Replacement: [ ] @{ @} # \ ^ | ~ +@end smallexample + +@item +@cindex continued lines +@cindex backslash-newline +Continued lines are merged into one long line. + +A continued line is a line which ends with a backslash, @samp{\}. The +backslash is removed and the following line is joined with the current +one. No space is inserted, so you may split a line anywhere, even in +the middle of a word. (It is generally more readable to split lines +only at white space.) + +The trailing backslash on a continued line is commonly referred to as a +@dfn{backslash-newline}. + +If there is white space between a backslash and the end of a line, that +is still a continued line. However, as this is usually the result of an +editing mistake, and many compilers will not accept it as a continued +line, GCC will warn you about it. + +@item +@cindex comments +@cindex line comments +@cindex block comments +All comments are replaced with single spaces. + +There are two kinds of comments. @dfn{Block comments} begin with +@samp{/*} and continue until the next @samp{*/}. Block comments do not +nest: + +@smallexample +/* @r{this is} /* @r{one comment} */ @r{text outside comment} +@end smallexample + +@dfn{Line comments} begin with @samp{//} and continue to the end of the +current line. Line comments do not nest either, but it does not matter, +because they would end in the same place anyway. + +@smallexample +// @r{this is} // @r{one comment} +@r{text outside comment} +@end smallexample +@end enumerate + +It is safe to put line comments inside block comments, or vice versa. 
+ +@smallexample +@group +/* @r{block comment} + // @r{contains line comment} + @r{yet more comment} + */ @r{outside comment} + +// @r{line comment} /* @r{contains block comment} */ +@end group +@end smallexample + +But beware of commenting out one end of a block comment with a line +comment. + +@smallexample +@group + // @r{l.c.} /* @r{block comment begins} + @r{oops! this isn't a comment anymore} */ +@end group +@end smallexample + +Comments are not recognized within string literals. +@t{@w{"/* blah */"}} is the string constant @samp{@w{/* blah */}}, not +an empty string. + +Line comments are not in the 1989 edition of the C standard, but they +are recognized by GCC as an extension. In C++ and in the 1999 edition +of the C standard, they are an official part of the language. + +Since these transformations happen before all other processing, you can +split a line mechanically with backslash-newline anywhere. You can +comment out the end of a line. You can continue a line comment onto the +next line with backslash-newline. You can even split @samp{/*}, +@samp{*/}, and @samp{//} onto multiple lines with backslash-newline. +For example: + +@smallexample +@group +/\ +* +*/ # /* +*/ defi\ +ne FO\ +O 10\ +20 +@end group +@end smallexample + +@noindent +is equivalent to @code{@w{#define FOO 1020}}. All these tricks are +extremely confusing and should not be used in code intended to be +readable. + +There is no way to prevent a backslash at the end of a line from being +interpreted as a backslash-newline. This cannot affect any correct +program, however. + +@node Tokenization +@section Tokenization + +@cindex tokens +@cindex preprocessing tokens +After the textual transformations are finished, the input file is +converted into a sequence of @dfn{preprocessing tokens}. These mostly +correspond to the syntactic tokens used by the C compiler, but there are +a few differences. White space separates tokens; it is not itself a +token of any kind. 
Tokens do not have to be separated by white space, +but it is often necessary to avoid ambiguities. + +When faced with a sequence of characters that has more than one possible +tokenization, the preprocessor is greedy. It always makes each token, +starting from the left, as big as possible before moving on to the next +token. For instance, @code{a+++++b} is interpreted as +@code{@w{a ++ ++ + b}}, not as @code{@w{a ++ + ++ b}}, even though the +latter tokenization could be part of a valid C program and the former +could not. + +Once the input file is broken into tokens, the token boundaries never +change, except when the @samp{##} preprocessing operator is used to paste +tokens together. @xref{Concatenation}. For example, + +@smallexample +@group +#define foo() bar +foo()baz + @expansion{} bar baz +@emph{not} + @expansion{} barbaz +@end group +@end smallexample + +The compiler does not re-tokenize the preprocessor's output. Each +preprocessing token becomes one compiler token. + +@cindex identifiers +Preprocessing tokens fall into five broad classes: identifiers, +preprocessing numbers, string literals, punctuators, and other. An +@dfn{identifier} is the same as an identifier in C: any sequence of +letters, digits, or underscores, which begins with a letter or +underscore. Keywords of C have no significance to the preprocessor; +they are ordinary identifiers. You can define a macro whose name is a +keyword, for instance. The only identifier which can be considered a +preprocessing keyword is @code{defined}. @xref{Defined}. + +This is mostly true of other languages which use the C preprocessor. +However, a few of the keywords of C++ are significant even in the +preprocessor. @xref{C++ Named Operators}. + +In the 1999 C standard, identifiers may contain letters which are not +part of the ``basic source character set'', at the implementation's +discretion (such as accented Latin letters, Greek letters, or Chinese +ideograms). 
This may be done with an extended character set, or the +@samp{\u} and @samp{\U} escape sequences. + +As an extension, GCC treats @samp{$} as a letter. This is for +compatibility with some systems, such as VMS, where @samp{$} is commonly +used in system-defined function and object names. @samp{$} is not a +letter in strictly conforming mode, or if you specify the @option{-$} +option. @xref{Invocation}. + +@cindex numbers +@cindex preprocessing numbers +A @dfn{preprocessing number} has a rather bizarre definition. The +category includes all the normal integer and floating point constants +one expects of C, but also a number of other things one might not +initially recognize as a number. Formally, preprocessing numbers begin +with an optional period, a required decimal digit, and then continue +with any sequence of letters, digits, underscores, periods, and +exponents. Exponents are the two-character sequences @samp{e+}, +@samp{e-}, @samp{E+}, @samp{E-}, @samp{p+}, @samp{p-}, @samp{P+}, and +@samp{P-}. (The exponents that begin with @samp{p} or @samp{P} are +used for hexadecimal floating-point constants.) + +The purpose of this unusual definition is to isolate the preprocessor +from the full complexity of numeric constants. It does not have to +distinguish between lexically valid and invalid floating-point numbers, +which is complicated. The definition also permits you to split an +identifier at any position and get exactly two tokens, which can then be +pasted back together with the @samp{##} operator. + +It's possible for preprocessing numbers to cause programs to be +misinterpreted. For example, @code{0xE+12} is a preprocessing number +which does not translate to any valid numeric constant, therefore a +syntax error. It does not mean @code{@w{0xE + 12}}, which is what you +might have intended. + +@cindex string literals +@cindex string constants +@cindex character constants +@cindex header file names +@c the @: prevents makeinfo from turning '' into ". 
+@dfn{String literals} are string constants, character constants, and +header file names (the argument of @samp{#include}).@footnote{The C +standard uses the term @dfn{string literal} to refer only to what we are +calling @dfn{string constants}.} String constants and character +constants are straightforward: @t{"@dots{}"} or @t{'@dots{}'}. In +either case embedded quotes should be escaped with a backslash: +@t{'\'@:'} is the character constant for @samp{'}. There is no limit on +the length of a character constant, but the value of a character +constant that contains more than one character is +implementation-defined. @xref{Implementation Details}. + +Header file names either look like string constants, @t{"@dots{}"}, or are +written with angle brackets instead, @t{<@dots{}>}. In either case, +backslash is an ordinary character. There is no way to escape the +closing quote or angle bracket. The preprocessor looks for the header +file in different places depending on which form you use. @xref{Include +Operation}. + +No string literal may extend past the end of a line. You may use continued +lines instead, or string constant concatenation. + +@cindex punctuators +@cindex digraphs +@cindex alternative tokens +@dfn{Punctuators} are all the usual bits of punctuation which are +meaningful to C and C++. All but three of the punctuation characters in +ASCII are C punctuators. The exceptions are @samp{@@}, @samp{$}, and +@samp{`}. In addition, all the two- and three-character operators are +punctuators. There are also six @dfn{digraphs}, which the C++ standard +calls @dfn{alternative tokens}, which are merely alternate ways to spell +other punctuators. This is a second attempt to work around missing +punctuation in obsolete systems. It has no negative side effects, +unlike trigraphs, but does not cover as much ground. 
The digraphs and +their corresponding normal punctuators are: + +@smallexample +Digraph: <% %> <: :> %: %:%: +Punctuator: @{ @} [ ] # ## +@end smallexample + +@cindex other tokens +Any other single byte is considered ``other'' and passed on to the +preprocessor's output unchanged. The C compiler will almost certainly +reject source code containing ``other'' tokens. In ASCII, the only +``other'' characters are @samp{@@}, @samp{$}, @samp{`}, and control +characters other than NUL (all bits zero). (Note that @samp{$} is +normally considered a letter.) All bytes with the high bit set +(numeric range 0x80--0xFF) that were not successfully interpreted as +part of an extended character in the input encoding are also ``other'' +in the present implementation. + +NUL is a special case because of the high probability that its +appearance is accidental, and because it may be invisible to the user +(many terminals do not display NUL at all). Within comments, NULs are +silently ignored, just as any other character would be. In running +text, NUL is considered white space. For example, these two directives +have the same meaning. + +@smallexample +#define X^@@1 +#define X 1 +@end smallexample + +@noindent +(where @samp{^@@} is ASCII NUL)@. Within string or character constants, +NULs are preserved. In the latter two cases the preprocessor emits a +warning message. + +@node The preprocessing language +@section The preprocessing language +@cindex directives +@cindex preprocessing directives +@cindex directive line +@cindex directive name + +After tokenization, the stream of tokens may simply be passed straight +to the compiler's parser. However, if it contains any operations in the +@dfn{preprocessing language}, it will be transformed first. This stage +corresponds roughly to the standard's ``translation phase 4'' and is +what most people think of as the preprocessor's job. + +The preprocessing language consists of @dfn{directives} to be executed +and @dfn{macros} to be expanded.
Its primary capabilities are: + +@itemize @bullet +@item +Inclusion of header files. These are files of declarations that can be +substituted into your program. + +@item +Macro expansion. You can define @dfn{macros}, which are abbreviations +for arbitrary fragments of C code. The preprocessor will replace the +macros with their definitions throughout the program. Some macros are +automatically defined for you. + +@item +Conditional compilation. You can include or exclude parts of the +program according to various conditions. + +@item +Line control. If you use a program to combine or rearrange source files +into an intermediate file which is then compiled, you can use line +control to inform the compiler where each source line originally came +from. + +@item +Diagnostics. You can detect problems at compile time and issue errors +or warnings. +@end itemize + +There are a few more, less useful, features. + +Except for expansion of predefined macros, all these operations are +triggered with @dfn{preprocessing directives}. Preprocessing directives +are lines in your program that start with @samp{#}. Whitespace is +allowed before and after the @samp{#}. The @samp{#} is followed by an +identifier, the @dfn{directive name}. It specifies the operation to +perform. Directives are commonly referred to as @samp{#@var{name}} +where @var{name} is the directive name. For example, @samp{#define} is +the directive that defines a macro. + +The @samp{#} which begins a directive cannot come from a macro +expansion. Also, the directive name is not macro expanded. Thus, if +@code{foo} is defined as a macro expanding to @code{define}, that does +not make @samp{#foo} a valid preprocessing directive. + +The set of valid directive names is fixed. Programs cannot define new +preprocessing directives. + +Some directives require arguments; these make up the rest of the +directive line and must be separated from the directive name by +whitespace. 
For example, @samp{#define} must be followed by a macro +name and the intended expansion of the macro. + +A preprocessing directive cannot cover more than one line. The line +may, however, be continued with backslash-newline, or by a block comment +which extends past the end of the line. In either case, when the +directive is processed, the continuations have already been merged with +the first line to make one long line. + +@node Header Files +@chapter Header Files + +@cindex header file +A header file is a file containing C declarations and macro definitions +(@pxref{Macros}) to be shared between several source files. You request +the use of a header file in your program by @dfn{including} it, with the +C preprocessing directive @samp{#include}. + +Header files serve two purposes. + +@itemize @bullet +@item +@cindex system header files +System header files declare the interfaces to parts of the operating +system. You include them in your program to supply the definitions and +declarations you need to invoke system calls and libraries. + +@item +Your own header files contain declarations for interfaces between the +source files of your program. Each time you have a group of related +declarations and macro definitions all or most of which are needed in +several different source files, it is a good idea to create a header +file for them. +@end itemize + +Including a header file produces the same results as copying the header +file into each source file that needs it. Such copying would be +time-consuming and error-prone. With a header file, the related +declarations appear in only one place. If they need to be changed, they +can be changed in one place, and programs that include the header file +will automatically use the new version when next recompiled. The header +file eliminates the labor of finding and changing all the copies as well +as the risk that a failure to find one copy will result in +inconsistencies within a program. 
+ +In C, the usual convention is to give header files names that end with +@file{.h}. It is most portable to use only letters, digits, dashes, and +underscores in header file names, and at most one dot. + +@menu +* Include Syntax:: +* Include Operation:: +* Search Path:: +* Once-Only Headers:: +* Alternatives to Wrapper #ifndef:: +* Computed Includes:: +* Wrapper Headers:: +* System Headers:: +@end menu + +@node Include Syntax +@section Include Syntax + +@findex #include +Both user and system header files are included using the preprocessing +directive @samp{#include}. It has two variants: + +@table @code +@item #include <@var{file}> +This variant is used for system header files. It searches for a file +named @var{file} in a standard list of system directories. You can prepend +directories to this list with the @option{-I} option (@pxref{Invocation}). + +@item #include "@var{file}" +This variant is used for header files of your own program. It +searches for a file named @var{file} first in the directory containing +the current file, then in the quote directories and then the same +directories used for @code{<@var{file}>}. You can prepend directories +to the list of quote directories with the @option{-iquote} option. +@end table + +The argument of @samp{#include}, whether delimited with quote marks or +angle brackets, behaves like a string constant in that comments are not +recognized, and macro names are not expanded. Thus, @code{@w{#include +<x/*y>}} specifies inclusion of a system header file named @file{x/*y}. + +However, if backslashes occur within @var{file}, they are considered +ordinary text characters, not escape characters. None of the character +escape sequences appropriate to string constants in C are processed. +Thus, @code{@w{#include "x\n\\y"}} specifies a filename containing three +backslashes. (Some systems interpret @samp{\} as a pathname separator. +All of these also interpret @samp{/} the same way. It is most portable +to use only @samp{/}.)
+ +It is an error if there is anything (other than comments) on the line +after the file name. + +@node Include Operation +@section Include Operation + +The @samp{#include} directive works by directing the C preprocessor to +scan the specified file as input before continuing with the rest of the +current file. The output from the preprocessor contains the output +already generated, followed by the output resulting from the included +file, followed by the output that comes from the text after the +@samp{#include} directive. For example, if you have a header file +@file{header.h} as follows, + +@smallexample +char *test (void); +@end smallexample + +@noindent +and a main program called @file{program.c} that uses the header file, +like this, + +@smallexample +int x; +#include "header.h" + +int +main (void) +@{ + puts (test ()); +@} +@end smallexample + +@noindent +the compiler will see the same token stream as it would if +@file{program.c} read + +@smallexample +int x; +char *test (void); + +int +main (void) +@{ + puts (test ()); +@} +@end smallexample + +Included files are not limited to declarations and macro definitions; +those are merely the typical uses. Any fragment of a C program can be +included from another file. The include file could even contain the +beginning of a statement that is concluded in the containing file, or +the end of a statement that was started in the including file. However, +an included file must consist of complete tokens. Comments and string +literals which have not been closed by the end of an included file are +invalid. For error recovery, they are considered to end at the end of +the file. + +To avoid confusion, it is best if header files contain only complete +syntactic units---function declarations or definitions, type +declarations, etc. + +The line following the @samp{#include} directive is always treated as a +separate line by the C preprocessor, even if the included file lacks a +final newline. 
+ +@node Search Path +@section Search Path + +By default, the preprocessor looks for header files included by the quote +form of the directive @code{@w{#include "@var{file}"}} first relative to +the directory of the current file, and then in a preconfigured list +of standard system directories. +For example, if @file{/usr/include/sys/stat.h} contains +@code{@w{#include "types.h"}}, GCC looks for @file{types.h} first in +@file{/usr/include/sys}, then in its usual search path. + +For the angle-bracket form @code{@w{#include <@var{file}>}}, the +preprocessor's default behavior is to look only in the standard system +directories. The exact search directory list depends on the target +system, how GCC is configured, and where it is installed. You can +find the default search directory list for your version of CPP by +invoking it with the @option{-v} option. For example, + +@smallexample +cpp -v /dev/null -o /dev/null +@end smallexample + +There are a number of command-line options you can use to add +additional directories to the search path. +The most commonly-used option is @option{-I@var{dir}}, which causes +@var{dir} to be searched after the current directory (for the quote +form of the directive) and ahead of the standard system directories. +You can specify multiple @option{-I} options on the command line, +in which case the directories are searched in left-to-right order. + +If you need separate control over the search paths for the quote and +angle-bracket forms of the @samp{#include} directive, you can use the +@option{-iquote} and/or @option{-isystem} options instead of @option{-I}. +@xref{Invocation}, for a detailed description of these options, as +well as others that are less generally useful. + +If you specify other options on the command line, such as @option{-I}, +that affect where the preprocessor searches for header files, the +directory list printed by the @option{-v} option reflects the actual +search path used by the preprocessor. 
+ +Note that you can also prevent the preprocessor from searching any of +the default system header directories with the @option{-nostdinc} +option. This is useful when you are compiling an operating system +kernel or some other program that does not use the standard C library +facilities, or the standard C library itself. + +@node Once-Only Headers +@section Once-Only Headers +@cindex repeated inclusion +@cindex including just once +@cindex wrapper @code{#ifndef} + +If a header file happens to be included twice, the compiler will process +its contents twice. This is very likely to cause an error, e.g.@: when the +compiler sees the same structure definition twice. Even if it does not, +it will certainly waste time. + +The standard way to prevent this is to enclose the entire real contents +of the file in a conditional, like this: + +@smallexample +@group +/* File foo. */ +#ifndef FILE_FOO_SEEN +#define FILE_FOO_SEEN + +@var{the entire file} + +#endif /* !FILE_FOO_SEEN */ +@end group +@end smallexample + +This construct is commonly known as a @dfn{wrapper #ifndef}. +When the header is included again, the conditional will be false, +because @code{FILE_FOO_SEEN} is defined. The preprocessor will skip +over the entire contents of the file, and the compiler will not see it +twice. + +CPP optimizes even further. It remembers when a header file has a +wrapper @samp{#ifndef}. If a subsequent @samp{#include} specifies that +header, and the macro in the @samp{#ifndef} is still defined, it does +not bother to rescan the file at all. + +You can put comments outside the wrapper. They will not interfere with +this optimization. + +@cindex controlling macro +@cindex guard macro +The macro @code{FILE_FOO_SEEN} is called the @dfn{controlling macro} or +@dfn{guard macro}. In a user header file, the macro name should not +begin with @samp{_}. In a system header file, it should begin with +@samp{__} to avoid conflicts with user programs. 
In any kind of header +file, the macro name should contain the name of the file and some +additional text, to avoid conflicts with other header files. + +@node Alternatives to Wrapper #ifndef +@section Alternatives to Wrapper #ifndef + +CPP supports two more ways of indicating that a header file should be +read only once. Neither one is as portable as a wrapper @samp{#ifndef} +and we recommend you do not use them in new programs, with the caveat +that @samp{#import} is standard practice in Objective-C. + +@findex #import +CPP supports a variant of @samp{#include} called @samp{#import} which +includes a file, but does so at most once. If you use @samp{#import} +instead of @samp{#include}, then you don't need the conditionals +inside the header file to prevent multiple inclusion of the contents. +@samp{#import} is standard in Objective-C, but is considered a +deprecated extension in C and C++. + +@samp{#import} is not a well designed feature. It requires the users of +a header file to know that it should only be included once. It is much +better for the header file's implementor to write the file so that users +don't need to know this. Using a wrapper @samp{#ifndef} accomplishes +this goal. + +In the present implementation, a single use of @samp{#import} will +prevent the file from ever being read again, by either @samp{#import} or +@samp{#include}. You should not rely on this; do not use both +@samp{#import} and @samp{#include} to refer to the same header file. + +Another way to prevent a header file from being included more than once +is with the @samp{#pragma once} directive (@pxref{Pragmas}). +@samp{#pragma once} does not have the problems that @samp{#import} does, +but it is not recognized by all preprocessors, so you cannot rely on it +in a portable program. 
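As a minimal sketch of the wrapper #ifndef recommended above, the fragment below simulates including the same header twice within a single file. The names foo_inclusions and foo_demo, and the contents of the simulated header, are hypothetical and chosen only for illustration. A real header would live in its own file and be pulled in with #include.

```c
#include <assert.h>

/* First "inclusion" of a hypothetical header foo.h.  FILE_FOO_SEEN is
   not yet defined, so the body is processed.  */
#ifndef FILE_FOO_SEEN
#define FILE_FOO_SEEN
struct foo { int value; };
static int foo_inclusions = 1;
#endif /* !FILE_FOO_SEEN */

/* Second "inclusion" of the same text.  The guard macro is now defined,
   so the preprocessor skips the body; without the guard, the repeated
   struct and variable definitions would be compile errors.  */
#ifndef FILE_FOO_SEEN
#define FILE_FOO_SEEN
struct foo { int value; };
static int foo_inclusions = 2;
#endif /* !FILE_FOO_SEEN */

/* Hypothetical helper: uses the single surviving definition of
   struct foo and checks that only the first copy was seen.  */
static int
foo_demo (void)
{
  struct foo f = { 7 };
  assert (foo_inclusions == 1);
  return f.value;
}
```

Calling foo_demo from a main function returns 7, because only the first copy of the guarded text reaches the compiler. Unlike #import, this protection sits in the header itself, so its users need not know that it should be read only once.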
+ +@node Computed Includes +@section Computed Includes +@cindex computed includes +@cindex macros in include + +Sometimes it is necessary to select one of several different header +files to be included into your program. They might specify +configuration parameters to be used on different sorts of operating +systems, for instance. You could do this with a series of conditionals, + +@smallexample +#if SYSTEM_1 +# include "system_1.h" +#elif SYSTEM_2 +# include "system_2.h" +#elif SYSTEM_3 +@dots{} +#endif +@end smallexample + +That rapidly becomes tedious. Instead, the preprocessor offers the +ability to use a macro for the header name. This is called a +@dfn{computed include}. Instead of writing a header name as the direct +argument of @samp{#include}, you simply put a macro name there instead: + +@smallexample +#define SYSTEM_H "system_1.h" +@dots{} +#include SYSTEM_H +@end smallexample + +@noindent +@code{SYSTEM_H} will be expanded, and the preprocessor will look for +@file{system_1.h} as if the @samp{#include} had been written that way +originally. @code{SYSTEM_H} could be defined by your Makefile with a +@option{-D} option. + +You must be careful when you define the macro. @samp{#define} saves +tokens, not text. The preprocessor has no way of knowing that the macro +will be used as the argument of @samp{#include}, so it generates +ordinary tokens, not a header name. This is unlikely to cause problems +if you use double-quote includes, which are close enough to string +constants. If you use angle brackets, however, you may have trouble. + +The syntax of a computed include is actually a bit more general than the +above. If the first non-whitespace character after @samp{#include} is +not @samp{"} or @samp{<}, then the entire line is macro-expanded +like running text would be. + +If the line expands to a single string constant, the contents of that +string constant are the file to be included. 
CPP does not re-examine the +string for embedded quotes, but neither does it process backslash +escapes in the string. Therefore + +@smallexample +#define HEADER "a\"b" +#include HEADER +@end smallexample + +@noindent +looks for a file named @file{a\"b}. CPP searches for the file according +to the rules for double-quoted includes. + +If the line expands to a token stream beginning with a @samp{<} token +and including a @samp{>} token, then the tokens between the @samp{<} and +the first @samp{>} are combined to form the filename to be included. +Any whitespace between tokens is reduced to a single space; then any +space after the initial @samp{<} is retained, but a trailing space +before the closing @samp{>} is ignored. CPP searches for the file +according to the rules for angle-bracket includes. + +In either case, if there are any tokens on the line after the file name, +an error occurs and the directive is not processed. It is also an error +if the result of expansion does not match either of the two expected +forms. + +These rules are implementation-defined behavior according to the C +standard. To minimize the risk of different compilers interpreting your +computed includes differently, we recommend you use only a single +object-like macro which expands to a string constant. This will also +minimize confusion for people reading your program. + +@node Wrapper Headers +@section Wrapper Headers +@cindex wrapper headers +@cindex overriding a header file +@findex #include_next + +Sometimes it is necessary to adjust the contents of a system-provided +header file without editing it directly. GCC's @command{fixincludes} +operation does this, for example. One way to do that would be to create +a new header file with the same name and insert it in the search path +before the original header. That works fine as long as you're willing +to replace the old header entirely. But what if you want to refer to +the old header from the new one? 
+ +You cannot simply include the old header with @samp{#include}. That +will start from the beginning, and find your new header again. If your +header is not protected from multiple inclusion (@pxref{Once-Only +Headers}), it will recurse infinitely and cause a fatal error. + +You could include the old header with an absolute pathname: +@smallexample +#include "/usr/include/old-header.h" +@end smallexample +@noindent +This works, but is not clean; should the system headers ever move, you +would have to edit the new headers to match. + +There is no way to solve this problem within the C standard, but you can +use the GNU extension @samp{#include_next}. It means, ``Include the +@emph{next} file with this name''. This directive works like +@samp{#include} except in searching for the specified file: it starts +searching the list of header file directories @emph{after} the directory +in which the current file was found. + +Suppose you specify @option{-I /usr/local/include}, and the list of +directories to search also includes @file{/usr/include}; and suppose +both directories contain @file{signal.h}. Ordinary @code{@w{#include +<signal.h>}} finds the file under @file{/usr/local/include}. If that +file contains @code{@w{#include_next <signal.h>}}, it starts searching +after that directory, and finds the file in @file{/usr/include}. + +@samp{#include_next} does not distinguish between @code{<@var{file}>} +and @code{"@var{file}"} inclusion, nor does it check that the file you +specify has the same name as the current file. It simply looks for the +file named, starting with the directory in the search path after the one +where the current file was found. + +The use of @samp{#include_next} can lead to great confusion. We +recommend it be used only when there is no other alternative. In +particular, it should not be used in the headers belonging to a specific +program; it should be used only to make global corrections along the +lines of @command{fixincludes}.
+ +@node System Headers +@section System Headers +@cindex system header files + +The header files declaring interfaces to the operating system and +runtime libraries often cannot be written in strictly conforming C@. +Therefore, GCC gives code found in @dfn{system headers} special +treatment. All warnings, other than those generated by @samp{#warning} +(@pxref{Diagnostics}), are suppressed while GCC is processing a system +header. Macros defined in a system header are immune to a few warnings +wherever they are expanded. This immunity is granted on an ad-hoc +basis, when we find that a warning generates lots of false positives +because of code in macros defined in system headers. + +Normally, only the headers found in specific directories are considered +system headers. These directories are determined when GCC is compiled. +There are, however, two ways to make normal headers into system headers: + +@itemize @bullet +@item +Header files found in directories added to the search path with the +@option{-isystem} and @option{-idirafter} command-line options are +treated as system headers for the purposes of diagnostics. + +@item +@findex #pragma GCC system_header +There is also a directive, @code{@w{#pragma GCC system_header}}, which +tells GCC to consider the rest of the current include file a system +header, no matter where it was found. Code that comes before the +@samp{#pragma} in the file is not affected. @code{@w{#pragma GCC +system_header}} has no effect in the primary source file. +@end itemize + +On some targets, such as RS/6000 AIX, GCC implicitly surrounds all +system headers with an @samp{extern "C"} block when compiling as C++. + +@node Macros +@chapter Macros + +A @dfn{macro} is a fragment of code which has been given a name. +Whenever the name is used, it is replaced by the contents of the macro. +There are two kinds of macros. They differ mostly in what they look +like when they are used. 
@dfn{Object-like} macros resemble data objects +when used, @dfn{function-like} macros resemble function calls. + +You may define any valid identifier as a macro, even if it is a C +keyword. The preprocessor does not know anything about keywords. This +can be useful if you wish to hide a keyword such as @code{const} from an +older compiler that does not understand it. However, the preprocessor +operator @code{defined} (@pxref{Defined}) can never be defined as a +macro, and C++'s named operators (@pxref{C++ Named Operators}) cannot be +macros when you are compiling C++. + +@menu +* Object-like Macros:: +* Function-like Macros:: +* Macro Arguments:: +* Stringizing:: +* Concatenation:: +* Variadic Macros:: +* Predefined Macros:: +* Undefining and Redefining Macros:: +* Directives Within Macro Arguments:: +* Macro Pitfalls:: +@end menu + +@node Object-like Macros +@section Object-like Macros +@cindex object-like macro +@cindex symbolic constants +@cindex manifest constants + +An @dfn{object-like macro} is a simple identifier which will be replaced +by a code fragment. It is called object-like because it looks like a +data object in code that uses it. They are most commonly used to give +symbolic names to numeric constants. + +@findex #define +You create macros with the @samp{#define} directive. @samp{#define} is +followed by the name of the macro and then the token sequence it should +be an abbreviation for, which is variously referred to as the macro's +@dfn{body}, @dfn{expansion} or @dfn{replacement list}. For example, + +@smallexample +#define BUFFER_SIZE 1024 +@end smallexample + +@noindent +defines a macro named @code{BUFFER_SIZE} as an abbreviation for the +token @code{1024}. If somewhere after this @samp{#define} directive +there comes a C statement of the form + +@smallexample +foo = (char *) malloc (BUFFER_SIZE); +@end smallexample + +@noindent +then the C preprocessor will recognize and @dfn{expand} the macro +@code{BUFFER_SIZE}. 
The C compiler will see the same tokens as it would +if you had written + +@smallexample +foo = (char *) malloc (1024); +@end smallexample + +By convention, macro names are written in uppercase. Programs are +easier to read when it is possible to tell at a glance which names are +macros. + +The macro's body ends at the end of the @samp{#define} line. You may +continue the definition onto multiple lines, if necessary, using +backslash-newline. When the macro is expanded, however, it will all +come out on one line. For example, + +@smallexample +#define NUMBERS 1, \ + 2, \ + 3 +int x[] = @{ NUMBERS @}; + @expansion{} int x[] = @{ 1, 2, 3 @}; +@end smallexample + +@noindent +The most common visible consequence of this is surprising line numbers +in error messages. + +There is no restriction on what can go in a macro body provided it +decomposes into valid preprocessing tokens. Parentheses need not +balance, and the body need not resemble valid C code. (If it does not, +you may get error messages from the C compiler when you use the macro.) + +The C preprocessor scans your program sequentially. Macro definitions +take effect at the place you write them. Therefore, the following input +to the C preprocessor + +@smallexample +foo = X; +#define X 4 +bar = X; +@end smallexample + +@noindent +produces + +@smallexample +foo = X; +bar = 4; +@end smallexample + +When the preprocessor expands a macro name, the macro's expansion +replaces the macro invocation, then the expansion is examined for more +macros to expand. For example, + +@smallexample +@group +#define TABLESIZE BUFSIZE +#define BUFSIZE 1024 +TABLESIZE + @expansion{} BUFSIZE + @expansion{} 1024 +@end group +@end smallexample + +@noindent +@code{TABLESIZE} is expanded first to produce @code{BUFSIZE}, then that +macro is expanded to produce the final result, @code{1024}. + +Notice that @code{BUFSIZE} was not defined when @code{TABLESIZE} was +defined. 
The @samp{#define} for @code{TABLESIZE} uses exactly the +expansion you specify---in this case, @code{BUFSIZE}---and does not +check to see whether it too contains macro names. Only when you +@emph{use} @code{TABLESIZE} is the result of its expansion scanned for +more macro names. + +This makes a difference if you change the definition of @code{BUFSIZE} +at some point in the source file. @code{TABLESIZE}, defined as shown, +will always expand using the definition of @code{BUFSIZE} that is +currently in effect: + +@smallexample +#define BUFSIZE 1020 +#define TABLESIZE BUFSIZE +#undef BUFSIZE +#define BUFSIZE 37 +@end smallexample + +@noindent +Now @code{TABLESIZE} expands (in two stages) to @code{37}. + +If the expansion of a macro contains its own name, either directly or +via intermediate macros, it is not expanded again when the expansion is +examined for more macros. This prevents infinite recursion. +@xref{Self-Referential Macros}, for the precise details. + +@node Function-like Macros +@section Function-like Macros +@cindex function-like macros + +You can also define macros whose use looks like a function call. These +are called @dfn{function-like macros}. To define a function-like macro, +you use the same @samp{#define} directive, but you put a pair of +parentheses immediately after the macro name. For example, + +@smallexample +#define lang_init() c_init() +lang_init() + @expansion{} c_init() +@end smallexample + +A function-like macro is only expanded if its name appears with a pair +of parentheses after it. If you write just the name, it is left alone. +This can be useful when you have a function and a macro of the same +name, and you wish to use the function sometimes. + +@smallexample +extern void foo(void); +#define foo() /* @r{optimized inline version} */ +@dots{} + foo(); + funcptr = foo; +@end smallexample + +Here the call to @code{foo()} will use the macro, but the function +pointer will get the address of the real function. 
If the macro were to +be expanded, it would cause a syntax error. + +If you put spaces between the macro name and the parentheses in the +macro definition, that does not define a function-like macro, it defines +an object-like macro whose expansion happens to begin with a pair of +parentheses. + +@smallexample +#define lang_init () c_init() +lang_init() + @expansion{} () c_init()() +@end smallexample + +The first two pairs of parentheses in this expansion come from the +macro. The third is the pair that was originally after the macro +invocation. Since @code{lang_init} is an object-like macro, it does not +consume those parentheses. + +@node Macro Arguments +@section Macro Arguments +@cindex arguments +@cindex macros with arguments +@cindex arguments in macro definitions + +Function-like macros can take @dfn{arguments}, just like true functions. +To define a macro that uses arguments, you insert @dfn{parameters} +between the pair of parentheses in the macro definition that make the +macro function-like. The parameters must be valid C identifiers, +separated by commas and optionally whitespace. + +To invoke a macro that takes arguments, you write the name of the macro +followed by a list of @dfn{actual arguments} in parentheses, separated +by commas. The invocation of the macro need not be restricted to a +single logical line---it can cross as many lines in the source file as +you wish. The number of arguments you give must match the number of +parameters in the macro definition. When the macro is expanded, each +use of a parameter in its body is replaced by the tokens of the +corresponding argument. (You need not use all of the parameters in the +macro body.) + +As an example, here is a macro that computes the minimum of two numeric +values, as it is defined in many C programs, and some uses. + +@smallexample +#define min(X, Y) ((X) < (Y) ? (X) : (Y)) + x = min(a, b); @expansion{} x = ((a) < (b) ? (a) : (b)); + y = min(1, 2); @expansion{} y = ((1) < (2) ? 
(1) : (2)); + z = min(a + 28, *p); @expansion{} z = ((a + 28) < (*p) ? (a + 28) : (*p)); +@end smallexample + +@noindent +(In this small example you can already see several of the dangers of +macro arguments. @xref{Macro Pitfalls}, for detailed explanations.) + +Leading and trailing whitespace in each argument is dropped, and all +whitespace between the tokens of an argument is reduced to a single +space. Parentheses within each argument must balance; a comma within +such parentheses does not end the argument. However, there is no +requirement for square brackets or braces to balance, and they do not +prevent a comma from separating arguments. Thus, + +@smallexample +macro (array[x = y, x + 1]) +@end smallexample + +@noindent +passes two arguments to @code{macro}: @code{array[x = y} and @code{x + +1]}. If you want to supply @code{array[x = y, x + 1]} as an argument, +you can write it as @code{array[(x = y, x + 1)]}, which is equivalent C +code. + +All arguments to a macro are completely macro-expanded before they are +substituted into the macro body. After substitution, the complete text +is scanned again for macros to expand, including the arguments. This rule +may seem strange, but it is carefully designed so you need not worry +about whether any function call is actually a macro invocation. You can +run into trouble if you try to be too clever, though. @xref{Argument +Prescan}, for detailed discussion. + +For example, @code{min (min (a, b), c)} is first expanded to + +@smallexample + min (((a) < (b) ? (a) : (b)), (c)) +@end smallexample + +@noindent +and then to + +@smallexample +@group +((((a) < (b) ? (a) : (b))) < (c) + ? (((a) < (b) ? (a) : (b))) + : (c)) +@end group +@end smallexample + +@noindent +(Line breaks shown here for clarity would not actually be generated.) + +@cindex empty macro arguments +You can leave macro arguments empty; this is not an error to the +preprocessor (but many macros will then expand to invalid code). 
+You cannot leave out arguments entirely; if a macro takes two arguments, +there must be exactly one comma at the top level of its argument list. +Here are some silly examples using @code{min}: + +@smallexample +min(, b) @expansion{} (( ) < (b) ? ( ) : (b)) +min(a, ) @expansion{} ((a ) < ( ) ? (a ) : ( )) +min(,) @expansion{} (( ) < ( ) ? ( ) : ( )) +min((,),) @expansion{} (((,)) < ( ) ? ((,)) : ( )) + +min() @error{} macro "min" requires 2 arguments, but only 1 given +min(,,) @error{} macro "min" passed 3 arguments, but takes just 2 +@end smallexample + +Whitespace is not a preprocessing token, so if a macro @code{foo} takes +one argument, @code{@w{foo ()}} and @code{@w{foo ( )}} both supply it an +empty argument. Previous GNU preprocessor implementations and +documentation were incorrect on this point, insisting that a +function-like macro that takes a single argument be passed a space if an +empty argument was required. + +Macro parameters appearing inside string literals are not replaced by +their corresponding actual arguments. + +@smallexample +#define foo(x) x, "x" +foo(bar) @expansion{} bar, "x" +@end smallexample + +@node Stringizing +@section Stringizing +@cindex stringizing +@cindex @samp{#} operator + +Sometimes you may want to convert a macro argument into a string +constant. Parameters are not replaced inside string constants, but you +can use the @samp{#} preprocessing operator instead. When a macro +parameter is used with a leading @samp{#}, the preprocessor replaces it +with the literal text of the actual argument, converted to a string +constant. Unlike normal parameter replacement, the argument is not +macro-expanded first. This is called @dfn{stringizing}. + +There is no way to combine an argument with surrounding text and +stringize it all together. Instead, you can write a series of adjacent +string constants and stringized arguments. The preprocessor +replaces the stringized arguments with string constants. 
The C +compiler then combines all the adjacent string constants into one +long string. + +Here is an example of a macro definition that uses stringizing: + +@smallexample +@group +#define WARN_IF(EXP) \ +do @{ if (EXP) \ + fprintf (stderr, "Warning: " #EXP "\n"); @} \ +while (0) +WARN_IF (x == 0); + @expansion{} do @{ if (x == 0) + fprintf (stderr, "Warning: " "x == 0" "\n"); @} while (0); +@end group +@end smallexample + +@noindent +The argument for @code{EXP} is substituted once, as-is, into the +@code{if} statement, and once, stringized, into the argument to +@code{fprintf}. If @code{x} were a macro, it would be expanded in the +@code{if} statement, but not in the string. + +The @code{do} and @code{while (0)} are a kludge to make it possible to +write @code{WARN_IF (@var{arg});}, which the resemblance of +@code{WARN_IF} to a function would make C programmers want to do; see +@ref{Swallowing the Semicolon}. + +Stringizing in C involves more than putting double-quote characters +around the fragment. The preprocessor backslash-escapes the quotes +surrounding embedded string constants, and all backslashes within string and +character constants, in order to get a valid C string constant with the +proper contents. Thus, stringizing @code{@w{p = "foo\n";}} results in +@t{@w{"p = \"foo\\n\";"}}. However, backslashes that are not inside string +or character constants are not duplicated: @samp{\n} by itself +stringizes to @t{"\n"}. + +All leading and trailing whitespace in text being stringized is +ignored. Any sequence of whitespace in the middle of the text is +converted to a single space in the stringized result. Comments are +replaced by whitespace long before stringizing happens, so they +never appear in stringized text. + +There is no way to convert a macro argument into a character constant. + +If you want to stringize the result of expansion of a macro argument, +you have to use two levels of macros. 
+ +@smallexample +#define xstr(s) str(s) +#define str(s) #s +#define foo 4 +str (foo) + @expansion{} "foo" +xstr (foo) + @expansion{} xstr (4) + @expansion{} str (4) + @expansion{} "4" +@end smallexample + +@code{s} is stringized when it is used in @code{str}, so it is not +macro-expanded first. But @code{s} is an ordinary argument to +@code{xstr}, so it is completely macro-expanded before @code{xstr} +itself is expanded (@pxref{Argument Prescan}). Therefore, by the time +@code{str} gets to its argument, it has already been macro-expanded. + +@node Concatenation +@section Concatenation +@cindex concatenation +@cindex token pasting +@cindex token concatenation +@cindex @samp{##} operator + +It is often useful to merge two tokens into one while expanding macros. +This is called @dfn{token pasting} or @dfn{token concatenation}. The +@samp{##} preprocessing operator performs token pasting. When a macro +is expanded, the two tokens on either side of each @samp{##} operator +are combined into a single token, which then replaces the @samp{##} and +the two original tokens in the macro expansion. Usually both will be +identifiers, or one will be an identifier and the other a preprocessing +number. When pasted, they make a longer identifier. This isn't the +only valid case. It is also possible to concatenate two numbers (or a +number and a name, such as @code{1.5} and @code{e3}) into a number. +Also, multi-character operators such as @code{+=} can be formed by +token pasting. + +However, two tokens that don't together form a valid token cannot be +pasted together. For example, you cannot concatenate @code{x} with +@code{+} in either order. If you try, the preprocessor issues a warning +and emits the two tokens. Whether it puts white space between the +tokens is undefined. It is common to find unnecessary uses of @samp{##} +in complex macros. If you get this warning, it is likely that you can +simply remove the @samp{##}. 
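The valid cases listed above can be tried out with a small sketch. The helper macro @code{PASTE} and the variable names below are hypothetical, chosen only for illustration; the operands are passed as macro arguments, and none of them is itself a macro.

```c
/* PASTE is a hypothetical helper, not part of any standard header.  */
#define PASTE(a, b) a ## b

/* An identifier and a preprocessing number: counter ## 1 -> counter1.  */
static int PASTE (counter, 1) = 10;

/* Two identifiers: foo ## bar -> foobar.  */
static int PASTE (foo, bar) = 20;

/* A number and a name pasted into one number: 1.5 ## e3 -> 1.5e3.  */
static const double big = PASTE (1.5, e3);

/* A multi-character operator formed by pasting: + ## = -> +=.  */
static int
add_to_counter (int n)
{
  counter1 PASTE (+, =) n;
  return counter1;
}
```

Each use of @code{PASTE} here yields exactly one valid token, so the preprocessor should have nothing to warn about. Pasting @code{x} with @code{+} through the same helper would fall into the invalid case described above.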
+ +Both the tokens combined by @samp{##} could come from the macro body, +but you could just as well write them as one token in the first place. +Token pasting is most useful when one or both of the tokens comes from a +macro argument. If either of the tokens next to an @samp{##} is a +parameter name, it is replaced by its actual argument before @samp{##} +executes. As with stringizing, the actual argument is not +macro-expanded first. If the argument is empty, that @samp{##} has no +effect. + +Keep in mind that the C preprocessor converts comments to whitespace +before macros are even considered. Therefore, you cannot create a +comment by concatenating @samp{/} and @samp{*}. You can put as much +whitespace between @samp{##} and its operands as you like, including +comments, and you can put comments in arguments that will be +concatenated. However, it is an error if @samp{##} appears at either +end of a macro body. + +Consider a C program that interprets named commands. There probably +needs to be a table of commands, perhaps an array of structures declared +as follows: + +@smallexample +@group +struct command +@{ + char *name; + void (*function) (void); +@}; +@end group + +@group +struct command commands[] = +@{ + @{ "quit", quit_command @}, + @{ "help", help_command @}, + @dots{} +@}; +@end group +@end smallexample + +It would be cleaner not to have to give each command name twice, once in +the string constant and once in the function name. A macro which takes the +name of a command as an argument can make this unnecessary. The string +constant can be created with stringizing, and the function name by +concatenating the argument with @samp{_command}. 
Here is how it is done: + +@smallexample +#define COMMAND(NAME) @{ #NAME, NAME ## _command @} + +struct command commands[] = +@{ + COMMAND (quit), + COMMAND (help), + @dots{} +@}; +@end smallexample + +@node Variadic Macros +@section Variadic Macros +@cindex variable number of arguments +@cindex macros with variable arguments +@cindex variadic macros + +A macro can be declared to accept a variable number of arguments much as +a function can. The syntax for defining the macro is similar to that of +a function. Here is an example: + +@smallexample +#define eprintf(...) fprintf (stderr, __VA_ARGS__) +@end smallexample + +This kind of macro is called @dfn{variadic}. When the macro is invoked, +all the tokens in its argument list after the last named argument (this +macro has none), including any commas, become the @dfn{variable +argument}. This sequence of tokens replaces the identifier +@code{@w{__VA_ARGS__}} in the macro body wherever it appears. Thus, we +have this expansion: + +@smallexample +eprintf ("%s:%d: ", input_file, lineno) + @expansion{} fprintf (stderr, "%s:%d: ", input_file, lineno) +@end smallexample + +The variable argument is completely macro-expanded before it is inserted +into the macro expansion, just like an ordinary argument. You may use +the @samp{#} and @samp{##} operators to stringize the variable argument +or to paste its leading or trailing token with another token. (But see +below for an important special case for @samp{##}.) + +If your macro is complicated, you may want a more descriptive name for +the variable argument than @code{@w{__VA_ARGS__}}. CPP permits +this, as an extension. You may write an argument name immediately +before the @samp{...}; that name is used for the variable argument. +The @code{eprintf} macro above could be written + +@smallexample +#define eprintf(args...) fprintf (stderr, args) +@end smallexample + +@noindent +using this extension. You cannot use @code{@w{__VA_ARGS__}} and this +extension in the same macro. 
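As a minimal sketch of how the variable argument behaves, using only the standard @code{@w{__VA_ARGS__}} spelling: the macros @code{LIST}, @code{ARGS_TEXT} and @code{LIMIT} and the two functions below are hypothetical names invented for this illustration. The first function relies on the variable argument being macro-expanded before insertion; the second relies on @samp{#} stringizing it as written.

```c
#define LIMIT 8

/* Hypothetical helper: the whole variable argument, commas included,
   replaces __VA_ARGS__.  */
#define LIST(...) __VA_ARGS__

/* Hypothetical helper: '#' turns the variable argument into a string
   constant without macro-expanding it first.  */
#define ARGS_TEXT(...) #__VA_ARGS__

/* LIMIT is expanded to 8 before it is inserted into the initializer.  */
static int
middle_value (void)
{
  int values[] = { LIST (1, LIMIT, 3) };
  return values[1];
}

/* The stringized form keeps the spelling LIMIT rather than 8.  */
static const char *
args_as_written (void)
{
  return ARGS_TEXT (1, LIMIT, 3);
}
```

Under these assumptions @code{middle_value} returns 8, while @code{args_as_written} returns a string that still contains the name @code{LIMIT}, mirroring the difference between ordinary and stringized arguments.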
+ +You can have named arguments as well as variable arguments in a variadic +macro. We could define @code{eprintf} like this, instead: + +@smallexample +#define eprintf(format, ...) fprintf (stderr, format, __VA_ARGS__) +@end smallexample + +@noindent +This formulation looks more descriptive, but historically it was less +flexible: you had to supply at least one argument after the format +string. In standard C, you could not omit the comma separating the +named argument from the variable arguments. (Note that this +restriction has been lifted in C++20, and never existed in GNU C; see +below.) + +Furthermore, if you left the variable argument empty, you would have +gotten a syntax error, because there would have been an extra comma +after the format string. + +@smallexample +eprintf("success!\n", ); + @expansion{} fprintf(stderr, "success!\n", ); +@end smallexample + +This has been fixed in C++20, and GNU CPP also has a pair of +extensions which deal with this problem. + +First, in GNU CPP, and in C++ beginning in C++20, you are allowed to +leave the variable argument out entirely: + +@smallexample +eprintf ("success!\n") + @expansion{} fprintf(stderr, "success!\n", ); +@end smallexample + +@noindent +Second, C++20 introduces the @code{@w{__VA_OPT__}} function macro. +This macro may only appear in the definition of a variadic macro. If +the variable argument has any tokens, then a @code{@w{__VA_OPT__}} +invocation expands to its argument; but if the variable argument does +not have any tokens, the @code{@w{__VA_OPT__}} expands to nothing: + +@smallexample +#define eprintf(format, ...) \ + fprintf (stderr, format __VA_OPT__(,) __VA_ARGS__) +@end smallexample + +@code{@w{__VA_OPT__}} is also available in GNU C and GNU C++. + +Historically, GNU CPP has also had another extension to handle the +trailing comma: the @samp{##} token paste operator has a special +meaning when placed between a comma and a variable argument. 
Despite +the introduction of @code{@w{__VA_OPT__}}, this extension remains +supported in GNU CPP, for backward compatibility. If you write + +@smallexample +#define eprintf(format, ...) fprintf (stderr, format, ##__VA_ARGS__) +@end smallexample + +@noindent +and the variable argument is left out when the @code{eprintf} macro is +used, then the comma before the @samp{##} will be deleted. This does +@emph{not} happen if you pass an empty argument, nor does it happen if +the token preceding @samp{##} is anything other than a comma. + +@smallexample +eprintf ("success!\n") + @expansion{} fprintf(stderr, "success!\n"); +@end smallexample + +@noindent +The above explanation is ambiguous about the case where the only macro +parameter is a variable arguments parameter, as it is meaningless to +try to distinguish whether no argument at all is an empty argument or +a missing argument. +CPP retains the comma when conforming to a specific C +standard. Otherwise the comma is dropped as an extension to the standard. + +The C standard +mandates that the only place the identifier @code{@w{__VA_ARGS__}} +can appear is in the replacement list of a variadic macro. It may not +be used as a macro name, macro argument name, or within a different type +of macro. It may also be forbidden in open text; the standard is +ambiguous. We recommend you avoid using it except for its defined +purpose. + +Likewise, C++ forbids @code{@w{__VA_OPT__}} anywhere outside the +replacement list of a variadic macro. + +Variadic macros became a standard part of the C language with C99. +GNU CPP previously supported them +with a named variable argument +(@samp{args...}, not @samp{...} and @code{@w{__VA_ARGS__}}), which +is still supported for backward compatibility. + +@node Predefined Macros +@section Predefined Macros + +@cindex predefined macros +Several object-like macros are predefined; you use them without +supplying their definitions. 
They fall into three classes: standard, +common, and system-specific. + +In C++, there is a fourth category, the named operators. They act like +predefined macros, but you cannot undefine them. + +@menu +* Standard Predefined Macros:: +* Common Predefined Macros:: +* System-specific Predefined Macros:: +* C++ Named Operators:: +@end menu + +@node Standard Predefined Macros +@subsection Standard Predefined Macros +@cindex standard predefined macros. + +The standard predefined macros are specified by the relevant +language standards, so they are available with all compilers that +implement those standards. Older compilers may not provide all of +them. Their names all start with double underscores. + +@table @code +@item __FILE__ +This macro expands to the name of the current input file, in the form of +a C string constant. This is the path by which the preprocessor opened +the file, not the short name specified in @samp{#include} or as the +input file name argument. For example, +@code{"/usr/local/include/myheader.h"} is a possible expansion of this +macro. + +@item __LINE__ +This macro expands to the current input line number, in the form of a +decimal integer constant. While we call it a predefined macro, it's +a pretty strange macro, since its ``definition'' changes with each +new line of source code. +@end table + +@code{__FILE__} and @code{__LINE__} are useful in generating an error +message to report an inconsistency detected by the program; the message +can state the source line at which the inconsistency was detected. For +example, + +@smallexample +fprintf (stderr, "Internal error: " + "negative string length " + "%d at %s, line %d.", + length, __FILE__, __LINE__); +@end smallexample + +An @samp{#include} directive changes the expansions of @code{__FILE__} +and @code{__LINE__} to correspond to the included file. 
At the end of +that file, when processing resumes on the input file that contained +the @samp{#include} directive, the expansions of @code{__FILE__} and +@code{__LINE__} revert to the values they had before the +@samp{#include} (but @code{__LINE__} is then incremented by one as +processing moves to the line after the @samp{#include}). + +A @samp{#line} directive changes @code{__LINE__}, and may change +@code{__FILE__} as well. @xref{Line Control}. + +C99 introduced @code{__func__}, and GCC has provided @code{__FUNCTION__} +for a long time. Both of these are strings containing the name of the +current function (there are slight semantic differences; see the GCC +manual). Neither of them is a macro; the preprocessor does not know the +name of the current function. They tend to be useful in conjunction +with @code{__FILE__} and @code{__LINE__}, though. + +@table @code + +@item __DATE__ +This macro expands to a string constant that describes the date on which +the preprocessor is being run. The string constant contains eleven +characters and looks like @code{@w{"Feb 12 1996"}}. If the day of the +month is less than 10, it is padded with a space on the left. + +If GCC cannot determine the current date, it will emit a warning message +(once per compilation) and @code{__DATE__} will expand to +@code{@w{"??? ?? ????"}}. + +@item __TIME__ +This macro expands to a string constant that describes the time at +which the preprocessor is being run. The string constant contains +eight characters and looks like @code{"23:59:01"}. + +If GCC cannot determine the current time, it will emit a warning message +(once per compilation) and @code{__TIME__} will expand to +@code{"??:??:??"}. + +@item __STDC__ +In normal operation, this macro expands to the constant 1, to signify +that this compiler conforms to ISO Standard C@. 
If GNU CPP is used with +a compiler other than GCC, this is not necessarily true; however, the +preprocessor always conforms to the standard unless the +@option{-traditional-cpp} option is used. + +This macro is not defined if the @option{-traditional-cpp} option is used. + +On some hosts, the system compiler uses a different convention, where +@code{__STDC__} is normally 0, but is 1 if the user specifies strict +conformance to the C Standard. CPP follows the host convention when +processing system header files, but when processing user files +@code{__STDC__} is always 1. This has been reported to cause problems; +for instance, some versions of Solaris provide X Windows headers that +expect @code{__STDC__} to be either undefined or 1. @xref{Invocation}. + +@item __STDC_VERSION__ +This macro expands to the C Standard's version number, a long integer +constant of the form @code{@var{yyyy}@var{mm}L} where @var{yyyy} and +@var{mm} are the year and month of the Standard version. This signifies +which version of the C Standard the compiler conforms to. Like +@code{__STDC__}, this is not necessarily accurate for the entire +implementation, unless GNU CPP is being used with GCC@. + +The value @code{199409L} signifies the 1989 C standard as amended in +1994; the value @code{199901L} signifies +the 1999 revision of the C standard; the value @code{201112L} +signifies the 2011 revision of the C standard; the value +@code{201710L} signifies the 2017 revision of the C standard (which is +otherwise identical to the 2011 version apart from correction of +defects), which is the current default. An unspecified value larger than @code{201710L} is used for +the experimental @option{-std=c2x} and @option{-std=gnu2x} modes. + +This macro is not defined if the @option{-traditional-cpp} option is +used, nor when compiling C++ or Objective-C@. + +@item __STDC_HOSTED__ +This macro is defined, with value 1, if the compiler's target is a +@dfn{hosted environment}.
A hosted environment has the complete +facilities of the standard C library available. + +@item __cplusplus +This macro is defined when the C++ compiler is in use. You can use +@code{__cplusplus} to test whether a header is compiled by a C compiler +or a C++ compiler. This macro is similar to @code{__STDC_VERSION__}, in +that it expands to a version number. Depending on the language standard +selected, the value of the macro is +@code{199711L} for the 1998 C++ standard, +@code{201103L} for the 2011 C++ standard, +@code{201402L} for the 2014 C++ standard, +@code{201703L} for the 2017 C++ standard, +@code{202002L} for the 2020 C++ standard, +or an unspecified value strictly larger than @code{202002L} for the +experimental languages enabled by @option{-std=c++23} and +@option{-std=gnu++23}. + +@item __OBJC__ +This macro is defined, with value 1, when the Objective-C compiler is in +use. You can use @code{__OBJC__} to test whether a header is compiled +by a C compiler or an Objective-C compiler. + +@item __ASSEMBLER__ +This macro is defined with value 1 when preprocessing assembly +language. + +@end table + +@node Common Predefined Macros +@subsection Common Predefined Macros +@cindex common predefined macros + +The common predefined macros are GNU C extensions. They are available +with the same meanings regardless of the machine or operating system on +which you are using GNU C or GNU Fortran. Their names all start with +double underscores. + +@table @code + +@item __COUNTER__ +This macro expands to sequential integral values starting from 0. In +conjunction with the @code{##} operator, this provides a convenient means to +generate unique identifiers. Care must be taken to ensure that +@code{__COUNTER__} is not expanded prior to inclusion of precompiled headers +which use it. Otherwise, the precompiled headers will not be used. + +@item __GFORTRAN__ +The GNU Fortran compiler defines this. 
+ +@item __GNUC__ +@itemx __GNUC_MINOR__ +@itemx __GNUC_PATCHLEVEL__ +These macros are defined by all GNU compilers that use the C +preprocessor: C, C++, Objective-C and Fortran. Their values are the major +version, minor version, and patch level of the compiler, as integer +constants. For example, GCC version @var{x}.@var{y}.@var{z} +defines @code{__GNUC__} to @var{x}, @code{__GNUC_MINOR__} to @var{y}, +and @code{__GNUC_PATCHLEVEL__} to @var{z}. These +macros are also defined if you invoke the preprocessor directly. + +If all you need to know is whether or not your program is being compiled +by GCC, or a non-GCC compiler that claims to accept the GNU C dialects, +you can simply test @code{__GNUC__}. If you need to write code +which depends on a specific version, you must be more careful. Each +time the minor version is increased, the patch level is reset to zero; +each time the major version is increased, the +minor version and patch level are reset. If you wish to use the +predefined macros directly in the conditional, you will need to write it +like this: + +@smallexample +/* @r{Test for GCC > 3.2.0} */ +#if __GNUC__ > 3 || \ + (__GNUC__ == 3 && (__GNUC_MINOR__ > 2 || \ + (__GNUC_MINOR__ == 2 && \ + __GNUC_PATCHLEVEL__ > 0))) +@end smallexample + +@noindent +Another approach is to use the predefined macros to +calculate a single number, then compare that against a threshold: + +@smallexample +#define GCC_VERSION (__GNUC__ * 10000 \ + + __GNUC_MINOR__ * 100 \ + + __GNUC_PATCHLEVEL__) +@dots{} +/* @r{Test for GCC > 3.2.0} */ +#if GCC_VERSION > 30200 +@end smallexample + +@noindent +Many people find this form easier to understand. + +@item __GNUG__ +The GNU C++ compiler defines this. Testing it is equivalent to +testing @code{@w{(__GNUC__ && __cplusplus)}}.
+ +@item __STRICT_ANSI__ +GCC defines this macro if and only if the @option{-ansi} switch, or a +@option{-std} switch specifying strict conformance to some version of ISO C +or ISO C++, was specified when GCC was invoked. It is defined to @samp{1}. +This macro exists primarily to direct GNU libc's header files to use only +definitions found in standard C. + +@item __BASE_FILE__ +This macro expands to the name of the main input file, in the form +of a C string constant. This is the source file that was specified +on the command line of the preprocessor or C compiler. + +@item __FILE_NAME__ +This macro expands to the basename of the current input file, in the +form of a C string constant. This is the last path component by which +the preprocessor opened the file. For example, processing +@code{"/usr/local/include/myheader.h"} would set this +macro to @code{"myheader.h"}. + +@item __INCLUDE_LEVEL__ +This macro expands to a decimal integer constant that represents the +depth of nesting in include files. The value of this macro is +incremented on every @samp{#include} directive and decremented at the +end of every included file. It starts out at 0, its value within the +base file specified on the command line. + +@item __ELF__ +This macro is defined if the target uses the ELF object format. + +@item __VERSION__ +This macro expands to a string constant which describes the version of +the compiler in use. You should not rely on its contents having any +particular form, but it can be counted on to contain at least the +release number. + +@item __OPTIMIZE__ +@itemx __OPTIMIZE_SIZE__ +@itemx __NO_INLINE__ +These macros describe the compilation mode. @code{__OPTIMIZE__} is +defined in all optimizing compilations. @code{__OPTIMIZE_SIZE__} is +defined if the compiler is optimizing for size, not speed. 
+@code{__NO_INLINE__} is defined if no functions will be inlined into +their callers (when not optimizing, or when inlining has been +specifically disabled by @option{-fno-inline}). + +These macros cause certain GNU header files to provide optimized +definitions, using macros or inline functions, of system library +functions. You should not use these macros in any way unless you make +sure that programs will execute with the same effect whether or not they +are defined. If they are defined, their value is 1. + +@item __GNUC_GNU_INLINE__ +GCC defines this macro if functions declared @code{inline} will be +handled in GCC's traditional gnu90 mode. Object files will contain +externally visible definitions of all functions declared @code{inline} +without @code{extern} or @code{static}. They will not contain any +definitions of any functions declared @code{extern inline}. + +@item __GNUC_STDC_INLINE__ +GCC defines this macro if functions declared @code{inline} will be +handled according to the ISO C99 or later standards. Object files will contain +externally visible definitions of all functions declared @code{extern +inline}. They will not contain definitions of any functions declared +@code{inline} without @code{extern}. + +If this macro is defined, GCC supports the @code{gnu_inline} function +attribute as a way to always get the gnu90 behavior. + +@item __CHAR_UNSIGNED__ +GCC defines this macro if and only if the data type @code{char} is +unsigned on the target machine. It exists to cause the standard header +file @file{limits.h} to work correctly. You should not use this macro +yourself; instead, refer to the standard macros defined in @file{limits.h}. + +@item __WCHAR_UNSIGNED__ +Like @code{__CHAR_UNSIGNED__}, this macro is defined if and only if the +data type @code{wchar_t} is unsigned and the front-end is in C++ mode. 
+ +@item __REGISTER_PREFIX__ +This macro expands to a single token (not a string constant) which is +the prefix applied to CPU register names in assembly language for this +target. You can use it to write assembly that is usable in multiple +environments. For example, in the @code{m68k-aout} environment it +expands to nothing, but in the @code{m68k-coff} environment it expands +to a single @samp{%}. + +@item __USER_LABEL_PREFIX__ +This macro expands to a single token which is the prefix applied to +user labels (symbols visible to C code) in assembly. For example, in +the @code{m68k-aout} environment it expands to an @samp{_}, but in the +@code{m68k-coff} environment it expands to nothing. + +This macro will have the correct definition even if +@option{-f(no-)underscores} is in use, but it will not be correct if +target-specific options that adjust this prefix are used (e.g.@: the +OSF/rose @option{-mno-underscores} option). + +@item __SIZE_TYPE__ +@itemx __PTRDIFF_TYPE__ +@itemx __WCHAR_TYPE__ +@itemx __WINT_TYPE__ +@itemx __INTMAX_TYPE__ +@itemx __UINTMAX_TYPE__ +@itemx __SIG_ATOMIC_TYPE__ +@itemx __INT8_TYPE__ +@itemx __INT16_TYPE__ +@itemx __INT32_TYPE__ +@itemx __INT64_TYPE__ +@itemx __UINT8_TYPE__ +@itemx __UINT16_TYPE__ +@itemx __UINT32_TYPE__ +@itemx __UINT64_TYPE__ +@itemx __INT_LEAST8_TYPE__ +@itemx __INT_LEAST16_TYPE__ +@itemx __INT_LEAST32_TYPE__ +@itemx __INT_LEAST64_TYPE__ +@itemx __UINT_LEAST8_TYPE__ +@itemx __UINT_LEAST16_TYPE__ +@itemx __UINT_LEAST32_TYPE__ +@itemx __UINT_LEAST64_TYPE__ +@itemx __INT_FAST8_TYPE__ +@itemx __INT_FAST16_TYPE__ +@itemx __INT_FAST32_TYPE__ +@itemx __INT_FAST64_TYPE__ +@itemx __UINT_FAST8_TYPE__ +@itemx __UINT_FAST16_TYPE__ +@itemx __UINT_FAST32_TYPE__ +@itemx __UINT_FAST64_TYPE__ +@itemx __INTPTR_TYPE__ +@itemx __UINTPTR_TYPE__ +These macros are defined to the correct underlying types for the +@code{size_t}, @code{ptrdiff_t}, @code{wchar_t}, @code{wint_t}, +@code{intmax_t}, @code{uintmax_t}, @code{sig_atomic_t}, 
@code{int8_t}, +@code{int16_t}, @code{int32_t}, @code{int64_t}, @code{uint8_t}, +@code{uint16_t}, @code{uint32_t}, @code{uint64_t}, +@code{int_least8_t}, @code{int_least16_t}, @code{int_least32_t}, +@code{int_least64_t}, @code{uint_least8_t}, @code{uint_least16_t}, +@code{uint_least32_t}, @code{uint_least64_t}, @code{int_fast8_t}, +@code{int_fast16_t}, @code{int_fast32_t}, @code{int_fast64_t}, +@code{uint_fast8_t}, @code{uint_fast16_t}, @code{uint_fast32_t}, +@code{uint_fast64_t}, @code{intptr_t}, and @code{uintptr_t} typedefs, +respectively. They exist to make the standard header files +@file{stddef.h}, @file{stdint.h}, and @file{wchar.h} work correctly. +You should not use these macros directly; instead, include the +appropriate headers and use the typedefs. Some of these macros may +not be defined on particular systems if GCC does not provide a +@file{stdint.h} header on those systems. + +@item __CHAR_BIT__ +Defined to the number of bits used in the representation of the +@code{char} data type. It exists to make the standard header given +numerical limits work correctly. You should not use +this macro directly; instead, include the appropriate headers. 
+ +@item __SCHAR_MAX__ +@itemx __WCHAR_MAX__ +@itemx __SHRT_MAX__ +@itemx __INT_MAX__ +@itemx __LONG_MAX__ +@itemx __LONG_LONG_MAX__ +@itemx __WINT_MAX__ +@itemx __SIZE_MAX__ +@itemx __PTRDIFF_MAX__ +@itemx __INTMAX_MAX__ +@itemx __UINTMAX_MAX__ +@itemx __SIG_ATOMIC_MAX__ +@itemx __INT8_MAX__ +@itemx __INT16_MAX__ +@itemx __INT32_MAX__ +@itemx __INT64_MAX__ +@itemx __UINT8_MAX__ +@itemx __UINT16_MAX__ +@itemx __UINT32_MAX__ +@itemx __UINT64_MAX__ +@itemx __INT_LEAST8_MAX__ +@itemx __INT_LEAST16_MAX__ +@itemx __INT_LEAST32_MAX__ +@itemx __INT_LEAST64_MAX__ +@itemx __UINT_LEAST8_MAX__ +@itemx __UINT_LEAST16_MAX__ +@itemx __UINT_LEAST32_MAX__ +@itemx __UINT_LEAST64_MAX__ +@itemx __INT_FAST8_MAX__ +@itemx __INT_FAST16_MAX__ +@itemx __INT_FAST32_MAX__ +@itemx __INT_FAST64_MAX__ +@itemx __UINT_FAST8_MAX__ +@itemx __UINT_FAST16_MAX__ +@itemx __UINT_FAST32_MAX__ +@itemx __UINT_FAST64_MAX__ +@itemx __INTPTR_MAX__ +@itemx __UINTPTR_MAX__ +@itemx __WCHAR_MIN__ +@itemx __WINT_MIN__ +@itemx __SIG_ATOMIC_MIN__ +Defined to the maximum value of the @code{signed char}, @code{wchar_t}, +@code{signed short}, +@code{signed int}, @code{signed long}, @code{signed long long}, +@code{wint_t}, @code{size_t}, @code{ptrdiff_t}, +@code{intmax_t}, @code{uintmax_t}, @code{sig_atomic_t}, @code{int8_t}, +@code{int16_t}, @code{int32_t}, @code{int64_t}, @code{uint8_t}, +@code{uint16_t}, @code{uint32_t}, @code{uint64_t}, +@code{int_least8_t}, @code{int_least16_t}, @code{int_least32_t}, +@code{int_least64_t}, @code{uint_least8_t}, @code{uint_least16_t}, +@code{uint_least32_t}, @code{uint_least64_t}, @code{int_fast8_t}, +@code{int_fast16_t}, @code{int_fast32_t}, @code{int_fast64_t}, +@code{uint_fast8_t}, @code{uint_fast16_t}, @code{uint_fast32_t}, +@code{uint_fast64_t}, @code{intptr_t}, and @code{uintptr_t} types and +to the minimum value of the @code{wchar_t}, @code{wint_t}, and +@code{sig_atomic_t} types respectively. They exist to make the +standard header given numerical limits work correctly. 
You should not +use these macros directly; instead, include the appropriate headers. +Some of these macros may not be defined on particular systems if GCC +does not provide a @file{stdint.h} header on those systems. + +@item __INT8_C +@itemx __INT16_C +@itemx __INT32_C +@itemx __INT64_C +@itemx __UINT8_C +@itemx __UINT16_C +@itemx __UINT32_C +@itemx __UINT64_C +@itemx __INTMAX_C +@itemx __UINTMAX_C +Defined to implementations of the standard @file{stdint.h} macros with +the same names without the leading @code{__}. They exist to make the +implementation of that header work correctly. You should not use +these macros directly; instead, include the appropriate headers. Some +of these macros may not be defined on particular systems if GCC does +not provide a @file{stdint.h} header on those systems. + +@item __SCHAR_WIDTH__ +@itemx __SHRT_WIDTH__ +@itemx __INT_WIDTH__ +@itemx __LONG_WIDTH__ +@itemx __LONG_LONG_WIDTH__ +@itemx __PTRDIFF_WIDTH__ +@itemx __SIG_ATOMIC_WIDTH__ +@itemx __SIZE_WIDTH__ +@itemx __WCHAR_WIDTH__ +@itemx __WINT_WIDTH__ +@itemx __INT_LEAST8_WIDTH__ +@itemx __INT_LEAST16_WIDTH__ +@itemx __INT_LEAST32_WIDTH__ +@itemx __INT_LEAST64_WIDTH__ +@itemx __INT_FAST8_WIDTH__ +@itemx __INT_FAST16_WIDTH__ +@itemx __INT_FAST32_WIDTH__ +@itemx __INT_FAST64_WIDTH__ +@itemx __INTPTR_WIDTH__ +@itemx __INTMAX_WIDTH__ +Defined to the bit widths of the corresponding types. They exist to +make the implementations of @file{limits.h} and @file{stdint.h} behave +correctly. You should not use these macros directly; instead, include +the appropriate headers. Some of these macros may not be defined on +particular systems if GCC does not provide a @file{stdint.h} header on +those systems.
+ +@item __SIZEOF_INT__ +@itemx __SIZEOF_LONG__ +@itemx __SIZEOF_LONG_LONG__ +@itemx __SIZEOF_SHORT__ +@itemx __SIZEOF_POINTER__ +@itemx __SIZEOF_FLOAT__ +@itemx __SIZEOF_DOUBLE__ +@itemx __SIZEOF_LONG_DOUBLE__ +@itemx __SIZEOF_SIZE_T__ +@itemx __SIZEOF_WCHAR_T__ +@itemx __SIZEOF_WINT_T__ +@itemx __SIZEOF_PTRDIFF_T__ +Defined to the number of bytes of the C standard data types: @code{int}, +@code{long}, @code{long long}, @code{short}, @code{void *}, @code{float}, +@code{double}, @code{long double}, @code{size_t}, @code{wchar_t}, @code{wint_t} +and @code{ptrdiff_t}. + +@item __BYTE_ORDER__ +@itemx __ORDER_LITTLE_ENDIAN__ +@itemx __ORDER_BIG_ENDIAN__ +@itemx __ORDER_PDP_ENDIAN__ +@code{__BYTE_ORDER__} is defined to one of the values +@code{__ORDER_LITTLE_ENDIAN__}, @code{__ORDER_BIG_ENDIAN__}, or +@code{__ORDER_PDP_ENDIAN__} to reflect the layout of multi-byte and +multi-word quantities in memory. If @code{__BYTE_ORDER__} is equal to +@code{__ORDER_LITTLE_ENDIAN__} or @code{__ORDER_BIG_ENDIAN__}, then +multi-byte and multi-word quantities are laid out identically: the +byte (word) at the lowest address is the least significant or most +significant byte (word) of the quantity, respectively. If +@code{__BYTE_ORDER__} is equal to @code{__ORDER_PDP_ENDIAN__}, then +bytes in 16-bit words are laid out in a little-endian fashion, whereas +the 16-bit subwords of a 32-bit quantity are laid out in big-endian +fashion. + +You should use these macros for testing like this: + +@smallexample +/* @r{Test for a little-endian machine} */ +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ +@end smallexample + +@item __FLOAT_WORD_ORDER__ +@code{__FLOAT_WORD_ORDER__} is defined to one of the values +@code{__ORDER_LITTLE_ENDIAN__} or @code{__ORDER_BIG_ENDIAN__} to reflect +the layout of the words of multi-word floating-point quantities. + +@item __DEPRECATED +This macro is defined, with value 1, when compiling a C++ source file +with warnings about deprecated constructs enabled. 
These warnings are +enabled by default, but can be disabled with @option{-Wno-deprecated}. + +@item __EXCEPTIONS +This macro is defined, with value 1, when compiling a C++ source file +with exceptions enabled. If @option{-fno-exceptions} is used when +compiling the file, then this macro is not defined. + +@item __GXX_RTTI +This macro is defined, with value 1, when compiling a C++ source file +with runtime type identification enabled. If @option{-fno-rtti} is +used when compiling the file, then this macro is not defined. + +@item __USING_SJLJ_EXCEPTIONS__ +This macro is defined, with value 1, if the compiler uses the old +mechanism based on @code{setjmp} and @code{longjmp} for exception +handling. + +@item __GXX_EXPERIMENTAL_CXX0X__ +This macro is defined when compiling a C++ source file with C++11 features +enabled, i.e., for all C++ language dialects except @option{-std=c++98} +and @option{-std=gnu++98}. This macro is obsolete, but can be used to +detect experimental C++0x features in very old versions of GCC. Since +GCC 4.7.0 the @code{__cplusplus} macro is defined correctly, so most +code should test @code{__cplusplus >= 201103L} instead of using this +macro. + +@item __GXX_WEAK__ +This macro is defined when compiling a C++ source file. It has the +value 1 if the compiler will use weak symbols, COMDAT sections, or +other similar techniques to collapse symbols with ``vague linkage'' +that are defined in multiple translation units. If the compiler will +not collapse such symbols, this macro is defined with value 0. In +general, user code should not need to make use of this macro; the +purpose of this macro is to ease implementation of the C++ runtime +library provided with G++. + +@item __NEXT_RUNTIME__ +This macro is defined, with value 1, if (and only if) the NeXT runtime +(as in @option{-fnext-runtime}) is in use for Objective-C@. 
If the GNU +runtime is used, this macro is not defined, so that you can use this +macro to determine which runtime (NeXT or GNU) is being used. + +@item __LP64__ +@itemx _LP64 +These macros are defined, with value 1, if (and only if) the compilation +is for a target where @code{long int} and pointers both use 64 bits and +@code{int} uses 32 bits. + +@item __SSP__ +This macro is defined, with value 1, when @option{-fstack-protector} is in +use. + +@item __SSP_ALL__ +This macro is defined, with value 2, when @option{-fstack-protector-all} is +in use. + +@item __SSP_STRONG__ +This macro is defined, with value 3, when @option{-fstack-protector-strong} is +in use. + +@item __SSP_EXPLICIT__ +This macro is defined, with value 4, when @option{-fstack-protector-explicit} is +in use. + +@item __SANITIZE_ADDRESS__ +This macro is defined, with value 1, when @option{-fsanitize=address} +or @option{-fsanitize=kernel-address} is in use. + +@item __SANITIZE_THREAD__ +This macro is defined, with value 1, when @option{-fsanitize=thread} is in use. + +@item __TIMESTAMP__ +This macro expands to a string constant that describes the date and time +of the last modification of the current source file. The string constant +contains the abbreviated day of the week, month, day of the month, time in +hh:mm:ss form, and year, and looks like @code{@w{"Sun Sep 16 01:03:52 1973"}}. +If the day of the month is less than 10, it is padded with a space on the left. + +If GCC cannot determine the current date, it will emit a warning message +(once per compilation) and @code{__TIMESTAMP__} will expand to +@code{@w{"??? ??? ?? ??:??:?? ????"}}. + +@item __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 +@itemx __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 +@itemx __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 +@itemx __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 +@itemx __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 +These macros are defined when the target processor supports atomic compare +and swap operations on operands 1, 2, 4, 8 or 16 bytes in length, respectively.
+ +@item __HAVE_SPECULATION_SAFE_VALUE +This macro is defined with the value 1 to show that this version of GCC +supports @code{__builtin_speculation_safe_value}. + +@item __GCC_HAVE_DWARF2_CFI_ASM +This macro is defined when the compiler is emitting DWARF CFI directives +to the assembler. When this is defined, it is possible to emit those same +directives in inline assembly. + +@item __FP_FAST_FMA +@itemx __FP_FAST_FMAF +@itemx __FP_FAST_FMAL +These macros are defined with value 1 if the backend supports the +@code{fma}, @code{fmaf}, and @code{fmal} builtin functions, so that +the include file @file{math.h} can define the macros +@code{FP_FAST_FMA}, @code{FP_FAST_FMAF}, and @code{FP_FAST_FMAL} +for compatibility with the 1999 C standard. + +@item __FP_FAST_FMAF16 +@itemx __FP_FAST_FMAF32 +@itemx __FP_FAST_FMAF64 +@itemx __FP_FAST_FMAF128 +@itemx __FP_FAST_FMAF32X +@itemx __FP_FAST_FMAF64X +@itemx __FP_FAST_FMAF128X +These macros are defined with the value 1 if the backend supports the +@code{fma} functions using the additional @code{_Float@var{n}} and +@code{_Float@var{n}x} types that are defined in ISO/IEC TS +18661-3:2015. The include file @file{math.h} can define the +@code{FP_FAST_FMAF@var{n}} and @code{FP_FAST_FMAF@var{n}x} macros if +the user defined @code{__STDC_WANT_IEC_60559_TYPES_EXT__} before +including @file{math.h}. + +@item __GCC_IEC_559 +This macro is defined to indicate the intended level of support for +IEEE 754 (IEC 60559) floating-point arithmetic. It expands to a +nonnegative integer value. If 0, it indicates that the combination of +the compiler configuration and the command-line options is not +intended to support IEEE 754 arithmetic for @code{float} and +@code{double} as defined in C99 and C11 Annex F (for example, that the +standard rounding modes and exceptions are not supported, or that +optimizations are enabled that conflict with IEEE 754 semantics). 
If +1, it indicates that IEEE 754 arithmetic is intended to be supported; +this does not mean that all relevant language features are supported +by GCC. If 2 or more, it additionally indicates support for IEEE +754-2008 (in particular, that the binary encodings for quiet and +signaling NaNs are as specified in IEEE 754-2008). + +This macro does not indicate the default state of command-line options +that control optimizations that C99 and C11 permit to be controlled by +standard pragmas, where those standards do not require a particular +default state. It does not indicate whether optimizations respect +signaling NaN semantics (the macro for that is +@code{__SUPPORT_SNAN__}). It does not indicate support for decimal +floating point or the IEEE 754 binary16 and binary128 types. + +@item __GCC_IEC_559_COMPLEX +This macro is defined to indicate the intended level of support for +IEEE 754 (IEC 60559) floating-point arithmetic for complex numbers, as +defined in C99 and C11 Annex G. It expands to a nonnegative integer +value. If 0, it indicates that the combination of the compiler +configuration and the command-line options is not intended to support +Annex G requirements (for example, because @option{-fcx-limited-range} +was used). If 1 or more, it indicates that it is intended to support +those requirements; this does not mean that all relevant language +features are supported by GCC. + +@item __NO_MATH_ERRNO__ +This macro is defined if @option{-fno-math-errno} is used, or enabled +by another option such as @option{-ffast-math} or by default. + +@item __RECIPROCAL_MATH__ +This macro is defined if @option{-freciprocal-math} is used, or enabled +by another option such as @option{-ffast-math} or by default. + +@item __NO_SIGNED_ZEROS__ +This macro is defined if @option{-fno-signed-zeros} is used, or enabled +by another option such as @option{-ffast-math} or by default. + +@item __NO_TRAPPING_MATH__ +This macro is defined if @option{-fno-trapping-math} is used. 
+ +@item __ASSOCIATIVE_MATH__ +This macro is defined if @option{-fassociative-math} is used, or enabled +by another option such as @option{-ffast-math} or by default. + +@item __ROUNDING_MATH__ +This macro is defined if @option{-frounding-math} is used. + +@item __GNUC_EXECUTION_CHARSET_NAME +@itemx __GNUC_WIDE_EXECUTION_CHARSET_NAME +These macros are defined to expand to a narrow string literal of +the name of the narrow and wide compile-time execution character +set used. It directly reflects the name passed to the options +@option{-fexec-charset} and @option{-fwide-exec-charset}, or the defaults +documented for those options (that is, it can expand to something like +@code{"UTF-8"}). @xref{Invocation}. +@end table + +@node System-specific Predefined Macros +@subsection System-specific Predefined Macros + +@cindex system-specific predefined macros +@cindex predefined macros, system-specific +@cindex reserved namespace + +The C preprocessor normally predefines several macros that indicate what +type of system and machine is in use. They are obviously different on +each target supported by GCC@. This manual, being for all systems and +machines, cannot tell you what their names are, but you can use +@command{cpp -dM} to see them all. @xref{Invocation}. All system-specific +predefined macros expand to a constant value, so you can test them with +either @samp{#ifdef} or @samp{#if}. + +The C standard requires that all system-specific macros be part of the +@dfn{reserved namespace}. All names which begin with two underscores, +or an underscore and a capital letter, are reserved for the compiler and +library to use as they wish. However, historically system-specific +macros have had names with no special prefix; for instance, it is common +to find @code{unix} defined on Unix systems. For all such macros, GCC +provides a parallel macro with two underscores added at the beginning +and the end. If @code{unix} is defined, @code{__unix__} will be defined +too. 
There will never be more than two underscores; the parallel of +@code{_mips} is @code{__mips__}. + +When the @option{-ansi} option, or any @option{-std} option that +requests strict conformance, is given to the compiler, all the +system-specific predefined macros outside the reserved namespace are +suppressed. The parallel macros, inside the reserved namespace, remain +defined. + +We are slowly phasing out all predefined macros which are outside the +reserved namespace. You should never use them in new programs, and we +encourage you to correct older code to use the parallel macros whenever +you find it. We don't recommend you use the system-specific macros that +are in the reserved namespace, either. It is better in the long run to +check specifically for features you need, using a tool such as +@command{autoconf}. + +@node C++ Named Operators +@subsection C++ Named Operators +@cindex named operators +@cindex C++ named operators +@cindex @file{iso646.h} + +In C++, there are eleven keywords which are simply alternate spellings +of operators normally written with punctuation. These keywords are +treated as such even in the preprocessor. They function as operators in +@samp{#if}, and they cannot be defined as macros or poisoned. In C, you +can request that those keywords take their C++ meaning by including +@file{iso646.h}. That header defines each one as a normal object-like +macro expanding to the appropriate punctuator. 
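As an illustration of the C case described above, the following sketch includes @file{iso646.h} and uses a few of the named operators in ordinary expressions. The function names are hypothetical and exist only for this example; the header and the operator spellings are the standard ones.

```c
#include <iso646.h>   /* In C, defines and, not, bitand, compl, ... as macros. */
#include <stdbool.h>

/* Hypothetical helper: true if VALUE lies within [LOW, HIGH].  */
bool in_range (int value, int low, int high)
{
  return value >= low and value <= high;      /* and expands to && */
}

/* Hypothetical helper: logical negation of in_range.  */
bool out_of_range (int value, int low, int high)
{
  return not in_range (value, low, high);     /* not expands to ! */
}

/* Hypothetical helper: clear the lowest bit of FLAGS.  */
unsigned clear_low_bit (unsigned flags)
{
  return flags bitand compl 1u;               /* bitand is &, compl is ~ */
}
```

In C++ the same function bodies should compile without the header, since the named operators are keywords there.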
+ +These are the named operators and their corresponding punctuators: + +@multitable {Named Operator} {Punctuator} +@item Named Operator @tab Punctuator +@item @code{and} @tab @code{&&} +@item @code{and_eq} @tab @code{&=} +@item @code{bitand} @tab @code{&} +@item @code{bitor} @tab @code{|} +@item @code{compl} @tab @code{~} +@item @code{not} @tab @code{!} +@item @code{not_eq} @tab @code{!=} +@item @code{or} @tab @code{||} +@item @code{or_eq} @tab @code{|=} +@item @code{xor} @tab @code{^} +@item @code{xor_eq} @tab @code{^=} +@end multitable + +@node Undefining and Redefining Macros +@section Undefining and Redefining Macros +@cindex undefining macros +@cindex redefining macros +@findex #undef + +If a macro ceases to be useful, it may be @dfn{undefined} with the +@samp{#undef} directive. @samp{#undef} takes a single argument, the +name of the macro to undefine. You use the bare macro name, even if the +macro is function-like. It is an error if anything appears on the line +after the macro name. @samp{#undef} has no effect if the name is not a +macro. + +@smallexample +#define FOO 4 +x = FOO; @expansion{} x = 4; +#undef FOO +x = FOO; @expansion{} x = FOO; +@end smallexample + +Once a macro has been undefined, that identifier may be @dfn{redefined} +as a macro by a subsequent @samp{#define} directive. The new definition +need not have any resemblance to the old definition. + +However, if an identifier which is currently a macro is redefined, then +the new definition must be @dfn{effectively the same} as the old one. +Two macro definitions are effectively the same if: +@itemize @bullet +@item Both are the same type of macro (object- or function-like). +@item All the tokens of the replacement list are the same. +@item If there are any parameters, they are the same. +@item Whitespace appears in the same places in both. It need not be +exactly the same amount of whitespace, though. Remember that comments +count as whitespace. 
+@end itemize + +@noindent +These definitions are effectively the same: +@smallexample +#define FOUR (2 + 2) +#define FOUR (2 + 2) +#define FOUR (2 /* @r{two} */ + 2) +@end smallexample +@noindent +but these are not: +@smallexample +#define FOUR (2 + 2) +#define FOUR ( 2+2 ) +#define FOUR (2 * 2) +#define FOUR(score,and,seven,years,ago) (2 + 2) +@end smallexample + +If a macro is redefined with a definition that is not effectively the +same as the old one, the preprocessor issues a warning and changes the +macro to use the new definition. If the new definition is effectively +the same, the redefinition is silently ignored. This allows, for +instance, two different headers to define a common macro. The +preprocessor will only complain if the definitions do not match. + +@node Directives Within Macro Arguments +@section Directives Within Macro Arguments +@cindex macro arguments and directives + +Occasionally it is convenient to use preprocessor directives within +the arguments of a macro. The C and C++ standards declare that +behavior in these cases is undefined. GNU CPP +processes arbitrary directives within macro arguments in +exactly the same way as it would have processed the directive were the +function-like macro invocation not present. + +If, within a macro invocation, that macro is redefined, then the new +definition takes effect in time for argument pre-expansion, but the +original definition is still used for argument replacement. Here is a +pathological example: + +@smallexample +#define f(x) x x +f (1 +#undef f +#define f 2 +f) +@end smallexample + +@noindent +which expands to + +@smallexample +1 2 1 2 +@end smallexample + +@noindent +with the semantics described above. 
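Because the standards leave directives inside macro arguments undefined, code that must build with other preprocessors may prefer to keep the conditional outside the invocation. The following is a minimal sketch of that alternative, not a description of how GNU CPP itself works; @code{SCALE}, @code{USE_LARGE_INPUT}, @code{INPUT_VALUE} and @code{scaled_input} are hypothetical names chosen for the example.

```c
/* Hypothetical function-like macro that will receive the argument.  */
#define SCALE(x) ((x) * 2)

/* Select the argument with a directive placed outside the macro
   invocation, so no directive appears between the parentheses of
   SCALE (...).  USE_LARGE_INPUT is a hypothetical configuration macro. */
#ifdef USE_LARGE_INPUT
# define INPUT_VALUE 100
#else
# define INPUT_VALUE 10
#endif

int scaled_input (void)
{
  return SCALE (INPUT_VALUE);
}
```

With @code{USE_LARGE_INPUT} undefined, @code{scaled_input} returns 20; the result does not depend on how a given preprocessor treats directives within arguments.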
+ +@node Macro Pitfalls +@section Macro Pitfalls +@cindex problems with macros +@cindex pitfalls of macros + +In this section we describe some special rules that apply to macros and +macro expansion, and point out certain cases in which the rules have +counter-intuitive consequences that you must watch out for. + +@menu +* Misnesting:: +* Operator Precedence Problems:: +* Swallowing the Semicolon:: +* Duplication of Side Effects:: +* Self-Referential Macros:: +* Argument Prescan:: +* Newlines in Arguments:: +@end menu + +@node Misnesting +@subsection Misnesting + +When a macro is called with arguments, the arguments are substituted +into the macro body and the result is checked, together with the rest of +the input file, for more macro calls. It is possible to piece together +a macro call coming partially from the macro body and partially from the +arguments. For example, + +@smallexample +#define twice(x) (2*(x)) +#define call_with_1(x) x(1) +call_with_1 (twice) + @expansion{} twice(1) + @expansion{} (2*(1)) +@end smallexample + +Macro definitions do not have to have balanced parentheses. By writing +an unbalanced open parenthesis in a macro body, it is possible to create +a macro call that begins inside the macro body but ends outside of it. +For example, + +@smallexample +#define strange(file) fprintf (file, "%s %d", +@dots{} +strange(stderr) p, 35) + @expansion{} fprintf (stderr, "%s %d", p, 35) +@end smallexample + +The ability to piece together a macro call can be useful, but the use of +unbalanced open parentheses in a macro body is just confusing, and +should be avoided. + +@node Operator Precedence Problems +@subsection Operator Precedence Problems +@cindex parentheses in macro bodies + +You may have noticed that in most of the macro definition examples shown +above, each occurrence of a macro argument name had parentheses around +it. In addition, another pair of parentheses usually surrounds the +entire macro definition.
Here is why it is best to write macros that +way. + +Suppose you define a macro as follows, + +@smallexample +#define ceil_div(x, y) (x + y - 1) / y +@end smallexample + +@noindent +whose purpose is to divide, rounding up. (One use for this operation is +to compute how many @code{int} objects are needed to hold a certain +number of @code{char} objects.) Then suppose it is used as follows: + +@smallexample +a = ceil_div (b & c, sizeof (int)); + @expansion{} a = (b & c + sizeof (int) - 1) / sizeof (int); +@end smallexample + +@noindent +This does not do what is intended. The operator-precedence rules of +C make it equivalent to this: + +@smallexample +a = (b & (c + sizeof (int) - 1)) / sizeof (int); +@end smallexample + +@noindent +What we want is this: + +@smallexample +a = ((b & c) + sizeof (int) - 1) / sizeof (int); +@end smallexample + +@noindent +Defining the macro as + +@smallexample +#define ceil_div(x, y) ((x) + (y) - 1) / (y) +@end smallexample + +@noindent +provides the desired result. + +Unintended grouping can result in another way. Consider @code{sizeof +ceil_div(1, 2)}. That has the appearance of a C expression that would +compute the size of the type of @code{ceil_div (1, 2)}, but in fact it +means something very different. Here is what it expands to: + +@smallexample +sizeof ((1) + (2) - 1) / (2) +@end smallexample + +@noindent +This would take the size of an integer and divide it by two. The +precedence rules have put the division outside the @code{sizeof} when it +was intended to be inside. + +Parentheses around the entire macro definition prevent such problems. +Here, then, is the recommended way to define @code{ceil_div}: + +@smallexample +#define ceil_div(x, y) (((x) + (y) - 1) / (y)) +@end smallexample + +@node Swallowing the Semicolon +@subsection Swallowing the Semicolon +@cindex semicolons (after macro calls) + +Often it is desirable to define a macro that expands into a compound +statement.
Consider, for example, the following macro, that advances a +pointer (the argument @code{p} says where to find it) across whitespace +characters: + +@smallexample +#define SKIP_SPACES(p, limit) \ +@{ char *lim = (limit); \ + while (p < lim) @{ \ + if (*p++ != ' ') @{ \ + p--; break; @}@}@} +@end smallexample + +@noindent +Here backslash-newline is used to split the macro definition, which must +be a single logical line, so that it resembles the way such code would +be laid out if not part of a macro definition. + +A call to this macro might be @code{SKIP_SPACES (p, lim)}. Strictly +speaking, the call expands to a compound statement, which is a complete +statement with no need for a semicolon to end it. However, since it +looks like a function call, it minimizes confusion if you can use it +like a function call, writing a semicolon afterward, as in +@code{SKIP_SPACES (p, lim);} + +This can cause trouble before @code{else} statements, because the +semicolon is actually a null statement. Suppose you write + +@smallexample +if (*p != 0) + SKIP_SPACES (p, lim); +else @dots{} +@end smallexample + +@noindent +The presence of two statements---the compound statement and a null +statement---in between the @code{if} condition and the @code{else} +makes invalid C code. + +The definition of the macro @code{SKIP_SPACES} can be altered to solve +this problem, using a @code{do @dots{} while} statement. Here is how: + +@smallexample +#define SKIP_SPACES(p, limit) \ +do @{ char *lim = (limit); \ + while (p < lim) @{ \ + if (*p++ != ' ') @{ \ + p--; break; @}@}@} \ +while (0) +@end smallexample + +Now @code{SKIP_SPACES (p, lim);} expands into + +@smallexample +do @{@dots{}@} while (0); +@end smallexample + +@noindent +which is one statement. The loop executes exactly once; most compilers +generate no extra code for it. 
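As a minimal sketch of the point above, the following fragment uses the @code{do @dots{} while (0)} form of @code{SKIP_SPACES} between an @code{if} and its @code{else}. The function @code{leading_spaces} is a hypothetical helper invented for this illustration, and the macro's local variable is declared @code{const} here so that it can walk a constant string; neither detail comes from the manual.

```c
#include <stddef.h>

/* SKIP_SPACES in its do ... while (0) form.  The only change from the
   text is const on the local pointer (an assumption of this sketch).  */
#define SKIP_SPACES(p, limit)            \
do { const char *lim = (limit);          \
     while (p < lim) {                   \
       if (*p++ != ' ') {                \
         p--; break; }}}                 \
while (0)

/* Hypothetical helper: count the spaces at the start of the LEN
   characters beginning at S.  */
size_t
leading_spaces (const char *s, size_t len)
{
  const char *p = s;

  if (len != 0)
    SKIP_SPACES (p, s + len);   /* one statement; the semicolon ends it */
  else
    p = s;                      /* the else still pairs with the if */

  return (size_t) (p - s);
}
```

With the brace-only definition the same function would not compile, because the semicolon after the macro call would separate the @code{if} from its @code{else}. Note also that the sketch passes @code{s + len} rather than a variable named @code{lim}: the macro declares its own @code{lim}, which would shadow an argument of that name inside the initializer.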
+ +@node Duplication of Side Effects +@subsection Duplication of Side Effects + +@cindex side effects (in macro arguments) +@cindex unsafe macros +Many C programs define a macro @code{min}, for ``minimum'', like this: + +@smallexample +#define min(X, Y) ((X) < (Y) ? (X) : (Y)) +@end smallexample + +When you use this macro with an argument containing a side effect, +as shown here, + +@smallexample +next = min (x + y, foo (z)); +@end smallexample + +@noindent +it expands as follows: + +@smallexample +next = ((x + y) < (foo (z)) ? (x + y) : (foo (z))); +@end smallexample + +@noindent +where @code{x + y} has been substituted for @code{X} and @code{foo (z)} +for @code{Y}. + +The function @code{foo} is used only once in the statement as it appears +in the program, but the expression @code{foo (z)} has been substituted +twice into the macro expansion. As a result, @code{foo} might be called +two times when the statement is executed. If it has side effects or if +it takes a long time to compute, the results might not be what you +intended. We say that @code{min} is an @dfn{unsafe} macro. + +The best solution to this problem is to define @code{min} in a way that +computes the value of @code{foo (z)} only once. The C language offers +no standard way to do this, but it can be done with GNU extensions as +follows: + +@smallexample +#define min(X, Y) \ +(@{ typeof (X) x_ = (X); \ + typeof (Y) y_ = (Y); \ + (x_ < y_) ? x_ : y_; @}) +@end smallexample + +The @samp{(@{ @dots{} @})} notation produces a compound statement that +acts as an expression. Its value is the value of its last statement. +This permits us to define local variables and assign each argument to +one. The local variables have underscores after their names to reduce +the risk of conflict with an identifier of wider scope (it is impossible +to avoid this entirely). Now each argument is evaluated exactly once. 
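The difference between the two definitions can be observed directly. The sketch below is illustrative only: @code{next_value}, @code{calls_made_by_unsafe} and @code{calls_made_by_safe} are hypothetical names, and the macros are renamed @code{min_unsafe} and @code{min_safe} so that both can coexist. The second macro relies on GNU C extensions, so it is written with @code{__typeof__} (GCC's alternate spelling of @code{typeof}) and prefixed with @code{__extension__} to keep pedantic warnings quiet.

```c
/* Hypothetical function with a side effect: it counts its own calls.  */
static int calls;

static int
next_value (void)
{
  return ++calls;
}

/* The unsafe definition from the text.  */
#define min_unsafe(X, Y)  ((X) < (Y) ? (X) : (Y))

/* The GNU C definition from the text, using a statement expression.  */
#define min_safe(X, Y)                     \
__extension__                              \
({ __typeof__ (X) x_ = (X);                \
   __typeof__ (Y) y_ = (Y);                \
   (x_ < y_) ? x_ : y_; })

/* Return how many times next_value runs inside one use of min_unsafe.  */
int
calls_made_by_unsafe (void)
{
  calls = 0;
  (void) min_unsafe (10, next_value ());
  return calls;
}

/* Return how many times next_value runs inside one use of min_safe.  */
int
calls_made_by_safe (void)
{
  calls = 0;
  (void) min_safe (10, next_value ());
  return calls;
}
```

In the unsafe version the second argument is the smaller one, so it is evaluated once for the comparison and once more to produce the result.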
+ +If you do not wish to use GNU C extensions, the only solution is to be +careful when @emph{using} the macro @code{min}. For example, you can +calculate the value of @code{foo (z)}, save it in a variable, and use +that variable in @code{min}: + +@smallexample +@group +#define min(X, Y) ((X) < (Y) ? (X) : (Y)) +@dots{} +@{ + int tem = foo (z); + next = min (x + y, tem); +@} +@end group +@end smallexample + +@noindent +(where we assume that @code{foo} returns type @code{int}). + +@node Self-Referential Macros +@subsection Self-Referential Macros +@cindex self-reference + +A @dfn{self-referential} macro is one whose name appears in its +definition. Recall that all macro definitions are rescanned for more +macros to replace. If the self-reference were considered a use of the +macro, it would produce an infinitely large expansion. To prevent this, +the self-reference is not considered a macro call. It is passed into +the preprocessor output unchanged. Consider an example: + +@smallexample +#define foo (4 + foo) +@end smallexample + +@noindent +where @code{foo} is also a variable in your program. + +Following the ordinary rules, each reference to @code{foo} will expand +into @code{(4 + foo)}; then this will be rescanned and will expand into +@code{(4 + (4 + foo))}; and so on until the computer runs out of memory. + +The self-reference rule cuts this process short after one step, at +@code{(4 + foo)}. Therefore, this macro definition has the possibly +useful effect of causing the program to add 4 to the value of @code{foo} +wherever @code{foo} is referred to. + +In most cases, it is a bad idea to take advantage of this feature. A +person reading the program who sees that @code{foo} is a variable will +not expect that it is a macro as well. The reader will come across the +identifier @code{foo} in the program and think its value should be that +of the variable @code{foo}, whereas in fact the value is four greater. 
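The single-step expansion can be checked with a short sketch. The function @code{read_foo} and the initial value 10 are assumptions made for this illustration.

```c
int foo = 10;            /* the variable */

#define foo (4 + foo)    /* the self-referential macro */

/* Each use of foo below this point expands exactly once,
   to (4 + foo), where the inner foo names the variable.  */
int
read_foo (void)
{
  return foo;
}
```

The variable is declared before the @code{#define}; had the declaration come after it, the declaration itself would have been rewritten by the macro.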
+ +One common, useful use of self-reference is to create a macro which +expands to itself. If you write + +@smallexample +#define EPERM EPERM +@end smallexample + +@noindent +then the macro @code{EPERM} expands to @code{EPERM}. Effectively, it is +left alone by the preprocessor whenever it's used in running text. You +can tell that it's a macro with @samp{#ifdef}. You might do this if you +want to define numeric constants with an @code{enum}, but have +@samp{#ifdef} be true for each constant. + +If a macro @code{x} expands to use a macro @code{y}, and the expansion of +@code{y} refers to the macro @code{x}, that is an @dfn{indirect +self-reference} of @code{x}. @code{x} is not expanded in this case +either. Thus, if we have + +@smallexample +#define x (4 + y) +#define y (2 * x) +@end smallexample + +@noindent +then @code{x} and @code{y} expand as follows: + +@smallexample +@group +x @expansion{} (4 + y) + @expansion{} (4 + (2 * x)) + +y @expansion{} (2 * x) + @expansion{} (2 * (4 + y)) +@end group +@end smallexample + +@noindent +Each macro is expanded when it appears in the definition of the other +macro, but not when it indirectly appears in its own definition. + +@node Argument Prescan +@subsection Argument Prescan +@cindex expansion of arguments +@cindex macro argument expansion +@cindex prescan of macro arguments + +Macro arguments are completely macro-expanded before they are +substituted into a macro body, unless they are stringized or pasted +with other tokens. After substitution, the entire macro body, including +the substituted arguments, is scanned again for macros to be expanded. +The result is that the arguments are scanned @emph{twice} to expand +macro calls in them. + +Most of the time, this has no effect. If the argument contained any +macro calls, they are expanded during the first scan. The result +therefore contains no macro calls, so the second scan does not change +it. 
If the argument were substituted as given, with no prescan, the +single remaining scan would find the same macro calls and produce the +same results. + +You might expect the double scan to change the results when a +self-referential macro is used in an argument of another macro +(@pxref{Self-Referential Macros}): the self-referential macro would be +expanded once in the first scan, and a second time in the second scan. +However, this is not what happens. The self-references that do not +expand in the first scan are marked so that they will not expand in the +second scan either. + +You might wonder, ``Why mention the prescan, if it makes no difference? +And why not skip it and make the preprocessor faster?'' The answer is +that the prescan does make a difference in three special cases: + +@itemize @bullet +@item +Nested calls to a macro. + +We say that @dfn{nested} calls to a macro occur when a macro's argument +contains a call to that very macro. For example, if @code{f} is a macro +that expects one argument, @code{f (f (1))} is a nested pair of calls to +@code{f}. The desired expansion is made by expanding @code{f (1)} and +substituting that into the definition of @code{f}. The prescan causes +the expected result to happen. Without the prescan, @code{f (1)} itself +would be substituted as an argument, and the inner use of @code{f} would +appear during the main scan as an indirect self-reference and would not +be expanded. + +@item +Macros that call other macros that stringize or concatenate. + +If an argument is stringized or concatenated, the prescan does not +occur. If you @emph{want} to expand a macro, then stringize or +concatenate its expansion, you can do that by causing one macro to call +another macro that does the stringizing or concatenation. 
For +instance, if you have + +@smallexample +#define AFTERX(x) X_ ## x +#define XAFTERX(x) AFTERX(x) +#define TABLESIZE 1024 +#define BUFSIZE TABLESIZE +@end smallexample + +then @code{AFTERX(BUFSIZE)} expands to @code{X_BUFSIZE}, and +@code{XAFTERX(BUFSIZE)} expands to @code{X_1024}. (Not to +@code{X_TABLESIZE}. Prescan always does a complete expansion.) + +@item +Macros used in arguments, whose expansions contain unshielded commas. + +This can cause a macro expanded on the second scan to be called with the +wrong number of arguments. Here is an example: + +@smallexample +#define foo a,b +#define bar(x) lose(x) +#define lose(x) (1 + (x)) +@end smallexample + +We would like @code{bar(foo)} to turn into @code{(1 + (foo))}, which +would then turn into @code{(1 + (a,b))}. Instead, @code{bar(foo)} +expands into @code{lose(a,b)}, and you get an error because @code{lose} +requires a single argument. In this case, the problem is easily solved +by the same parentheses that ought to be used to prevent misnesting of +arithmetic operations: + +@smallexample +#define foo (a,b) +@exdent or +#define bar(x) lose((x)) +@end smallexample + +The extra pair of parentheses prevents the comma in @code{foo}'s +definition from being interpreted as an argument separator. + +@end itemize + +@node Newlines in Arguments +@subsection Newlines in Arguments +@cindex newlines in macro arguments + +The invocation of a function-like macro can extend over many logical +lines. However, in the present implementation, the entire expansion +comes out on one line. Thus line numbers emitted by the compiler or +debugger refer to the line the invocation started on, which might be +different to the line containing the argument causing the problem. 
+ +Here is an example illustrating this: + +@smallexample +#define ignore_second_arg(a,b,c) a; c + +ignore_second_arg (foo (), + ignored (), + syntax error); +@end smallexample + +@noindent +The syntax error triggered by the tokens @code{syntax error} results in +an error message citing line three---the line of ignore_second_arg--- +even though the problematic code comes from line five. + +We consider this a bug, and intend to fix it in the near future. + +@node Conditionals +@chapter Conditionals +@cindex conditionals + +A @dfn{conditional} is a directive that instructs the preprocessor to +select whether or not to include a chunk of code in the final token +stream passed to the compiler. Preprocessor conditionals can test +arithmetic expressions, or whether a name is defined as a macro, or both +simultaneously using the special @code{defined} operator. + +A conditional in the C preprocessor resembles in some ways an @code{if} +statement in C, but it is important to understand the difference between +them. The condition in an @code{if} statement is tested during the +execution of your program. Its purpose is to allow your program to +behave differently from run to run, depending on the data it is +operating on. The condition in a preprocessing conditional directive is +tested when your program is compiled. Its purpose is to allow different +code to be included in the program depending on the situation at the +time of compilation. + +However, the distinction is becoming less clear. Modern compilers often +do test @code{if} statements when a program is compiled, if their +conditions are known not to vary at run time, and eliminate code which +can never be executed. If you can count on your compiler to do this, +you may find that your program is more readable if you use @code{if} +statements with constant conditions (perhaps determined by macros). 
Of +course, you can only use this to exclude code, not type definitions or +other preprocessing directives, and you can only do it if the code +remains syntactically valid when it is not to be used. + +@menu +* Conditional Uses:: +* Conditional Syntax:: +* Deleted Code:: +@end menu + +@node Conditional Uses +@section Conditional Uses + +There are three general reasons to use a conditional. + +@itemize @bullet +@item +A program may need to use different code depending on the machine or +operating system it is to run on. In some cases the code for one +operating system may be erroneous on another operating system; for +example, it might refer to data types or constants that do not exist on +the other system. When this happens, it is not enough to avoid +executing the invalid code. Its mere presence will cause the compiler +to reject the program. With a preprocessing conditional, the offending +code can be effectively excised from the program when it is not valid. + +@item +You may want to be able to compile the same source file into two +different programs. One version might make frequent time-consuming +consistency checks on its intermediate data, or print the values of +those data for debugging, and the other not. + +@item +A conditional whose condition is always false is one way to exclude code +from the program but keep it as a sort of comment for future reference. +@end itemize + +Simple programs that do not need system-specific logic or complex +debugging hooks generally will not need to use preprocessing +conditionals. + +@node Conditional Syntax +@section Conditional Syntax + +@findex #if +A conditional in the C preprocessor begins with a @dfn{conditional +directive}: @samp{#if}, @samp{#ifdef} or @samp{#ifndef}. 
+ +@menu +* Ifdef:: +* If:: +* Defined:: +* Else:: +* Elif:: +* @code{__has_attribute}:: +* @code{__has_cpp_attribute}:: +* @code{__has_c_attribute}:: +* @code{__has_builtin}:: +* @code{__has_include}:: +@end menu + +@node Ifdef +@subsection Ifdef +@findex #ifdef +@findex #endif + +The simplest sort of conditional is + +@smallexample +@group +#ifdef @var{MACRO} + +@var{controlled text} + +#endif /* @var{MACRO} */ +@end group +@end smallexample + +@cindex conditional group +This block is called a @dfn{conditional group}. @var{controlled text} +will be included in the output of the preprocessor if and only if +@var{MACRO} is defined. We say that the conditional @dfn{succeeds} if +@var{MACRO} is defined, @dfn{fails} if it is not. + +The @var{controlled text} inside of a conditional can include +preprocessing directives. They are executed only if the conditional +succeeds. You can nest conditional groups inside other conditional +groups, but they must be completely nested. In other words, +@samp{#endif} always matches the nearest @samp{#ifdef} (or +@samp{#ifndef}, or @samp{#if}). Also, you cannot start a conditional +group in one file and end it in another. + +Even if a conditional fails, the @var{controlled text} inside it is +still run through initial transformations and tokenization. Therefore, +it must all be lexically valid C@. Normally the only way this matters is +that all comments and string literals inside a failing conditional group +must still be properly ended. + +The comment following the @samp{#endif} is not required, but it is a +good practice if there is a lot of @var{controlled text}, because it +helps people match the @samp{#endif} to the corresponding @samp{#ifdef}. +Older programs sometimes put @var{MACRO} directly after the +@samp{#endif} without enclosing it in a comment. This is invalid code +according to the C standard. CPP accepts it with a warning. It +never affects which @samp{#ifndef} the @samp{#endif} matches. 
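The nesting rule can be sketched as follows. The macro names @code{HAVE_LOGGING} and @code{HAVE_VERBOSE_LOGGING} and the function @code{log_level} are hypothetical and serve only to show one conditional group that succeeds and an inner one that fails.

```c
#define HAVE_LOGGING   /* hypothetical feature macro */

/* Return 0, 1 or 2 depending on which conditional groups were kept.  */
int
log_level (void)
{
  int level = 0;

#ifdef HAVE_LOGGING
  level = 1;                  /* kept: HAVE_LOGGING is defined */
#ifdef HAVE_VERBOSE_LOGGING
  level = 2;                  /* dropped: the inner conditional fails */
#endif /* HAVE_VERBOSE_LOGGING */
#endif /* HAVE_LOGGING */

  return level;
}
```

Each @code{#endif} closes the nearest open conditional, so the comment after it is only a reading aid.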
+ +@findex #ifndef +Sometimes you wish to use some code if a macro is @emph{not} defined. +You can do this by writing @samp{#ifndef} instead of @samp{#ifdef}. +One common use of @samp{#ifndef} is to include code only the first +time a header file is included. @xref{Once-Only Headers}. + +Macro definitions can vary between compilations for several reasons. +Here are some samples. + +@itemize @bullet +@item +Some macros are predefined on each kind of machine +(@pxref{System-specific Predefined Macros}). This allows you to provide +code specially tuned for a particular machine. + +@item +System header files define more macros, associated with the features +they implement. You can test these macros with conditionals to avoid +using a system feature on a machine where it is not implemented. + +@item +Macros can be defined or undefined with the @option{-D} and @option{-U} +command-line options when you compile the program. You can arrange to +compile the same source file into two different programs by choosing a +macro name to specify which program you want, writing conditionals to +test whether or how this macro is defined, and then controlling the +state of the macro with command-line options, perhaps set in the +Makefile. @xref{Invocation}. + +@item +Your program might have a special header file (often called +@file{config.h}) that is adjusted when the program is compiled. It can +define or not define macros depending on the features of the system and +the desired capabilities of the program. The adjustment can be +automated by a tool such as @command{autoconf}, or done by hand. +@end itemize + +@node If +@subsection If + +The @samp{#if} directive allows you to test the value of an arithmetic +expression, rather than the mere existence of one macro. 
Its syntax is + +@smallexample +@group +#if @var{expression} + +@var{controlled text} + +#endif /* @var{expression} */ +@end group +@end smallexample + +@var{expression} is a C expression of integer type, subject to stringent +restrictions. It may contain + +@itemize @bullet +@item +Integer constants. + +@item +Character constants, which are interpreted as they would be in normal +code. + +@item +Arithmetic operators for addition, subtraction, multiplication, +division, bitwise operations, shifts, comparisons, and logical +operations (@code{&&} and @code{||}). The latter two obey the usual +short-circuiting rules of standard C@. + +@item +Macros. All macros in the expression are expanded before actual +computation of the expression's value begins. + +@item +Uses of the @code{defined} operator, which lets you check whether macros +are defined in the middle of an @samp{#if}. + +@item +Identifiers that are not macros, which are all considered to be the +number zero. This allows you to write @code{@w{#if MACRO}} instead of +@code{@w{#ifdef MACRO}}, if you know that MACRO, when defined, will +always have a nonzero value. Function-like macros used without their +function call parentheses are also treated as zero. + +In some contexts this shortcut is undesirable. The @option{-Wundef} +option causes GCC to warn whenever it encounters an identifier which is +not a macro in an @samp{#if}. +@end itemize + +The preprocessor does not know anything about types in the language. +Therefore, @code{sizeof} operators are not recognized in @samp{#if}, and +neither are @code{enum} constants. They will be taken as identifiers +which are not macros, and replaced by zero. In the case of +@code{sizeof}, this is likely to cause the expression to be invalid. + +The preprocessor calculates the value of @var{expression}. It carries +out all calculations in the widest integer type known to the compiler; +on most machines supported by GCC this is 64 bits. 
This is not the same +rule as the compiler uses to calculate the value of a constant +expression, and may give different results in some cases. If the value +comes out to be nonzero, the @samp{#if} succeeds and the @var{controlled +text} is included; otherwise it is skipped. + +@node Defined +@subsection Defined + +@cindex @code{defined} +The special operator @code{defined} is used in @samp{#if} and +@samp{#elif} expressions to test whether a certain name is defined as a +macro. @code{defined @var{name}} and @code{defined (@var{name})} are +both expressions whose value is 1 if @var{name} is defined as a macro at +the current point in the program, and 0 otherwise. Thus, @code{@w{#if +defined MACRO}} is precisely equivalent to @code{@w{#ifdef MACRO}}. + +@code{defined} is useful when you wish to test more than one macro for +existence at once. For example, + +@smallexample +#if defined (__vax__) || defined (__ns16000__) +@end smallexample + +@noindent +would succeed if either of the names @code{__vax__} or +@code{__ns16000__} is defined as a macro. + +Conditionals written like this: + +@smallexample +#if defined BUFSIZE && BUFSIZE >= 1024 +@end smallexample + +@noindent +can generally be simplified to just @code{@w{#if BUFSIZE >= 1024}}, +since if @code{BUFSIZE} is not defined, it will be interpreted as having +the value zero. + +If the @code{defined} operator appears as a result of a macro expansion, +the C standard says the behavior is undefined. GNU cpp treats it as a +genuine @code{defined} operator and evaluates it normally. It will warn +wherever your code uses this feature if you use the command-line option +@option{-Wpedantic}, since other compilers may handle it differently. The +warning is also enabled by @option{-Wextra}, and can also be enabled +individually with @option{-Wexpansion-to-defined}. 
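A small sketch of the simplification described above follows. All names in it (@code{BUFFER_IS_LARGE}, @code{NO_SUCH_LIMIT}, @code{LIMIT_IS_LARGE} and the two functions) are invented for the illustration, and the value given to @code{BUFSIZE} is arbitrary.

```c
#define BUFSIZE 2048   /* arbitrary value chosen for this sketch */

/* The long form: test for existence, then for the value.  */
#if defined BUFSIZE && BUFSIZE >= 1024
# define BUFFER_IS_LARGE 1
#endif
#if !defined BUFFER_IS_LARGE
# define BUFFER_IS_LARGE 0
#endif

/* NO_SUCH_LIMIT is never defined, so #if reads it as zero and the
   comparison is simply false.  */
#if NO_SUCH_LIMIT >= 1024
# define LIMIT_IS_LARGE 1
#endif
#if !defined LIMIT_IS_LARGE
# define LIMIT_IS_LARGE 0
#endif

int
buffer_is_large (void)
{
  return BUFFER_IS_LARGE;
}

int
limit_is_large (void)
{
  return LIMIT_IS_LARGE;
}
```

Compiling this with @option{-Wundef} would draw a warning for @code{NO_SUCH_LIMIT}, which is the case where the longer @code{defined} form is preferable.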
+ +@node Else +@subsection Else + +@findex #else +The @samp{#else} directive can be added to a conditional to provide +alternative text to be used if the condition fails. This is what it +looks like: + +@smallexample +@group +#if @var{expression} +@var{text-if-true} +#else /* Not @var{expression} */ +@var{text-if-false} +#endif /* Not @var{expression} */ +@end group +@end smallexample + +@noindent +If @var{expression} is nonzero, the @var{text-if-true} is included and +the @var{text-if-false} is skipped. If @var{expression} is zero, the +opposite happens. + +You can use @samp{#else} with @samp{#ifdef} and @samp{#ifndef}, too. + +@node Elif +@subsection Elif + +@findex #elif +One common case of nested conditionals is used to check for more than two +possible alternatives. For example, you might have + +@smallexample +#if X == 1 +@dots{} +#else /* X != 1 */ +#if X == 2 +@dots{} +#else /* X != 2 */ +@dots{} +#endif /* X != 2 */ +#endif /* X != 1 */ +@end smallexample + +Another conditional directive, @samp{#elif}, allows this to be +abbreviated as follows: + +@smallexample +#if X == 1 +@dots{} +#elif X == 2 +@dots{} +#else /* X != 2 and X != 1*/ +@dots{} +#endif /* X != 2 and X != 1*/ +@end smallexample + +@samp{#elif} stands for ``else if''. Like @samp{#else}, it goes in the +middle of a conditional group and subdivides it; it does not require a +matching @samp{#endif} of its own. Like @samp{#if}, the @samp{#elif} +directive includes an expression to be tested. The text following the +@samp{#elif} is processed only if the original @samp{#if}-condition +failed and the @samp{#elif} condition succeeds. + +More than one @samp{#elif} can go in the same conditional group. Then +the text after each @samp{#elif} is processed only if the @samp{#elif} +condition succeeds after the original @samp{#if} and all previous +@samp{#elif} directives within it have failed. 
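As a compilable sketch of the abbreviated form, the fragment below selects one of three alternatives. The macro @code{ALTERNATIVE}, the values it takes and the function @code{alternative} are assumptions made for this illustration.

```c
#define X 2

#if X == 1
# define ALTERNATIVE 10
#elif X == 2
# define ALTERNATIVE 20      /* chosen: the #if failed, this #elif succeeds */
#else /* X != 2 and X != 1 */
# define ALTERNATIVE 30
#endif /* X != 2 and X != 1 */

int
alternative (void)
{
  return ALTERNATIVE;
}
```

Only one @code{#endif} is needed, however many @code{#elif} directives the group contains.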
+ +@samp{#else} is allowed after any number of @samp{#elif} directives, but +@samp{#elif} may not follow @samp{#else}. + +@node @code{__has_attribute} +@subsection @code{__has_attribute} +@cindex @code{__has_attribute} + +The special operator @code{__has_attribute (@var{operand})} may be used +in @samp{#if} and @samp{#elif} expressions to test whether the attribute +referenced by its @var{operand} is recognized by GCC. Using the operator +in other contexts is not valid. In C code, if compiling for strict +conformance to standards before C2x, @var{operand} must be +a valid identifier. Otherwise, @var{operand} may be optionally +introduced by the @code{@var{attribute-scope}::} prefix. +The @var{attribute-scope} prefix identifies the ``namespace'' within +which the attribute is recognized. The scope of GCC attributes is +@samp{gnu} or @samp{__gnu__}. The @code{__has_attribute} operator by +itself, without any @var{operand} or parentheses, acts as a predefined +macro so that support for it can be tested in portable code. Thus, +the recommended use of the operator is as follows: + +@smallexample +#if defined __has_attribute +# if __has_attribute (nonnull) +# define ATTR_NONNULL __attribute__ ((nonnull)) +# endif +#endif +@end smallexample + +The first @samp{#if} test succeeds only when the operator is supported +by the version of GCC (or another compiler) being used. Only when that +test succeeds is it valid to use @code{__has_attribute} as a preprocessor +operator. As a result, combining the two tests into a single expression as +shown below would only be valid with a compiler that supports the operator +but not with others that don't. 
+ +@smallexample +#if defined __has_attribute && __has_attribute (nonnull) /* not portable */ +@dots{} +#endif +@end smallexample + +@node @code{__has_cpp_attribute} +@subsection @code{__has_cpp_attribute} +@cindex @code{__has_cpp_attribute} + +The special operator @code{__has_cpp_attribute (@var{operand})} may be used +in @samp{#if} and @samp{#elif} expressions in C++ code to test whether +the attribute referenced by its @var{operand} is recognized by GCC. +@code{__has_cpp_attribute (@var{operand})} is equivalent to +@code{__has_attribute (@var{operand})} except that when @var{operand} +designates a supported standard attribute it evaluates to an integer +constant of the form @code{YYYYMM} indicating the year and month when +the attribute was first introduced into the C++ standard. For additional +information including the dates of the introduction of current standard +attributes, see @w{@uref{https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations/, +SD-6: SG10 Feature Test Recommendations}}. + +@node @code{__has_c_attribute} +@subsection @code{__has_c_attribute} +@cindex @code{__has_c_attribute} + +The special operator @code{__has_c_attribute (@var{operand})} may be +used in @samp{#if} and @samp{#elif} expressions in C code to test +whether the attribute referenced by its @var{operand} is recognized by +GCC in attributes using the @samp{[[]]} syntax. GNU attributes must +be specified with the scope @samp{gnu} or @samp{__gnu__} with +@code{__has_c_attribute}. When @var{operand} designates a supported +standard attribute it evaluates to an integer constant of the form +@code{YYYYMM} indicating the year and month when the attribute was +first introduced into the C standard, or when the syntax of operands +to the attribute was extended in the C standard. 
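The attribute-testing operators share the same guarded pattern. The sketch below applies it to @code{__has_attribute} and adds a fallback definition, so that @code{ATTR_NONNULL} can be used unconditionally. The fallback and the function @code{count_chars} are additions made for this illustration and are not part of the recommended usage quoted earlier.

```c
#include <stddef.h>

#if defined __has_attribute
# if __has_attribute (nonnull)
#  define ATTR_NONNULL __attribute__ ((nonnull))
# endif
#endif
#ifndef ATTR_NONNULL
# define ATTR_NONNULL   /* expands to nothing on other compilers */
#endif

/* Hypothetical function: count the characters before the terminating
   null.  The attribute, where supported, lets the compiler diagnose
   calls that pass a null pointer.  */
ATTR_NONNULL size_t
count_chars (const char *s)
{
  size_t n = 0;

  while (s[n] != 0)
    n++;
  return n;
}
```

Keeping the two @code{#if} tests on separate lines matters: a compiler that lacks the operator never has to parse @code{__has_attribute (nonnull)}.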
+ +@node @code{__has_builtin} +@subsection @code{__has_builtin} +@cindex @code{__has_builtin} + +The special operator @code{__has_builtin (@var{operand})} may be used in +constant integer contexts and in preprocessor @samp{#if} and @samp{#elif} +expressions to test whether the symbol named by its @var{operand} is +recognized as a built-in function by GCC in the current language and +conformance mode. It evaluates to a constant integer with a nonzero +value if the argument refers to such a function, and to zero otherwise. +The @code{__has_builtin} operator by itself, without any +@var{operand} or parentheses, acts as a predefined macro so that support +for it can be tested in portable code. Thus, the recommended use of +the operator is as follows: + +@smallexample +#if defined __has_builtin +# if __has_builtin (__builtin_object_size) +# define builtin_object_size(ptr) __builtin_object_size (ptr, 2) +# endif +#endif +#ifndef builtin_object_size +# define builtin_object_size(ptr) ((size_t)-1) +#endif +@end smallexample + +@node @code{__has_include} +@subsection @code{__has_include} +@cindex @code{__has_include} + +The special operator @code{__has_include (@var{operand})} may be used in +@samp{#if} and @samp{#elif} expressions to test whether the header referenced +by its @var{operand} can be included using the @samp{#include} directive. Using +the operator in other contexts is not valid. The @var{operand} takes +the same form as the file in the @samp{#include} directive (@pxref{Include +Syntax}) and evaluates to a nonzero value if the header can be included and +to zero otherwise. Note that the ability to include a header doesn't +imply that the header doesn't contain invalid constructs or @samp{#error} +directives that would cause the preprocessor to fail.
+ +The @code{__has_include} operator by itself, without any @var{operand} or +parentheses, acts as a predefined macro so that support for it can be tested +in portable code. Thus, the recommended use of the operator is as follows: + +@smallexample +#if defined __has_include +# if __has_include (<stdatomic.h>) +# include <stdatomic.h> +# endif +#endif +@end smallexample + +The first @samp{#if} test succeeds only when the operator is supported +by the version of GCC (or another compiler) being used. Only when that +test succeeds is it valid to use @code{__has_include} as a preprocessor +operator. As a result, combining the two tests into a single expression +as shown below would only be valid with a compiler that supports the operator +but not with others that don't. + +@smallexample +#if defined __has_include && __has_include ("header.h") /* not portable */ +@dots{} +#endif +@end smallexample + +@node Deleted Code +@section Deleted Code +@cindex commenting out code + +If you replace or delete a part of the program but want to keep the old +code around for future reference, you often cannot simply comment it +out. Block comments do not nest, so the first comment inside the old +code will end the commenting-out. The probable result is a flood of +syntax errors. + +One way to avoid this problem is to use an always-false conditional +instead. For instance, put @code{#if 0} before the deleted code and +@code{#endif} after it. This works even if the code being turned +off contains conditionals, but they must be entire conditionals +(balanced @samp{#if} and @samp{#endif}). + +Some people use @code{#ifdef notdef} instead. This is risky, because +@code{notdef} might be accidentally defined as a macro, and then the +conditional would succeed. @code{#if 0} can be counted on to fail. + +Do not use @code{#if 0} for comments which are not C code. Use a real +comment, instead. The interior of @code{#if 0} must consist of complete +tokens; in particular, single-quote characters must balance.
Comments +often contain unbalanced single-quote characters (known in English as +apostrophes). These confuse @code{#if 0}. They don't confuse +@samp{/*}. + +@node Diagnostics +@chapter Diagnostics +@cindex diagnostic +@cindex reporting errors +@cindex reporting warnings + +@findex #error +The directive @samp{#error} causes the preprocessor to report a fatal +error. The tokens forming the rest of the line following @samp{#error} +are used as the error message. + +You would use @samp{#error} inside of a conditional that detects a +combination of parameters which you know the program does not properly +support. For example, if you know that the program will not run +properly on a VAX, you might write + +@smallexample +@group +#ifdef __vax__ +#error "Won't work on VAXen. See comments at get_last_object." +#endif +@end group +@end smallexample + +If you have several configuration parameters that must be set up by +the installation in a consistent way, you can use conditionals to detect +an inconsistency and report it with @samp{#error}. For example, + +@smallexample +#if !defined(FOO) && defined(BAR) +#error "BAR requires FOO." +#endif +@end smallexample + +@findex #warning +The directive @samp{#warning} is like @samp{#error}, but causes the +preprocessor to issue a warning and continue preprocessing. The tokens +following @samp{#warning} are used as the warning message. + +You might use @samp{#warning} in obsolete header files, with a message +directing the user to the header file which should be used instead. + +Neither @samp{#error} nor @samp{#warning} macro-expands its argument. +Internal whitespace sequences are each replaced with a single space. +The line must consist of complete tokens. It is wisest to make the +argument of these directives be a single string constant; this avoids +problems with apostrophes and the like. 
+ +@node Line Control +@chapter Line Control +@cindex line control + +The C preprocessor informs the C compiler of the location in your source +code where each token came from. Presently, this is just the file name +and line number. All the tokens resulting from macro expansion are +reported as having appeared on the line of the source file where the +outermost macro was used. We intend to be more accurate in the future. + +If you write a program which generates source code, such as the +@command{bison} parser generator, you may want to adjust the preprocessor's +notion of the current file name and line number by hand. Parts of the +output from @command{bison} are generated from scratch, other parts come +from a standard parser file. The rest are copied verbatim from +@command{bison}'s input. You would like compiler error messages and +symbolic debuggers to be able to refer to @code{bison}'s input file. + +@findex #line +@command{bison} or any such program can arrange this by writing +@samp{#line} directives into the output file. @samp{#line} is a +directive that specifies the original line number and source file name +for subsequent input in the current preprocessor input file. +@samp{#line} has three variants: + +@table @code +@item #line @var{linenum} +@var{linenum} is a non-negative decimal integer constant. It specifies +the line number which should be reported for the following line of +input. Subsequent lines are counted from @var{linenum}. + +@item #line @var{linenum} @var{filename} +@var{linenum} is the same as for the first form, and has the same +effect. In addition, @var{filename} is a string constant. The +following line and all subsequent lines are reported to come from the +file it specifies, until something else happens to change that. +@var{filename} is interpreted according to the normal rules for a string +constant: backslash escapes are interpreted. This is different from +@samp{#include}. 
+ +@item #line @var{anything else} +@var{anything else} is checked for macro calls, which are expanded. +The result should match one of the above two forms. +@end table + +@samp{#line} directives alter the results of the @code{__FILE__} and +@code{__LINE__} predefined macros from that point on. @xref{Standard +Predefined Macros}. They do not have any effect on @samp{#include}'s +idea of the directory containing the current file. + +@node Pragmas +@chapter Pragmas + +@cindex pragma directive + +The @samp{#pragma} directive is the method specified by the C standard +for providing additional information to the compiler, beyond what is +conveyed in the language itself. The forms of this directive +(commonly known as @dfn{pragmas}) specified by the C standard are prefixed with +@code{STDC}. A C compiler is free to attach any meaning it likes to other +pragmas. Most GNU-defined, supported pragmas have been given a +@code{GCC} prefix. + +@cindex @code{_Pragma} +C99 introduced the @code{@w{_Pragma}} operator. This feature addresses a +major problem with @samp{#pragma}: being a directive, it cannot be +produced as the result of macro expansion. @code{@w{_Pragma}} is an +operator, much like @code{sizeof} or @code{defined}, and can be embedded +in a macro. + +Its syntax is @code{@w{_Pragma (@var{string-literal})}}, where +@var{string-literal} can be either a normal or wide-character string +literal. It is destringized by replacing all @samp{\\} with a single +@samp{\} and all @samp{\"} with a @samp{"}. The result is then +processed as if it had appeared as the right-hand side of a +@samp{#pragma} directive. For example, + +@smallexample +_Pragma ("GCC dependency \"parse.y\"") +@end smallexample + +@noindent +has the same effect as @code{#pragma GCC dependency "parse.y"}.
The +same effect could be achieved using macros, for example + +@smallexample +#define DO_PRAGMA(x) _Pragma (#x) +DO_PRAGMA (GCC dependency "parse.y") +@end smallexample + +The standard is unclear on where a @code{_Pragma} operator can appear. +The preprocessor does not accept it within a preprocessing conditional +directive like @samp{#if}. To be safe, you are probably best keeping it +out of directives other than @samp{#define}, and putting it on a line of +its own. + +This manual documents the pragmas which are meaningful to the +preprocessor itself. Other pragmas are meaningful to the C or C++ +compilers. They are documented in the GCC manual. + +GCC plugins may provide their own pragmas. + +@ftable @code +@item #pragma GCC dependency +@code{#pragma GCC dependency} allows you to check the relative dates of +the current file and another file. If the other file is more recent than +the current file, a warning is issued. This is useful if the current +file is derived from the other file, and should be regenerated. The +other file is searched for using the normal include search path. +Optional trailing text can be used to give more information in the +warning message. + +@smallexample +#pragma GCC dependency "parse.y" +#pragma GCC dependency "/usr/include/time.h" rerun fixincludes +@end smallexample + +@item #pragma GCC poison +Sometimes, there is an identifier that you want to remove completely +from your program, and make sure that it never creeps back in. To +enforce this, you can @dfn{poison} the identifier with this pragma. +@code{#pragma GCC poison} is followed by a list of identifiers to +poison. If any of those identifiers appears anywhere in the source +after the directive, it is a hard error. For example, + +@smallexample +#pragma GCC poison printf sprintf fprintf +sprintf(some_string, "hello"); +@end smallexample + +@noindent +will produce an error. 
+ +If a poisoned identifier appears as part of the expansion of a macro +which was defined before the identifier was poisoned, it will @emph{not} +cause an error. This lets you poison an identifier without worrying +about system headers defining macros that use it. + +For example, + +@smallexample +#define strrchr rindex +#pragma GCC poison rindex +strrchr(some_string, 'h'); +@end smallexample + +@noindent +will not produce an error. + +@item #pragma GCC system_header +This pragma takes no arguments. It causes the rest of the code in the +current file to be treated as if it came from a system header. +@xref{System Headers}. + +@item #pragma GCC warning +@itemx #pragma GCC error +@code{#pragma GCC warning "message"} causes the preprocessor to issue +a warning diagnostic with the text @samp{message}. The message +contained in the pragma must be a single string literal. Similarly, +@code{#pragma GCC error "message"} issues an error message. Unlike +the @samp{#warning} and @samp{#error} directives, these pragmas can be +embedded in preprocessor macros using @samp{_Pragma}. + +@item #pragma once +If @code{#pragma once} is seen when scanning a header file, that +file will never be read again, no matter what. It is a less-portable +alternative to using @samp{#ifndef} to guard the contents of header files +against multiple inclusions. + +@end ftable + +@node Other Directives +@chapter Other Directives + +@findex #ident +@findex #sccs +The @samp{#ident} directive takes one argument, a string constant. On +some systems, that string constant is copied into a special segment of +the object file. On other systems, the directive is ignored. The +@samp{#sccs} directive is a synonym for @samp{#ident}. + +These directives are not part of the C standard, but they are not +official GNU extensions either. What historical information we have +been able to find suggests they originated with System V@.
+ +@cindex null directive +The @dfn{null directive} consists of a @samp{#} followed by a newline, +with only whitespace (including comments) in between. A null directive +is understood as a preprocessing directive but has no effect on the +preprocessor output. The primary significance of the existence of the +null directive is that an input line consisting of just a @samp{#} will +produce no output, rather than a line of output containing just a +@samp{#}. Supposedly some old C programs contain such lines. + +@node Preprocessor Output +@chapter Preprocessor Output + +When the C preprocessor is used with the C, C++, or Objective-C +compilers, it is integrated into the compiler and communicates a stream +of binary tokens directly to the compiler's parser. However, it can +also be used in the more conventional standalone mode, where it produces +textual output. +@c FIXME: Document the library interface. + +@cindex output format +The output from the C preprocessor looks much like the input, except +that all preprocessing directive lines have been replaced with blank +lines and all comments with spaces. Long runs of blank lines are +discarded. + +The ISO standard specifies that it is implementation defined whether a +preprocessor preserves whitespace between tokens, or replaces it with +e.g.@: a single space. In GNU CPP, whitespace between tokens is collapsed +to become a single space, with the exception that the first token on a +non-directive line is preceded with sufficient spaces that it appears in +the same column in the preprocessed output that it appeared in the +original source file. This is so the output is easy to read. +CPP does not insert any +whitespace where there was none in the original source, except where +necessary to prevent an accidental token paste. 
+ +@cindex linemarkers +Source file name and line number information is conveyed by lines +of the form + +@smallexample +# @var{linenum} @var{filename} @var{flags} +@end smallexample + +@noindent +These are called @dfn{linemarkers}. They are inserted as needed into +the output (but never within a string or character constant). They mean +that the following line originated in file @var{filename} at line +@var{linenum}. @var{filename} will never contain any non-printing +characters; they are replaced with octal escape sequences. + +After the file name come zero or more flags, which are @samp{1}, +@samp{2}, @samp{3}, or @samp{4}. If there are multiple flags, spaces +separate them. Here is what the flags mean: + +@table @samp +@item 1 +This indicates the start of a new file. +@item 2 +This indicates returning to a file (after having included another file). +@item 3 +This indicates that the following text comes from a system header file, +so certain warnings should be suppressed. +@item 4 +This indicates that the following text should be treated as being +wrapped in an implicit @code{extern "C"} block. +@c maybe cross reference SYSTEM_IMPLICIT_EXTERN_C +@end table + +As an extension, the preprocessor accepts linemarkers in non-assembler +input files. They are treated like the corresponding @samp{#line} +directive (@pxref{Line Control}), except that trailing flags are +permitted, and are interpreted with the meanings described above. If +multiple flags are given, they must be in ascending order. + +Some directives may be duplicated in the output of the preprocessor. +These are @samp{#ident} (always), @samp{#pragma} (only if the +preprocessor does not handle the pragma itself), and @samp{#define} and +@samp{#undef} (with certain debugging options). If this happens, the +@samp{#} of the directive will always be in the first column, and there +will be no space between the @samp{#} and the directive name.
If macro +expansion happens to generate tokens which might be mistaken for a +duplicated directive, a space will be inserted between the @samp{#} and +the directive name. + +@node Traditional Mode +@chapter Traditional Mode + +Traditional (pre-standard) C preprocessing is rather different from +the preprocessing specified by the standard. When the preprocessor +is invoked with the +@option{-traditional-cpp} option, it attempts to emulate a traditional +preprocessor. + +This mode is not useful for compiling C code with GCC, +but is intended for use with non-C preprocessing applications. Thus +traditional mode semantics are supported only when invoking +the preprocessor explicitly, and not in the compiler front ends. + +The implementation does not correspond precisely to the behavior of +early pre-standard versions of GCC, nor to any true traditional preprocessor. +After all, inconsistencies among traditional implementations were a +major motivation for C standardization. However, we intend that it +should be compatible with true traditional preprocessors in all ways +that actually matter. + +@menu +* Traditional lexical analysis:: +* Traditional macros:: +* Traditional miscellany:: +* Traditional warnings:: +@end menu + +@node Traditional lexical analysis +@section Traditional lexical analysis + +The traditional preprocessor does not decompose its input into tokens +the same way a standards-conforming preprocessor does. The input is +simply treated as a stream of text with minimal internal form. + +This implementation does not treat trigraphs (@pxref{trigraphs}) +specially since they were an invention of the standards committee. It +handles arbitrarily-positioned escaped newlines properly and splices +the lines as you would expect; many traditional preprocessors did not +do this. + +The form of horizontal whitespace in the input file is preserved in +the output. In particular, hard tabs remain hard tabs. 
This can be +useful if, for example, you are preprocessing a Makefile. + +Traditional CPP only recognizes C-style block comments, and treats the +@samp{/*} sequence as introducing a comment only if it lies outside +quoted text. Quoted text is introduced by the usual single and double +quotes, and also by an initial @samp{<} in a @code{#include} +directive. + +Traditionally, comments are completely removed and are not replaced +with a space. Since a traditional compiler does its own tokenization +of the output of the preprocessor, this means that comments can +effectively be used as token paste operators. However, comments +behave like separators for text handled by the preprocessor itself, +since it doesn't re-lex its input. For example, in + +@smallexample +#if foo/**/bar +@end smallexample + +@noindent +@samp{foo} and @samp{bar} are distinct identifiers and expanded +separately if they happen to be macros. In other words, this +directive is equivalent to + +@smallexample +#if foo bar +@end smallexample + +@noindent +rather than + +@smallexample +#if foobar +@end smallexample + +Generally speaking, in traditional mode an opening quote need not have +a matching closing quote. In particular, a macro may be defined with +replacement text that contains an unmatched quote. Of course, if you +attempt to compile preprocessed output containing an unmatched quote +you will get a syntax error. + +However, all preprocessing directives other than @code{#define} +require matching quotes. For example: + +@smallexample +#define m This macro's fine and has an unmatched quote +"/* This is not a comment. */ +/* @r{This is a comment. The following #include directive + is ill-formed.} */ +#include }} directives. + +You can specify any number or combination of these options on the +command line to search for header files in several directories. 
+The lookup order is as follows: + +@enumerate +@item +For the quote form of the include directive, the directory of the current +file is searched first. + +@item +For the quote form of the include directive, the directories specified +by @option{-iquote} options are searched in left-to-right order, +as they appear on the command line. + +@item +Directories specified with @option{-I} options are scanned in +left-to-right order. + +@item +Directories specified with @option{-isystem} options are scanned in +left-to-right order. + +@item +Standard system directories are scanned. + +@item +Directories specified with @option{-idirafter} options are scanned in +left-to-right order. +@end enumerate + +You can use @option{-I} to override a system header +file, substituting your own version, since these directories are +searched before the standard system header file directories. +However, you should +not use this option to add directories that contain vendor-supplied +system header files; use @option{-isystem} for that. + +The @option{-isystem} and @option{-idirafter} options also mark the directory +as a system directory, so that it gets the same special treatment that +is applied to the standard system directories. +@ifset cppmanual +@xref{System Headers}. +@end ifset + +If a standard system include directory, or a directory specified with +@option{-isystem}, is also specified with @option{-I}, the @option{-I} +option is ignored. The directory is still searched but as a +system directory at its normal position in the system include chain. +This is to ensure that GCC's procedure to fix buggy system headers and +the ordering for the @code{#include_next} directive are not inadvertently +changed. +If you really need to change the search order for system directories, +use the @option{-nostdinc} and/or @option{-isystem} options. +@ifset cppmanual +@xref{System Headers}. +@end ifset + +@item -I- +@opindex I- +Split the include path. +This option has been deprecated. 
Please use @option{-iquote} instead for +@option{-I} directories before the @option{-I-} and remove the @option{-I-} +option. + +Any directories specified with @option{-I} +options before @option{-I-} are searched only for headers requested with +@code{@w{#include "@var{file}"}}; they are not searched for +@code{@w{#include <@var{file}>}}. If additional directories are +specified with @option{-I} options after the @option{-I-}, those +directories are searched for all @samp{#include} directives. + +In addition, @option{-I-} inhibits the use of the directory of the current +file as the first search directory for @code{@w{#include +"@var{file}"}}. There is no way to override this effect of @option{-I-}. +@ifset cppmanual +@xref{Search Path}. +@end ifset + +@item -iprefix @var{prefix} +@opindex iprefix +Specify @var{prefix} as the prefix for subsequent @option{-iwithprefix} +options. If the prefix represents a directory, you should include the +final @samp{/}. + +@item -iwithprefix @var{dir} +@itemx -iwithprefixbefore @var{dir} +@opindex iwithprefix +@opindex iwithprefixbefore +Append @var{dir} to the prefix specified previously with +@option{-iprefix}, and add the resulting directory to the include search +path. @option{-iwithprefixbefore} puts it in the same place @option{-I} +would; @option{-iwithprefix} puts it where @option{-idirafter} would. + +@item -isysroot @var{dir} +@opindex isysroot +This option is like the @option{--sysroot} option, but applies only to +header files (except for Darwin targets, where it applies to both header +files and libraries). See the @option{--sysroot} option for more +information. + +@item -imultilib @var{dir} +@opindex imultilib +Use @var{dir} as a subdirectory of the directory containing +target-specific C++ headers. + +@item -nostdinc +@opindex nostdinc +Do not search the standard system directories for header files.
+Only the directories explicitly specified with @option{-I}, +@option{-iquote}, @option{-isystem}, and/or @option{-idirafter} +options (and the directory of the current file, if appropriate) +are searched. + +@item -nostdinc++ +@opindex nostdinc++ +Do not search for header files in the C++-specific standard directories, +but do still search the other standard directories. (This option is +used when building the C++ library.) + diff --git a/gcc/doc/cppenv.texi b/gcc/doc/cppenv.texi new file mode 100644 index 00000000000..c8125bd34fe --- /dev/null +++ b/gcc/doc/cppenv.texi @@ -0,0 +1,99 @@ +@c Copyright (C) 1999-2022 Free Software Foundation, Inc. +@c This is part of the CPP and GCC manuals. +@c For copying conditions, see the file gcc.texi. + +@c --------------------------------------------------------------------- +@c Environment variables affecting the preprocessor +@c --------------------------------------------------------------------- + +@c If this file is included with the flag ``cppmanual'' set, it is +@c formatted for inclusion in the CPP manual; otherwise the main GCC manual. + +@vtable @env +@item CPATH +@itemx C_INCLUDE_PATH +@itemx CPLUS_INCLUDE_PATH +@itemx OBJC_INCLUDE_PATH +@c Commented out until ObjC++ is part of GCC: +@c @itemx OBJCPLUS_INCLUDE_PATH +Each variable's value is a list of directories separated by a special +character, much like @env{PATH}, in which to look for header files. +The special character, @code{PATH_SEPARATOR}, is target-dependent and +determined at GCC build time. For Microsoft Windows-based targets it is a +semicolon, and for almost all other targets it is a colon. + +@env{CPATH} specifies a list of directories to be searched as if +specified with @option{-I}, but after any paths given with @option{-I} +options on the command line. This environment variable is used +regardless of which language is being preprocessed. + +The remaining environment variables apply only when preprocessing the +particular language indicated. 
Each specifies a list of directories +to be searched as if specified with @option{-isystem}, but after any +paths given with @option{-isystem} options on the command line. + +In all these variables, an empty element instructs the compiler to +search its current working directory. Empty elements can appear at the +beginning or end of a path. For instance, if the value of +@env{CPATH} is @code{:/special/include}, that has the same +effect as @samp{@w{-I. -I/special/include}}. + +@c man end +@ifset cppmanual +See also @ref{Search Path}. +@end ifset +@c man begin ENVIRONMENT + +@item DEPENDENCIES_OUTPUT +@cindex dependencies for make as output +If this variable is set, its value specifies how to output +dependencies for Make based on the non-system header files processed +by the compiler. System header files are ignored in the dependency +output. + +The value of @env{DEPENDENCIES_OUTPUT} can be just a file name, in +which case the Make rules are written to that file, guessing the target +name from the source file name. Or the value can have the form +@samp{@var{file} @var{target}}, in which case the rules are written to +file @var{file} using @var{target} as the target name. + +In other words, this environment variable is equivalent to combining +the options @option{-MM} and @option{-MF} +@ifset cppmanual +(@pxref{Invocation}), +@end ifset +@ifclear cppmanual +(@pxref{Preprocessor Options}), +@end ifclear +with an optional @option{-MT} switch too. + +@item SUNPRO_DEPENDENCIES +@cindex dependencies for make as output +This variable is the same as @env{DEPENDENCIES_OUTPUT} (see above), +except that system header files are not ignored, so it implies +@option{-M} rather than @option{-MM}. However, the dependence on the +main input file is omitted. +@ifset cppmanual +@xref{Invocation}. +@end ifset +@ifclear cppmanual +@xref{Preprocessor Options}. 
+@end ifclear + +@item SOURCE_DATE_EPOCH +If this variable is set, its value specifies a UNIX timestamp to be +used in replacement of the current date and time in the @code{__DATE__} +and @code{__TIME__} macros, so that the embedded timestamps become +reproducible. + +The value of @env{SOURCE_DATE_EPOCH} must be a UNIX timestamp, +defined as the number of seconds (excluding leap seconds) since +01 Jan 1970 00:00:00 represented in ASCII; identical to the output of +@code{date +%s} on GNU/Linux and other systems that support the +@code{%s} extension in the @code{date} command. + +The value should be a known timestamp such as the last modification +time of the source or package and it should be set by the build +process. + +@end vtable diff --git a/gcc/doc/cppinternals.texi b/gcc/doc/cppinternals.texi new file mode 100644 index 00000000000..75adbbe7bec --- /dev/null +++ b/gcc/doc/cppinternals.texi @@ -0,0 +1,1066 @@ +\input texinfo +@setfilename cppinternals.info +@settitle The GNU C Preprocessor Internals + +@include gcc-common.texi + +@ifinfo +@dircategory Software development +@direntry +* Cpplib: (cppinternals). Cpplib internals. +@end direntry +@end ifinfo + +@c @smallbook +@c @cropmarks +@c @finalout +@setchapternewpage odd +@ifinfo +This file documents the internals of the GNU C Preprocessor. + +Copyright (C) 2000-2022 Free Software Foundation, Inc. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +@ignore +Permission is granted to process this file through Tex and print the +results, provided the printed document carries copying permission +notice identical to this one except for the removal of this paragraph +(this paragraph not being relevant to the printed manual). 
+ +@end ignore +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided also that +the entire resulting derived work is distributed under the terms of a +permission notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions. +@end ifinfo + +@titlepage +@title Cpplib Internals +@versionsubtitle +@author Neil Booth +@page +@vskip 0pt plus 1filll +@c man begin COPYRIGHT +Copyright @copyright{} 2000-2022 Free Software Foundation, Inc. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided also that +the entire resulting derived work is distributed under the terms of a +permission notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions. +@c man end +@end titlepage +@contents +@page + +@ifnottex +@node Top +@top +@chapter Cpplib---the GNU C Preprocessor + +The GNU C preprocessor is +implemented as a library, @dfn{cpplib}, so it can be easily shared between +a stand-alone preprocessor, and a preprocessor integrated with the C, +C++ and Objective-C front ends. It is also available for use by other +programs, though this is not recommended as its exposed interface has +not yet reached a point of reasonable stability. + +The library has been written to be re-entrant, so that it can be used +to preprocess many files simultaneously if necessary. It has also been +written with the preprocessing token as the fundamental unit; the +preprocessor in previous versions of GCC would operate on text strings +as the fundamental unit. 
+ +This brief manual documents the internals of cpplib, and explains some +of the tricky issues. It is intended that, along with the comments in +the source code, a reasonably competent C programmer should be able to +figure out what the code is doing, and why things have been implemented +the way they have. + +@menu +* Conventions:: Conventions used in the code. +* Lexer:: The combined C, C++ and Objective-C Lexer. +* Hash Nodes:: All identifiers are entered into a hash table. +* Macro Expansion:: Macro expansion algorithm. +* Token Spacing:: Spacing and paste avoidance issues. +* Line Numbering:: Tracking location within files. +* Guard Macros:: Optimizing header files with guard macros. +* Files:: File handling. +* Concept Index:: Index. +@end menu +@end ifnottex + +@node Conventions +@unnumbered Conventions +@cindex interface +@cindex header files + +cpplib has two interfaces---one is exposed internally only, and the +other is for both internal and external use. + +The convention is that functions and types that are exposed to multiple +files internally are prefixed with @samp{_cpp_}, and are to be found in +the file @file{internal.h}. Functions and types exposed to external +clients are in @file{cpplib.h}, and prefixed with @samp{cpp_}. For +historical reasons this is no longer quite true, but we should strive to +stick to it. + +We are striving to reduce the information exposed in @file{cpplib.h} to the +bare minimum necessary, and then to keep it there. This makes clear +exactly what external clients are entitled to assume, and allows us to +change internals in the future without worrying whether library clients +are perhaps relying on some kind of undocumented implementation-specific +behavior. + +@node Lexer +@unnumbered The Lexer +@cindex lexer +@cindex newlines +@cindex escaped newlines + +@section Overview +The lexer is contained in the file @file{lex.cc}. It is a hand-coded +lexer, and not implemented as a state machine. 
It can understand C, C++ +and Objective-C source code, and has been extended to allow reasonably +successful preprocessing of assembly language. The lexer does not make +an initial pass to strip out trigraphs and escaped newlines, but handles +them as they are encountered in a single pass of the input file. It +returns preprocessing tokens individually, not a line at a time. + +It is mostly transparent to users of the library, since the library's +interface for obtaining the next token, @code{cpp_get_token}, takes care +of lexing new tokens, handling directives, and expanding macros as +necessary. However, the lexer does expose some functionality so that +clients of the library can easily spell a given token, such as +@code{cpp_spell_token} and @code{cpp_token_len}. These functions are +useful when generating diagnostics, and for emitting the preprocessed +output. + +@section Lexing a token +Lexing of an individual token is handled by @code{_cpp_lex_direct} and +its subroutines. In its current form the code is quite complicated, +with read ahead characters and such-like, since it strives to not step +back in the character stream in preparation for handling non-ASCII file +encodings. The current plan is to convert any such files to UTF-8 +before processing them. This complexity is therefore unnecessary and +will be removed, so I'll not discuss it further here. + +The job of @code{_cpp_lex_direct} is simply to lex a token. It is not +responsible for issues like directive handling, returning lookahead +tokens directly, multiple-include optimization, or conditional block +skipping. It necessarily has a minor r@^ole to play in memory +management of lexed lines. I discuss these issues in a separate section +(@pxref{Lexing a line}). + +The lexer places the token it lexes into storage pointed to by the +variable @code{cur_token}, and then increments it. This variable is +important for correct diagnostic positioning. 
Unless a specific line +and column are passed to the diagnostic routines, they will examine the +@code{line} and @code{col} values of the token just before the location +that @code{cur_token} points to, and use that location to report the +diagnostic. + +The lexer does not consider whitespace to be a token in its own right. +If whitespace (other than a new line) precedes a token, it sets the +@code{PREV_WHITE} bit in the token's flags. Each token has its +@code{line} and @code{col} variables set to the line and column of the +first character of the token. This line number is the line number in +the translation unit, and can be converted to a source (file, line) pair +using the line map code. + +The first token on a logical, i.e.@: unescaped, line has the flag +@code{BOL} set for beginning-of-line. This flag is intended for +internal use, both to distinguish a @samp{#} that begins a directive +from one that doesn't, and to generate a call-back to clients that want +to be notified about the start of every non-directive line with tokens +on it. Clients cannot reliably determine this for themselves: the first +token might be a macro, and the tokens of a macro expansion do not have +the @code{BOL} flag set. The macro expansion may even be empty, and the +next token on the line certainly won't have the @code{BOL} flag set. + +New lines are treated specially; exactly how the lexer handles them is +context-dependent. The C standard mandates that directives are +terminated by the first unescaped newline character, even if it appears +in the middle of a macro expansion. Therefore, if the state variable +@code{in_directive} is set, the lexer returns a @code{CPP_EOF} token, +which is normally used to indicate end-of-file, to indicate +end-of-directive. In a directive a @code{CPP_EOF} token never means +end-of-file. 
Conveniently, if the caller was @code{collect_args}, it +already handles @code{CPP_EOF} as if it were end-of-file, and reports an +error about an unterminated macro argument list. + +The C standard also specifies that a new line in the middle of the +arguments to a macro is treated as whitespace. This white space is +important in case the macro argument is stringized. The state variable +@code{parsing_args} is nonzero when the preprocessor is collecting the +arguments to a macro call. It is set to 1 when looking for the opening +parenthesis to a function-like macro, and 2 when collecting the actual +arguments up to the closing parenthesis, since these two cases need to +be distinguished sometimes. One such time is here: the lexer sets the +@code{PREV_WHITE} flag of a token if it meets a new line when +@code{parsing_args} is set to 2. It doesn't set it if it meets a new +line when @code{parsing_args} is 1, since then code like + +@smallexample +#define foo() bar +foo +baz +@end smallexample + +@noindent would be output with an erroneous space before @samp{baz}: + +@smallexample +foo + baz +@end smallexample + +This is a good example of the subtlety of getting token spacing correct +in the preprocessor; there are plenty of tests in the testsuite for +corner cases like this. + +The lexer is written to treat each of @samp{\r}, @samp{\n}, @samp{\r\n} +and @samp{\n\r} as a single new line indicator. This allows it to +transparently preprocess MS-DOS, Macintosh and Unix files without their +needing to pass through a special filter beforehand. + +We also decided to treat a backslash, either @samp{\} or the trigraph +@samp{??/}, separated from one of the above newline indicators by +non-comment whitespace only, as intending to escape the newline. It +tends to be a typing mistake, and cannot reasonably be mistaken for +anything else in any of the C-family grammars. 
Since handling it this +way is not strictly conforming to the ISO standard, the library issues a +warning wherever it encounters it. + +Handling newlines like this is made simpler by doing it in one place +only. The function @code{handle_newline} takes care of all newline +characters, and @code{skip_escaped_newlines} takes care of arbitrarily +long sequences of escaped newlines, deferring to @code{handle_newline} +to handle the newlines themselves. + +The most painful aspect of lexing ISO-standard C and C++ is handling +trigraphs and backslash-escaped newlines. Trigraphs are processed before +any interpretation of the meaning of a character is made, and unfortunately +there is a trigraph representation for a backslash, so it is possible for +the trigraph @samp{??/} to introduce an escaped newline. + +Escaped newlines are tedious because theoretically they can occur +anywhere---between the @samp{+} and @samp{=} of the @samp{+=} token, +within the characters of an identifier, and even between the @samp{*} +and @samp{/} that terminate a comment. Moreover, you cannot be sure +there is just one---there might be an arbitrarily long sequence of them. + +So, for example, the routine that lexes a number, @code{parse_number}, +cannot assume that it can scan forwards until the first non-number +character and be done with it, because this could be the @samp{\} +introducing an escaped newline, or the @samp{?} introducing the trigraph +sequence that represents the @samp{\} of an escaped newline. If it +encounters a @samp{?} or @samp{\}, it calls @code{skip_escaped_newlines} +to skip over any potential escaped newlines before checking whether the +number has been finished. + +Similarly, code in the main body of @code{_cpp_lex_direct} cannot simply +check for a @samp{=} after a @samp{+} character to determine whether it +has a @samp{+=} token; it needs to be prepared for an escaped newline of +some sort.
Such cases use the function @code{get_effective_char}, which +returns the first character after any intervening escaped newlines. + +The lexer needs to keep track of the correct column position, including +counting tabs as specified by the @option{-ftabstop=} option. This +should be done even within C-style comments; they can appear in the +middle of a line, and we want to report diagnostics in the correct +position for text appearing after the end of the comment. + +@anchor{Invalid identifiers} +Some identifiers, such as @code{__VA_ARGS__} and poisoned identifiers, +may be invalid and require a diagnostic. However, if they appear in a +macro expansion we don't want to complain with each use of the macro. +It is therefore best to catch them during the lexing stage, in +@code{parse_identifier}. In both cases, whether a diagnostic is needed +or not is dependent upon the lexer's state. For example, we don't want +to issue a diagnostic for re-poisoning a poisoned identifier, or for +using @code{__VA_ARGS__} in the expansion of a variable-argument macro. +Therefore @code{parse_identifier} makes use of state flags to determine +whether a diagnostic is appropriate. Since we change state on a +per-token basis, and don't lex whole lines at a time, this is not a +problem. + +Another place where state flags are used to change behavior is whilst +lexing header names. Normally, a @samp{<} would be lexed as a single +token. After a @code{#include} directive, though, it should be lexed as +a single token as far as the nearest @samp{>} character. Note that we +don't allow the terminators of header names to be escaped; the first +@samp{"} or @samp{>} terminates the header name. + +Interpretation of some character sequences depends upon whether we are +lexing C, C++ or Objective-C, and on the revision of the standard in +force. For example, @samp{::} is a single token in C++, but in C it is +two separate @samp{:} tokens and almost certainly a syntax error. 
Such +cases are handled by @code{_cpp_lex_direct} based upon command-line +flags stored in the @code{cpp_options} structure. + +Once a token has been lexed, it leads an independent existence. The +spelling of numbers, identifiers and strings is copied to permanent +storage from the original input buffer, so a token remains valid and +correct even if its source buffer is freed with @code{_cpp_pop_buffer}. +The storage holding the spellings of such tokens remains until the +client program calls cpp_destroy, probably at the end of the translation +unit. + +@anchor{Lexing a line} +@section Lexing a line +@cindex token run + +When the preprocessor was changed to return pointers to tokens, one +feature I wanted was some sort of guarantee regarding how long a +returned pointer remains valid. This is important to the stand-alone +preprocessor, the future direction of the C family front ends, and even +to cpplib itself internally. + +Occasionally the preprocessor wants to be able to peek ahead in the +token stream. For example, after the name of a function-like macro, it +wants to check the next token to see if it is an opening parenthesis. +Another example is that, after reading the first few tokens of a +@code{#pragma} directive and not recognizing it as a registered pragma, +it wants to backtrack and allow the user-defined handler for unknown +pragmas to access the full @code{#pragma} token stream. The stand-alone +preprocessor wants to be able to test the current token with the +previous one to see if a space needs to be inserted to preserve their +separate tokenization upon re-lexing (paste avoidance), so it needs to +be sure the pointer to the previous token is still valid. The +recursive-descent C++ parser wants to be able to perform tentative +parsing arbitrarily far ahead in the token stream, and then to be able +to jump back to a prior position in that stream if necessary. 
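The first kind of peek-ahead mentioned above, checking for an opening parenthesis after a function-like macro name, can be sketched as follows. This is only an illustration under stated assumptions: @code{tok_stream}, @code{next_tok}, @code{backup_tok} and @code{followed_by_paren} are hypothetical names, not cpplib's, and the token array stands in for storage whose pointers are guaranteed to stay valid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token kinds and stream; none of these names are cpplib's.  */
enum tok_kind { TOK_NAME, TOK_OPEN_PAREN, TOK_OTHER, TOK_EOF };

struct tok { enum tok_kind kind; };

struct tok_stream
{
  const struct tok *toks;   /* Tokens already lexed; assumed to stay valid.  */
  size_t count;             /* Number of tokens in TOKS.  */
  size_t pos;               /* Index of the next token to hand out.  */
};

static const struct tok eof_tok = { TOK_EOF };

/* Return the next token and advance, or a shared EOF token at the end.  */
const struct tok *
next_tok (struct tok_stream *s)
{
  if (s->pos < s->count)
    return &s->toks[s->pos++];
  return &eof_tok;
}

/* Step back one token.  This is only sound because the token that was
   handed out earlier is still stored where it was.  */
void
backup_tok (struct tok_stream *s)
{
  if (s->pos > 0)
    s->pos--;
}

/* After a function-like macro name: is the next token '('?  If it is
   not, put it back so that the caller sees it again.  */
bool
followed_by_paren (struct tok_stream *s)
{
  const struct tok *t = next_tok (s);
  if (t->kind == TOK_OPEN_PAREN)
    return true;
  if (t != &eof_tok)
    backup_tok (s);
  return false;
}
```

The sketch consumes the parenthesis when it finds one and leaves the stream untouched otherwise, which is the one-token backup the paragraph above asks for.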
+ +The rule I chose, which is fairly natural, is to arrange that the +preprocessor lex all tokens on a line consecutively into a token buffer, +which I call a @dfn{token run}, and when meeting an unescaped new line +(newlines within comments do not count either), to start lexing back at +the beginning of the run. Note that we do @emph{not} lex a line of +tokens at once; if we did that @code{parse_identifier} would not have +state flags available to warn about invalid identifiers (@pxref{Invalid +identifiers}). + +In other words, accessing tokens that appeared earlier in the current +line is valid, but since each logical line overwrites the tokens of the +previous line, tokens from prior lines are unavailable. In particular, +since a directive only occupies a single logical line, this means that +the directive handlers like the @code{#pragma} handler can jump around +in the directive's tokens if necessary. + +Two issues remain: what about tokens that arise from macro expansions, +and what happens when we have a long line that overflows the token run? + +Since we promise clients that we preserve the validity of pointers that +we have already returned for tokens that appeared earlier in the line, +we cannot reallocate the run. Instead, on overflow it is expanded by +chaining a new token run on to the end of the existing one. + +The tokens forming a macro's replacement list are collected by the +@code{#define} handler, and placed in storage that is only freed by +@code{cpp_destroy}. So if a macro is expanded in the line of tokens, +the pointers to the tokens of its expansion that are returned will always +remain valid. However, macros are a little trickier than that, since +they give rise to three sources of fresh tokens. They are the built-in +macros like @code{__LINE__}, and the @samp{#} and @samp{##} operators +for stringizing and token pasting. I handled this by allocating +space for these tokens from the lexer's token run chain. 
This means +they automatically receive the same lifetime guarantees as lexed tokens, +and we don't need to concern ourselves with freeing them. + +Lexing into a line of tokens solves some of the token memory management +issues, but not all. The opening parenthesis after a function-like +macro name might lie on a different line, and the front ends definitely +want the ability to look ahead past the end of the current line. So +cpplib only moves back to the start of the token run at the end of a +line if the variable @code{keep_tokens} is zero. Line-buffering is +quite natural for the preprocessor, and as a result the only time cpplib +needs to increment this variable is whilst looking for the opening +parenthesis to, and reading the arguments of, a function-like macro. In +the near future cpplib will export an interface to increment and +decrement this variable, so that clients can share full control over the +lifetime of token pointers too. + +The routine @code{_cpp_lex_token} handles moving to new token runs, +calling @code{_cpp_lex_direct} to lex new tokens, or returning +previously-lexed tokens if we stepped back in the token stream. It also +checks each token for the @code{BOL} flag, which might indicate a +directive that needs to be handled, or require a start-of-line call-back +to be made. @code{_cpp_lex_token} also handles skipping over tokens in +failed conditional blocks, and invalidates the control macro of the +multiple-include optimization if a token was successfully lexed outside +a directive. In other words, its callers do not need to concern +themselves with such issues. + +@node Hash Nodes +@unnumbered Hash Nodes +@cindex hash table +@cindex identifiers +@cindex macros +@cindex assertions +@cindex named operators + +When cpplib encounters an ``identifier'', it generates a hash code for +it and stores it in the hash table. 
By ``identifier'' we mean tokens +with type @code{CPP_NAME}; this includes identifiers in the usual C +sense, as well as keywords, directive names, macro names and so on. For +example, all of @code{pragma}, @code{int}, @code{foo} and +@code{__GNUC__} are identifiers and are hashed when lexed. + +Each node in the hash table contains various information about the +identifier it represents, for example its length and type. At any one +time, each identifier falls into exactly one of three categories: + +@itemize @bullet +@item Macros + +These have been declared to be macros, either on the command line or +with @code{#define}. A few, such as @code{__TIME__}, are built-ins +entered in the hash table during initialization. The hash node for a +normal macro points to a structure with more information about the +macro, such as whether it is function-like, how many arguments it takes, +and its expansion. The hash node for a built-in macro is flagged as special, and instead +contains an enum indicating which of the various built-in macros it is. + +@item Assertions + +Assertions are in a separate namespace to macros. To enforce this, cpp +actually prepends a @code{#} character before hashing and entering it in +the hash table. An assertion's node points to a chain of answers to +that assertion. + +@item Void + +Everything else falls into this category---an identifier that is not +currently a macro, or a macro that has since been undefined with +@code{#undef}. + +When preprocessing C++, this category also includes the named operators, +such as @code{xor}. In expressions these behave like the operators they +represent, but in contexts where the spelling of a token matters they +are spelt differently. This spelling distinction is relevant when they +are operands of the stringizing and pasting macro operators @code{#} and +@code{##}. Named operator hash nodes are flagged, both to catch the +spelling distinction and to prevent them from being defined as macros.
+@end itemize + +Identical identifiers share the same hash node. Since each identifier +token, after lexing, contains a pointer to its hash node, this is used +to provide rapid lookup of various information. For example, when +parsing a @code{#define} directive, CPP flags each argument's identifier +hash node with the index of that argument. This makes duplicated +argument checking an O(1) operation for each argument. Similarly, for +each identifier in the macro's expansion, lookup to see if it is an +argument, and which argument it is, is also an O(1) operation. Further, +each directive name, such as @code{endif}, has an associated directive +enum stored in its hash node, so that directive lookup is also O(1). + +@node Macro Expansion +@unnumbered Macro Expansion Algorithm +@cindex macro expansion + +Macro expansion is a tricky operation, fraught with nasty corner cases +and situations that render what you thought was a nifty way to +optimize the preprocessor's expansion algorithm wrong in quite subtle +ways. + +I strongly recommend you have a good grasp of how the C and C++ +standards require macros to be expanded before diving into this +section, let alone the code! If you don't have a clear mental +picture of how things like nested macro expansion, stringizing and +token pasting are supposed to work, damage to your sanity can quickly +result. + +@section Internal representation of macros +@cindex macro representation (internal) + +The preprocessor stores macro expansions in tokenized form. This +saves repeated lexing passes during expansion, at the cost of a small +increase in memory consumption on average. The tokens are stored +contiguously in memory, so a pointer to the first one and a token +count are all you need to get the replacement list of a macro. + +If the macro is a function-like macro the preprocessor also stores its +parameters, in the form of an ordered list of pointers to the hash +table entry of each parameter's identifier.
Further, in the macro's +stored expansion each occurrence of a parameter is replaced with a +special token of type @code{CPP_MACRO_ARG}. Each such token holds the +index of the parameter it represents in the parameter list, which +allows rapid replacement of parameters with their arguments during +expansion. Despite this optimization it is still necessary to store +the original parameters to the macro, both for dumping with e.g., +@option{-dD}, and to warn about non-trivial macro redefinitions when +the parameter names have changed. + +@section Macro expansion overview +The preprocessor maintains a @dfn{context stack}, implemented as a +linked list of @code{cpp_context} structures, which together represent +the macro expansion state at any one time. The @code{struct +cpp_reader} member variable @code{context} points to the current top +of this stack. The top normally holds the unexpanded replacement list +of the innermost macro under expansion, except when cpplib is about to +pre-expand an argument, in which case it holds that argument's +unexpanded tokens. + +When there are no macros under expansion, cpplib is in @dfn{base +context}. All contexts other than the base context contain a +contiguous list of tokens delimited by a starting and ending token. +When not in base context, cpplib obtains the next token from the list +of the top context. If there are no tokens left in the list, it pops +that context off the stack, and subsequent ones if necessary, until an +unexhausted context is found or it returns to base context. In base +context, cpplib reads tokens directly from the lexer. + +If it encounters an identifier that is both a macro and enabled for +expansion, cpplib prepares to push a new context for that macro on the +stack by calling the routine @code{enter_macro_context}. When this +routine returns, the new context will contain the unexpanded tokens of +the replacement list of that macro. 
In the case of function-like +macros, @code{enter_macro_context} also replaces any parameters in the +replacement list, stored as @code{CPP_MACRO_ARG} tokens, with the +appropriate macro argument. If the standard requires that the +parameter be replaced with its expanded argument, the argument will +have been fully macro expanded first. + +@code{enter_macro_context} also handles special macros like +@code{__LINE__}. Although these macros expand to a single token which +cannot contain any further macros, for reasons of token spacing +(@pxref{Token Spacing}) and simplicity of implementation, cpplib +handles these special macros by pushing a context containing just that +one token. + +The final thing that @code{enter_macro_context} does before returning +is to mark the macro disabled for expansion (except for special macros +like @code{__TIME__}). The macro is re-enabled when its context is +later popped from the context stack, as described above. This strict +ordering ensures that a macro is disabled whilst its expansion is +being scanned, but that it is @emph{not} disabled whilst any arguments +to it are being expanded. + +@section Scanning the replacement list for macros to expand +The C standard states that, after any parameters have been replaced +with their possibly-expanded arguments, the replacement list is +scanned for nested macros. Further, any identifiers in the +replacement list that are not expanded during this scan are never +again eligible for expansion in the future, if the reason they were +not expanded is that the macro in question was disabled. + +Clearly this latter condition can only apply to tokens resulting from +argument pre-expansion. Other tokens never have an opportunity to be +re-tested for expansion. It is possible for identifiers that are +function-like macros to not expand initially but to expand during a +later scan. 
This occurs when the identifier is the last token of an +argument (and therefore originally followed by a comma or a closing +parenthesis in its macro's argument list), and when it replaces its +parameter in the macro's replacement list, the subsequent token +happens to be an opening parenthesis (itself possibly the first token +of an argument). + +It is important to note that when cpplib reads the last token of a +given context, that context still remains on the stack. Only when +looking for the @emph{next} token do we pop it off the stack and drop +to a lower context. This makes backing up by one token easy, but more +importantly ensures that the macro corresponding to the current +context is still disabled when we are considering the last token of +its replacement list for expansion (or indeed expanding it). As an +example, which illustrates many of the points above, consider + +@smallexample +#define foo(x) bar x +foo(foo) (2) +@end smallexample + +@noindent which fully expands to @samp{bar foo (2)}. During pre-expansion +of the argument, @samp{foo} does not expand even though the macro is +enabled, since it has no following parenthesis [pre-expansion of an +argument only uses tokens from that argument; it cannot take tokens +from whatever follows the macro invocation]. This still leaves the +argument token @samp{foo} eligible for future expansion. Then, when +re-scanning after argument replacement, the token @samp{foo} is +rejected for expansion, and marked ineligible for future expansion, +since the macro is now disabled. It is disabled because the +replacement list @samp{bar foo} of the macro is still on the context +stack. + +If instead the algorithm looked for an opening parenthesis first and +then tested whether the macro were disabled it would be subtly wrong. +In the example above, the replacement list of @samp{foo} would be +popped in the process of finding the parenthesis, re-enabling +@samp{foo} and expanding it a second time. 
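The @samp{foo(foo) (2)} example can be checked against a real preprocessor by stringizing the fully expanded result. The sketch below does that; @code{STR}, @code{STR_} and @code{expanded_spelling} are hypothetical helpers introduced only for this illustration, and the two-level @code{STR} is there so that the argument is macro-expanded before @samp{#} is applied. The exact spacing of the resulting string is what GCC is expected to produce, not something the C standard fully pins down.

```c
#include <string.h>

/* The macro from the example in the text.  */
#define foo(x) bar x

/* Hypothetical helpers: STR expands its argument completely and only
   then hands the resulting tokens to STR_ for stringizing.  */
#define STR_(...) #__VA_ARGS__
#define STR(...) STR_(__VA_ARGS__)

/* Return the spelling of the full expansion of "foo(foo) (2)".  The
   inner foo is not expanded: it is rescanned while the replacement
   list of the outer foo is still the current context, so the macro is
   disabled at that point and the token stays unexpanded for good.  */
const char *
expanded_spelling (void)
{
  return STR (foo(foo) (2));
}
```

If the preprocessor instead looked for the parenthesis before testing whether the macro was disabled, the returned string would contain a second @samp{bar} from the extra expansion.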
+ +@section Looking for a function-like macro's opening parenthesis +Function-like macros only expand when immediately followed by an opening +parenthesis. To check for this, cpplib needs to temporarily disable macros +and read the next token. Unfortunately, because of spacing issues +(@pxref{Token Spacing}), there can be fake padding tokens in-between, +and if the next real token is not a parenthesis cpplib needs to be +able to back up that one token as well as retain the information in +any intervening padding tokens. + +Backing up more than one token when macros are involved is not +permitted by cpplib, because in general it might involve issues like +restoring popped contexts onto the context stack, which are too hard. +Instead, searching for the parenthesis is handled by a special +function, @code{funlike_invocation_p}, which remembers padding +information as it reads tokens. If the next real token is not an +opening parenthesis, it backs up that one token, and then pushes an +extra context just containing the padding information if necessary. + +@section Marking tokens ineligible for future expansion +As discussed above, cpplib needs a way of marking tokens as +unexpandable. Since the tokens cpplib handles are read-only once they +have been lexed, it instead makes a copy of the token and adds the +flag @code{NO_EXPAND} to the copy. + +For efficiency and to simplify memory management by avoiding having to +remember to free these tokens, they are allocated as temporary tokens +from the lexer's current token run (@pxref{Lexing a line}) using the +function @code{_cpp_temp_token}. The tokens are then re-used once the +current line of tokens has been read in. + +This might sound unsafe. However, token runs are not re-used at the +end of a line if it happens to be in the middle of a macro argument +list, and cpplib only wants to back up more than one lexer token in +situations where no macro expansion is involved, so the optimization +is safe.
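The copy-then-flag idea can be sketched in a few lines. Everything named here is an assumption made for the illustration: @code{sk_token}, @code{sk_run}, @code{sk_temp_token} and @code{sk_copy_no_expand} are hypothetical, the flag values are arbitrary, and the fixed-size pool merely stands in for the lexer's token run.

```c
#include <stddef.h>

/* Hypothetical flag bits; the values are arbitrary, not cpplib's.  */
#define SK_PREV_WHITE (1u << 0)
#define SK_NO_EXPAND  (1u << 1)

/* Hypothetical, much simplified token.  */
struct sk_token
{
  const char *spelling;
  unsigned flags;
};

/* A fixed pool standing in for the lexer's current token run.  */
struct sk_run
{
  struct sk_token slots[8];
  size_t used;
};

/* Hand out the next free slot, or NULL when this pool is full.  */
struct sk_token *
sk_temp_token (struct sk_run *run)
{
  size_t capacity = sizeof run->slots / sizeof run->slots[0];
  return run->used < capacity ? &run->slots[run->used++] : NULL;
}

/* Copy SRC into a temporary slot and mark only the copy unexpandable;
   SRC itself is never written to.  */
const struct sk_token *
sk_copy_no_expand (struct sk_run *run, const struct sk_token *src)
{
  struct sk_token *copy = sk_temp_token (run);
  if (copy == NULL)
    return NULL;
  *copy = *src;
  copy->flags |= SK_NO_EXPAND;
  return copy;
}
```

Because the copy lives in the pool, nobody has to free it individually; resetting @code{used} to zero would recycle every temporary at once, which mirrors the re-use described above.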
+ +@node Token Spacing +@unnumbered Token Spacing +@cindex paste avoidance +@cindex spacing +@cindex token spacing + +First, consider an issue that only concerns the stand-alone +preprocessor: there needs to be a guarantee that re-reading its preprocessed +output results in an identical token stream. Without taking special +measures, this might not be the case because of macro substitution. +For example: + +@smallexample +#define PLUS + +#define EMPTY +#define f(x) =x= ++PLUS -EMPTY- PLUS+ f(=) + @expansion{} + + - - + + = = = +@emph{not} + @expansion{} ++ -- ++ === +@end smallexample + +One solution would be to simply insert a space between all adjacent +tokens. However, we would like to keep space insertion to a minimum, +both for aesthetic reasons and because it causes problems for people who +still try to abuse the preprocessor for things like Fortran source and +Makefiles. + +For now, just notice that when tokens are added (or removed, as shown by +the @code{EMPTY} example) from the original lexed token stream, we need +to check for accidental token pasting. We call this @dfn{paste +avoidance}. Token addition and removal can only occur because of macro +expansion, but accidental pasting can occur in many places: both before +and after each macro replacement, each argument replacement, and +additionally each token created by the @samp{#} and @samp{##} operators. + +Look at how the preprocessor gets whitespace output correct +normally. The @code{cpp_token} structure contains a flags byte, and one +of those flags is @code{PREV_WHITE}. This is flagged by the lexer, and +indicates that the token was preceded by whitespace of some form other +than a new line. The stand-alone preprocessor can use this flag to +decide whether to insert a space between tokens in the output. 
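As a rough sketch of how an output routine might use such a flag, the function below spells a sequence of tokens and emits one space before each token whose flag is set. It assumes a hypothetical @code{out_token} structure and an @code{OUT_PREV_WHITE} bit that merely play the part of @code{cpp_token} and @code{PREV_WHITE}; padding tokens and paste avoidance are deliberately left out.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for cpplib's PREV_WHITE flag.  */
#define OUT_PREV_WHITE 1u

/* Hypothetical, much simplified token.  */
struct out_token
{
  const char *spelling;
  unsigned flags;
};

/* Spell the N tokens in TOKS into BUF (of size LEN), writing a single
   space before every token that was preceded by whitespace when it
   was lexed.  Returns BUF.  */
char *
spell_line (const struct out_token *toks, size_t n, char *buf, size_t len)
{
  size_t used = 0;

  if (len == 0)
    return buf;
  buf[0] = '\0';

  for (size_t i = 0; i < n; i++)
    {
      int w = snprintf (buf + used, len - used, "%s%s",
                        (toks[i].flags & OUT_PREV_WHITE) ? " " : "",
                        toks[i].spelling);
      if (w < 0 || (size_t) w >= len - used)
        break;   /* Error or truncated output: stop here.  */
      used += (size_t) w;
    }
  return buf;
}
```

Fed the tokens of a line such as @samp{sum = add (1,2, 3);} with the flags the lexer would have set, this reproduces the original spacing; it is only once macro expansion adds or removes tokens that the flag alone stops being enough.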
+ +Now consider the result of the following macro expansion: + +@smallexample +#define add(x, y, z) x + y +z; +sum = add (1,2, 3); + @expansion{} sum = 1 + 2 +3; +@end smallexample + +The interesting thing here is that the tokens @samp{1} and @samp{2} are +output with a preceding space, and @samp{3} is output without a +preceding space, but when lexed none of these tokens had that property. +Careful consideration reveals that @samp{1} gets its preceding +whitespace from the space preceding @samp{add} in the macro invocation, +@emph{not} replacement list. @samp{2} gets its whitespace from the +space preceding the parameter @samp{y} in the macro replacement list, +and @samp{3} has no preceding space because parameter @samp{z} has none +in the replacement list. + +Once lexed, tokens are effectively fixed and cannot be altered, since +pointers to them might be held in many places, in particular by +in-progress macro expansions. So instead of modifying the two tokens +above, the preprocessor inserts a special token, which I call a +@dfn{padding token}, into the token stream to indicate that spacing of +the subsequent token is special. The preprocessor inserts padding +tokens in front of every macro expansion and expanded macro argument. +These point to a @dfn{source token} from which the subsequent real token +should inherit its spacing. In the above example, the source tokens are +@samp{add} in the macro invocation, and @samp{y} and @samp{z} in the +macro replacement list, respectively. + +It is quite easy to get multiple padding tokens in a row, for example if +a macro's first replacement token expands straight into another macro. + +@smallexample +#define foo bar +#define bar baz +[foo] + @expansion{} [baz] +@end smallexample + +Here, two padding tokens are generated with sources the @samp{foo} token +between the brackets, and the @samp{bar} token from foo's replacement +list, respectively. 
Clearly the first padding token is the one to +use, so the output code should contain a rule that the first +padding token in a sequence is the one that matters. + +But what if we leave a macro expansion? Adjusting the above +example slightly: + +@smallexample +#define foo bar +#define bar EMPTY baz +#define EMPTY +[foo] EMPTY; + @expansion{} [ baz] ; +@end smallexample + +As shown, now there should be a space before @samp{baz} and the +semicolon in the output. + +The rules we decided above fail for @samp{baz}: we generate three +padding tokens, one per macro invocation, before the token @samp{baz}. +We would then have it take its spacing from the first of these, which +carries source token @samp{foo} with no leading space. + +It is vital that cpplib get spacing correct in these examples since any +of these macro expansions could be stringized, where spacing matters. + +So, this demonstrates that not just entering macro and argument +expansions, but leaving them requires special handling too. I made +cpplib insert a padding token with a @code{NULL} source token when +leaving macro expansions, as well as after each replaced argument in a +macro's replacement list. It also inserts appropriate padding tokens on +either side of tokens created by the @samp{#} and @samp{##} operators. +I expanded the rule so that, if we see a padding token with a +@code{NULL} source token, @emph{and} the source token currently in effect has no leading +space, then we behave as if we have seen no padding tokens at all. A +quick check shows this rule will then get the above example correct as +well. + +Now a relationship with paste avoidance is apparent: we have to be +careful about paste avoidance in exactly the same locations we have +padding tokens in order to get white space correct.
This makes +implementation of paste avoidance easy: wherever the stand-alone +preprocessor is fixing up spacing because of padding tokens, and it +turns out that no space is needed, it has to take the extra step to +check that a space is not needed after all to avoid an accidental paste. +The function @code{cpp_avoid_paste} advises whether a space is required +between two consecutive tokens. To avoid excessive spacing, it tries +hard to only require a space if one is likely to be necessary, but for +reasons of efficiency it is slightly conservative and might recommend a +space where one is not strictly needed. + +@node Line Numbering +@unnumbered Line numbering +@cindex line numbers + +@section Just which line number anyway? + +There are three reasonable requirements a cpplib client might have for +the line number of a token passed to it: + +@itemize @bullet +@item +The source line it was lexed on. +@item +The line it is output on. This can be different to the line it was +lexed on if, for example, there are intervening escaped newlines or +C-style comments. For example: + +@smallexample +foo /* @r{A long +comment} */ bar \ +baz +@result{} +foo bar baz +@end smallexample + +@item +If the token results from a macro expansion, the line of the macro name, +or possibly the line of the closing parenthesis in the case of +function-like macro expansion. +@end itemize + +The @code{cpp_token} structure contains @code{line} and @code{col} +members. The lexer fills these in with the line and column of the first +character of the token. Consequently, but maybe unexpectedly, a token +from the replacement list of a macro expansion carries the location of +the token within the @code{#define} directive, because cpplib expands a +macro by returning pointers to the tokens in its replacement list. The +current implementation of cpplib assigns tokens created from built-in +macros and the @samp{#} and @samp{##} operators the location of the most +recently lexed token. 
This is because they are allocated from the +lexer's token runs, and because of the way the diagnostic routines infer +the appropriate location to report. + +The diagnostic routines in cpplib display the location of the most +recently @emph{lexed} token, unless they are passed a specific line and +column to report. For diagnostics regarding tokens that arise from +macro expansions, it might also be helpful for the user to see the +original location in the macro definition that the token came from. +Since that is exactly the information each token carries, such an +enhancement could be made relatively easily in future. + +The stand-alone preprocessor faces a similar problem when determining +the correct line to output the token on: the position attached to a +token is fairly useless if the token came from a macro expansion. All +tokens on a logical line should be output on its first physical line, so +the token's reported location is also wrong if it is part of a physical +line other than the first. + +To solve these issues, cpplib provides a callback that is generated +whenever it lexes a preprocessing token that starts a new logical line +other than a directive. It passes this token (which may be a +@code{CPP_EOF} token indicating the end of the translation unit) to the +callback routine, which can then use the line and column of this token +to produce correct output. + +@section Representation of line numbers + +As mentioned above, cpplib stores with each token the line number that +it was lexed on. In fact, this number is not the number of the line in +the source file, but instead bears more resemblance to the number of the +line in the translation unit. + +The preprocessor maintains a monotonically increasing line count, which is +incremented at every new line character (and also at the end of any +buffer that does not end in a new line).
Since a line number of zero is +useful to indicate certain special states and conditions, this variable +starts counting from one. + +This variable therefore uniquely enumerates each line in the translation +unit. With some simple infrastructure, it is straightforward to map +from this to the original source file and line number pair, saving space +whenever line number information needs to be saved. The code that +implements this mapping lies in the files @file{line-map.cc} and +@file{line-map.h}. + +Command-line macros and assertions are implemented by pushing a buffer +containing the right hand side of an equivalent @code{#define} or +@code{#assert} directive. Some built-in macros are handled similarly. +Since these are all processed before the first line of the main input +file, it will typically have an assigned line closer to twenty than to +one. + +@node Guard Macros +@unnumbered The Multiple-Include Optimization +@cindex guard macros +@cindex controlling macros +@cindex multiple-include optimization + +Header files are often of the form + +@smallexample +#ifndef FOO +#define FOO +@dots{} +#endif +@end smallexample + +@noindent +to prevent the compiler from processing them more than once. The +preprocessor notices such header files, so that if the header file +appears in a subsequent @code{#include} directive and @code{FOO} is +defined, then it is ignored and it doesn't preprocess or even re-open +the file a second time. This is referred to as the @dfn{multiple +include optimization}. + +Under what circumstances is such an optimization valid? If the file +were included a second time, it can only be optimized away if that +inclusion would result in no tokens to return, and no relevant +directives to process. Therefore the current implementation imposes +requirements and makes some allowances as follows: + +@enumerate +@item +There must be no tokens outside the controlling @code{#if}-@code{#endif} +pair, but whitespace and comments are permitted.
+ +@item +There must be no directives outside the controlling directive pair, but +the @dfn{null directive} (a line containing nothing other than a single +@samp{#} and possibly whitespace) is permitted. + +@item +The opening directive must be of the form + +@smallexample +#ifndef FOO +@end smallexample + +or + +@smallexample +#if !defined FOO [equivalently, #if !defined(FOO)] +@end smallexample + +@item +In the second form above, the tokens forming the @code{#if} expression +must have come directly from the source file---no macro expansion must +have been involved. This is because macro definitions can change, and +tracking whether or not a relevant change has been made is not worth the +implementation cost. + +@item +There can be no @code{#else} or @code{#elif} directives at the outer +conditional block level, because they would probably contain something +of interest to a subsequent pass. +@end enumerate + +First, when pushing a new file on the buffer stack, +@code{_stack_include_file} sets the controlling macro @code{mi_cmacro} to +@code{NULL}, and sets @code{mi_valid} to @code{true}. This indicates +that the preprocessor has not yet encountered anything that would +invalidate the multiple-include optimization. As described in the next +few paragraphs, these two variables having these values effectively +indicates top-of-file. + +When about to return a token that is not part of a directive, +@code{_cpp_lex_token} sets @code{mi_valid} to @code{false}. This +enforces the constraint that tokens outside the controlling conditional +block invalidate the optimization. + +The @code{do_if}, when appropriate, and @code{do_ifndef} directive +handlers pass the controlling macro to the function +@code{push_conditional}. cpplib maintains a stack of nested conditional +blocks, and after processing every opening conditional this function +pushes an @code{if_stack} structure onto the stack. 
In this structure +it records the controlling macro for the block, provided there is one +and we're at top-of-file (as described above). If an @code{#elif} or +@code{#else} directive is encountered, the controlling macro for that +block is cleared to @code{NULL}. Otherwise, it survives until the +@code{#endif} closing the block, upon which @code{do_endif} sets +@code{mi_valid} to true and stores the controlling macro in +@code{mi_cmacro}. + +@code{_cpp_handle_directive} clears @code{mi_valid} when processing any +directive other than an opening conditional and the null directive. +With this, and requiring top-of-file to record a controlling macro, and +no @code{#else} or @code{#elif} for it to survive and be copied to +@code{mi_cmacro} by @code{do_endif}, we have enforced the absence of +directives outside the main conditional block for the optimization to be +on. + +Note that whilst we are inside the conditional block, @code{mi_valid} is +likely to be reset to @code{false}, but this does not matter since +the closing @code{#endif} restores it to @code{true} if appropriate. + +Finally, since @code{_cpp_lex_direct} pops the file off the buffer stack +at @code{EOF} without returning a token, if the @code{#endif} directive +was not followed by any tokens, @code{mi_valid} is @code{true} and +@code{_cpp_pop_file_buffer} remembers the controlling macro associated +with the file. Subsequent calls to @code{stack_include_file} result in +no buffer being pushed if the controlling macro is defined, effecting +the optimization. + +A quick word on how we handle the + +@smallexample +#if !defined FOO +@end smallexample + +@noindent +case. @code{_cpp_parse_expr} and @code{parse_defined} take steps to see +whether the three stages @samp{!}, @samp{defined-expression} and +@samp{end-of-directive} occur in order in a @code{#if} expression. If +so, they return the guard macro to @code{do_if} in the variable +@code{mi_ind_cmacro}, and otherwise set it to @code{NULL}. 
+@code{enter_macro_context} sets @code{mi_valid} to false, so if a macro +was expanded whilst parsing any part of the expression, then the +top-of-file test in @code{push_conditional} fails and the optimization +is turned off. + +@node Files +@unnumbered File Handling +@cindex files + +Fairly obviously, the file handling code of cpplib resides in the file +@file{files.cc}. It takes care of the details of file searching, +opening, reading and caching, for both the main source file and all the +headers it recursively includes. + +The basic strategy is to minimize the number of system calls. On many +systems, the basic @code{open ()} and @code{fstat ()} system calls can +be quite expensive. For every @code{#include}-d file, we need to try +all the directories in the search path until we find a match. Some +projects, such as glibc, pass twenty or thirty include paths on the +command line, so this can rapidly become time consuming. + +For a header file we have not encountered before we have little choice +but to do this. However, it is often the case that the same headers are +repeatedly included, and in these cases we try to avoid repeating the +filesystem queries whilst searching for the correct file. + +For each file we try to open, we store the constructed path in a splay +tree. This path first undergoes simplification by the function +@code{_cpp_simplify_pathname}. For example, +@file{/usr/include/bits/../foo.h} is simplified to +@file{/usr/include/foo.h} before we enter it in the splay tree and try +to @code{open ()} the file. CPP will then find subsequent uses of +@file{foo.h}, even as @file{/usr/include/foo.h}, in the splay tree and +save system calls. + +Further, it is likely the file contents have also been cached, saving a +@code{read ()} system call. We don't bother caching the contents of +header files that are re-inclusion protected, and whose re-inclusion +macro is defined when we leave the header file for the first time. 
If +the host supports it, we try to map suitably large files into memory, +rather than reading them in directly. + +The include paths are internally stored on a null-terminated +singly-linked list, starting with the @code{"header.h"} directory search +chain, which then links into the @code{<header.h>} directory chain. + +Files included with the @code{<foo.h>} syntax start the lookup directly +in the second half of this chain. However, files included with the +@code{"foo.h"} syntax start at the beginning of the chain, but with one +extra directory prepended. This is the directory of the current file; +the one containing the @code{#include} directive. Prepending this +directory on a per-file basis is handled by the function +@code{search_from}. + +Note that a header included with a directory component, such as +@code{#include "mydir/foo.h"} and opened as +@file{/usr/local/include/mydir/foo.h}, will have the complete path minus +the basename @samp{foo.h} as the current directory. + +Enough information is stored in the splay tree that CPP can immediately +tell whether it can skip the header file because of the multiple include +optimization, whether the file didn't exist or couldn't be opened for +some reason, or whether the header was flagged not to be re-used, as it +is with the obsolete @code{#import} directive. + +For the benefit of MS-DOS filesystems with an 8.3 filename limitation, +CPP offers the ability to treat various include file names as aliases +for the real header files with shorter names. The map from one to the +other is found in a special file called @samp{header.gcc}, stored in the +command line (or system) include directories to which the mapping +applies. This may be higher up the directory tree than the full path to +the file minus the base name.
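The two-part search chain described in this node can be pictured with a small sketch. This is not cpplib's code: the type @code{search_dir} and the functions @code{dirs_to_try} and @code{lookup_start} are hypothetical names invented for illustration, and the sketch only models where a lookup begins, not how files are opened or cached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the include search path. */
struct search_dir {
    const char *name;
    struct search_dir *next;   /* NULL marks the end of the list */
};

/* Count how many directories a lookup starting at START would try. */
size_t dirs_to_try(const struct search_dir *start)
{
    size_t n = 0;
    for (; start != NULL; start = start->next)
        n++;
    return n;
}

/* Choose where a lookup begins.  An angle-bracket include starts at
   BRACKET_CHAIN, the second half of the list.  A quoted include links
   CURRENT_DIR (the directory of the including file) in front of
   QUOTE_CHAIN and starts there.  */
const struct search_dir *lookup_start(int angle_brackets,
                                      struct search_dir *current_dir,
                                      struct search_dir *quote_chain,
                                      struct search_dir *bracket_chain)
{
    assert(current_dir != NULL);
    if (angle_brackets)
        return bracket_chain;
    current_dir->next = quote_chain;
    return current_dir;
}
```

With one quote-only directory followed by two system directories, an angle-bracket lookup in this model tries two directories, while a quoted lookup tries four: the current directory, the quote-only directory, and then both system directories.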
+ +@node Concept Index +@unnumbered Concept Index +@printindex cp + +@bye diff --git a/gcc/doc/cppopts.texi b/gcc/doc/cppopts.texi new file mode 100644 index 00000000000..c0a92b37018 --- /dev/null +++ b/gcc/doc/cppopts.texi @@ -0,0 +1,557 @@ +@c Copyright (C) 1999-2022 Free Software Foundation, Inc. +@c This is part of the CPP and GCC manuals. +@c For copying conditions, see the file gcc.texi. + +@c --------------------------------------------------------------------- +@c Options affecting the preprocessor +@c --------------------------------------------------------------------- + +@c If this file is included with the flag ``cppmanual'' set, it is +@c formatted for inclusion in the CPP manual; otherwise the main GCC manual. + +@item -D @var{name} +@opindex D +Predefine @var{name} as a macro, with definition @code{1}. + +@item -D @var{name}=@var{definition} +The contents of @var{definition} are tokenized and processed as if +they appeared during translation phase three in a @samp{#define} +directive. In particular, the definition is truncated by +embedded newline characters. + +If you are invoking the preprocessor from a shell or shell-like +program you may need to use the shell's quoting syntax to protect +characters such as spaces that have a meaning in the shell syntax. + +If you wish to define a function-like macro on the command line, write +its argument list with surrounding parentheses before the equals sign +(if any). Parentheses are meaningful to most shells, so you should +quote the option. With @command{sh} and @command{csh}, +@option{-D'@var{name}(@var{args@dots{}})=@var{definition}'} works. + +@option{-D} and @option{-U} options are processed in the order they +are given on the command line. All @option{-imacros @var{file}} and +@option{-include @var{file}} options are processed after all +@option{-D} and @option{-U} options. 
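As a rough illustration of the @option{-D} forms described above, the following sketch assumes a hypothetical source file, @file{demo.c}, and hypothetical macro names @code{SCALE} and @code{VERBOSE}. The fallback definitions apply only when nothing was supplied on the command line.

```c
#include <assert.h>

/* Hypothetical file demo.c.  Built with no -D options, the fallbacks
   below apply.  Built with, for example,
       cc -D'SCALE(x)=((x) * 3)' -D VERBOSE demo.c
   the command-line definitions are already in place when this file is
   read, so both #ifndef blocks are skipped.  */
#ifndef SCALE
#define SCALE(x) ((x) * 2)
#endif

/* -D VERBOSE with no "=definition" would define VERBOSE as 1.  */
#ifndef VERBOSE
#define VERBOSE 0
#endif

int scale(int v) { return SCALE(v); }
int verbose(void) { return VERBOSE; }
```

The quoting around the function-like definition is what keeps the shell from interpreting the parentheses.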
+ +@item -U @var{name} +@opindex U +Cancel any previous definition of @var{name}, either built in or +provided with a @option{-D} option. + +@item -include @var{file} +@opindex include +Process @var{file} as if @code{#include "file"} appeared as the first +line of the primary source file. However, the first directory searched +for @var{file} is the preprocessor's working directory @emph{instead of} +the directory containing the main source file. If not found there, it +is searched for in the remainder of the @code{#include "@dots{}"} search +chain as normal. + +If multiple @option{-include} options are given, the files are included +in the order they appear on the command line. + +@item -imacros @var{file} +@opindex imacros +Exactly like @option{-include}, except that any output produced by +scanning @var{file} is thrown away. Macros it defines remain defined. +This allows you to acquire all the macros from a header without also +processing its declarations. + +All files specified by @option{-imacros} are processed before all files +specified by @option{-include}. + +@item -undef +@opindex undef +Do not predefine any system-specific or GCC-specific macros. The +standard predefined macros remain defined. +@ifset cppmanual +@xref{Standard Predefined Macros}. +@end ifset + +@item -pthread +@opindex pthread +Define additional macros required for using the POSIX threads library. +You should use this option consistently for both compilation and linking. +This option is supported on GNU/Linux targets, most other Unix derivatives, +and also on x86 Cygwin and MinGW targets. + +@item -M +@opindex M +@cindex @command{make} +@cindex dependencies, @command{make} +Instead of outputting the result of preprocessing, output a rule +suitable for @command{make} describing the dependencies of the main +source file. 
The preprocessor outputs one @command{make} rule containing +the object file name for that source file, a colon, and the names of all +the included files, including those coming from @option{-include} or +@option{-imacros} command-line options. + +Unless specified explicitly (with @option{-MT} or @option{-MQ}), the +object file name consists of the name of the source file with any +suffix replaced with the object file suffix and with any leading directory +parts removed. If there are many included files then the rule is +split into several lines using @samp{\}-newline. The rule has no +commands. + +This option does not suppress the preprocessor's debug output, such as +@option{-dM}. To avoid mixing such debug output with the dependency +rules you should explicitly specify the dependency output file with +@option{-MF}, or use an environment variable like +@env{DEPENDENCIES_OUTPUT} (@pxref{Environment Variables}). Debug output +is still sent to the regular output stream as normal. + +Passing @option{-M} to the driver implies @option{-E}, and suppresses +warnings with an implicit @option{-w}. + +@item -MM +@opindex MM +Like @option{-M} but do not mention header files that are found in +system header directories, nor header files that are included, +directly or indirectly, from such a header. + +This implies that the choice of angle brackets or double quotes in an +@samp{#include} directive does not in itself determine whether that +header appears in @option{-MM} dependency output. + +@anchor{dashMF} +@item -MF @var{file} +@opindex MF +When used with @option{-M} or @option{-MM}, specifies a +file to write the dependencies to. If no @option{-MF} switch is given +the preprocessor sends the rules to the same place it would send +preprocessed output. + +When used with the driver options @option{-MD} or @option{-MMD}, +@option{-MF} overrides the default dependency output file. + +If @var{file} is @file{-}, then the dependencies are written to @file{stdout}.
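The default object file name used in the rule emitted by @option{-M} (directory parts removed, suffix replaced) can be sketched as follows. The helper @code{default_target} is hypothetical and is not how the preprocessor implements it; it assumes @samp{/} is the only directory separator and that the caller supplies the object suffix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the default rule target for SOURCE into OUT (capacity CAP):
   drop any leading directories, then replace any suffix with
   OBJ_SUFFIX.  Returns 0 on success, -1 if OUT is too small.  */
int default_target(const char *source, const char *obj_suffix,
                   char *out, size_t cap)
{
    assert(source != NULL && obj_suffix != NULL && out != NULL);

    /* Skip past the last '/' if there is one.  */
    const char *base = strrchr(source, '/');
    base = base ? base + 1 : source;

    /* The stem ends at the last '.', or at the end of the name.  */
    const char *dot = strrchr(base, '.');
    size_t stem = dot ? (size_t)(dot - base) : strlen(base);

    int n = snprintf(out, cap, "%.*s%s", (int)stem, base, obj_suffix);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Under these assumptions, @file{src/lib/foo.c} with suffix @samp{.o} yields @file{foo.o}, which is the target that would appear before the colon in the generated rule.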
+ +@item -MG +@opindex MG +In conjunction with an option such as @option{-M} requesting +dependency generation, @option{-MG} assumes missing header files are +generated files and adds them to the dependency list without raising +an error. The dependency filename is taken directly from the +@code{#include} directive without prepending any path. @option{-MG} +also suppresses preprocessed output, as a missing header file renders +this useless. + +This feature is used in automatic updating of makefiles. + +@item -Mno-modules +@opindex Mno-modules +Disable dependency generation for compiled module interfaces. + +@item -MP +@opindex MP +This option instructs CPP to add a phony target for each dependency +other than the main file, causing each to depend on nothing. These +dummy rules work around errors @command{make} gives if you remove header +files without updating the @file{Makefile} to match. + +This is typical output: + +@smallexample +test.o: test.c test.h + +test.h: +@end smallexample + +@item -MT @var{target} +@opindex MT + +Change the target of the rule emitted by dependency generation. By +default CPP takes the name of the main input file, deletes any +directory components and any file suffix such as @samp{.c}, and +appends the platform's usual object suffix. The result is the target. + +An @option{-MT} option sets the target to be exactly the string you +specify. If you want multiple targets, you can specify them as a single +argument to @option{-MT}, or use multiple @option{-MT} options. + +For example, @option{@w{-MT '$(objpfx)foo.o'}} might give + +@smallexample +$(objpfx)foo.o: foo.c +@end smallexample + +@item -MQ @var{target} +@opindex MQ + +Same as @option{-MT}, but it quotes any characters which are special to +Make. @option{@w{-MQ '$(objpfx)foo.o'}} gives + +@smallexample +$$(objpfx)foo.o: foo.c +@end smallexample + +The default target is automatically quoted, as if it were given with +@option{-MQ}. 
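The difference between @option{-MT} and @option{-MQ} in the example above comes down to doubling @samp{$} so that @command{make} reads it literally. A minimal sketch of that one step follows. The function @code{quote_for_make} is a hypothetical name, and the sketch handles only @samp{$}; whatever other characters the real option quotes are not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Copy TARGET into OUT (capacity CAP), doubling each '$'.
   Returns the length of the quoted string, or -1 if OUT is too
   small to hold it together with the terminating null.  */
int quote_for_make(const char *target, char *out, size_t cap)
{
    size_t used = 0;

    assert(target != NULL && out != NULL);
    for (; *target != '\0'; target++) {
        size_t need = (*target == '$') ? 2 : 1;
        if (used + need >= cap)
            return -1;            /* keep room for the terminator */
        if (*target == '$')
            out[used++] = '$';    /* the extra, quoting '$' */
        out[used++] = *target;
    }
    if (used >= cap)
        return -1;                /* only reachable when CAP is 0 */
    out[used] = '\0';
    return (int)used;
}
```

Applied to the string @samp{$(objpfx)foo.o}, this produces @samp{$$(objpfx)foo.o}, matching the target shown in the @option{-MQ} example.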
+ +@item -MD +@opindex MD +@option{-MD} is equivalent to @option{-M -MF @var{file}}, except that +@option{-E} is not implied. The driver determines @var{file} based on +whether an @option{-o} option is given. If it is, the driver uses its +argument but with a suffix of @file{.d}, otherwise it takes the name +of the input file, removes any directory components and suffix, and +applies a @file{.d} suffix. + +If @option{-MD} is used in conjunction with @option{-E}, any +@option{-o} switch is understood to specify the dependency output file +(@pxref{dashMF,,-MF}), but if used without @option{-E}, each @option{-o} +is understood to specify a target object file. + +Since @option{-E} is not implied, @option{-MD} can be used to generate +a dependency output file as a side effect of the compilation process. + +@item -MMD +@opindex MMD +Like @option{-MD} except mention only user header files, not system +header files. + +@item -fpreprocessed +@opindex fpreprocessed +Indicate to the preprocessor that the input file has already been +preprocessed. This suppresses things like macro expansion, trigraph +conversion, escaped newline splicing, and processing of most directives. +The preprocessor still recognizes and removes comments, so that you can +pass a file preprocessed with @option{-C} to the compiler without +problems. In this mode the integrated preprocessor is little more than +a tokenizer for the front ends. + +@option{-fpreprocessed} is implicit if the input file has one of the +extensions @samp{.i}, @samp{.ii} or @samp{.mi}. These are the +extensions that GCC uses for preprocessed files created by +@option{-save-temps}. + +@item -fdirectives-only +@opindex fdirectives-only +When preprocessing, handle directives, but do not expand macros. + +The option's behavior depends on the @option{-E} and @option{-fpreprocessed} +options. + +With @option{-E}, preprocessing is limited to the handling of directives +such as @code{#define}, @code{#ifdef}, and @code{#error}. 
Other +preprocessor operations, such as macro expansion and trigraph +conversion, are not performed. In addition, the @option{-dD} option is +implicitly enabled. + +With @option{-fpreprocessed}, predefinition of command line and most +builtin macros is disabled. Macros such as @code{__LINE__}, which are +contextually dependent, are handled normally. This enables compilation of +files previously preprocessed with @code{-E -fdirectives-only}. + +With both @option{-E} and @option{-fpreprocessed}, the rules for +@option{-fpreprocessed} take precedence. This enables full preprocessing of +files previously preprocessed with @code{-E -fdirectives-only}. + +@item -fdollars-in-identifiers +@opindex fdollars-in-identifiers +@anchor{fdollars-in-identifiers} +Accept @samp{$} in identifiers. +@ifset cppmanual +@xref{Identifier characters}. +@end ifset + +@item -fextended-identifiers +@opindex fextended-identifiers +Accept universal character names and extended characters in +identifiers. This option is enabled by default for C99 (and later C +standard versions) and C++. + +@item -fno-canonical-system-headers +@opindex fno-canonical-system-headers +When preprocessing, do not shorten system header paths with canonicalization. + +@item -fmax-include-depth=@var{depth} +@opindex fmax-include-depth +Set the maximum depth of nested @code{#include} directives. The default is 200. + +@item -ftabstop=@var{width} +@opindex ftabstop +Set the distance between tab stops. This helps the preprocessor report +correct column numbers in warnings or errors, even if tabs appear on the +line. If the value is less than 1 or greater than 100, the option is +ignored. The default is 8. + +@item -ftrack-macro-expansion@r{[}=@var{level}@r{]} +@opindex ftrack-macro-expansion +Track locations of tokens across macro expansions. This allows the +compiler to emit diagnostics about the current macro expansion stack +when a compilation error occurs in a macro expansion.
Using this +option makes the preprocessor and the compiler consume more +memory. The @var{level} parameter can be used to choose the level of +precision of token location tracking, thus decreasing the memory +consumption if necessary. Value @samp{0} of @var{level} de-activates +this option. Value @samp{1} tracks token locations in a +degraded mode for the sake of minimal memory overhead. In this mode +all tokens resulting from the expansion of an argument of a +function-like macro have the same location. Value @samp{2} tracks +token locations completely. This value is the most memory hungry. +When this option is given no argument, the default parameter value is +@samp{2}. + +Note that @code{-ftrack-macro-expansion=2} is activated by default. + +@item -fmacro-prefix-map=@var{old}=@var{new} +@opindex fmacro-prefix-map +When preprocessing files residing in directory @file{@var{old}}, +expand the @code{__FILE__} and @code{__BASE_FILE__} macros as if the +files resided in directory @file{@var{new}} instead. This can be used +to change an absolute path to a relative path by using @file{.} for +@var{new} which can result in more reproducible builds that are +location independent. This option also affects +@code{__builtin_FILE()} during compilation. See also +@option{-ffile-prefix-map}. + +@item -fexec-charset=@var{charset} +@opindex fexec-charset +@cindex character set, execution +Set the execution character set, used for string and character +constants. The default is UTF-8. @var{charset} can be any encoding +supported by the system's @code{iconv} library routine. + +@item -fwide-exec-charset=@var{charset} +@opindex fwide-exec-charset +@cindex character set, wide execution +Set the wide execution character set, used for wide string and +character constants. The default is one of UTF-32BE, UTF-32LE, UTF-16BE, +or UTF-16LE, whichever corresponds to the width of @code{wchar_t} and the +big-endian or little-endian byte order being used for code generation.
As +with @option{-fexec-charset}, @var{charset} can be any encoding supported +by the system's @code{iconv} library routine; however, you will have +problems with encodings that do not fit exactly in @code{wchar_t}. + +@item -finput-charset=@var{charset} +@opindex finput-charset +@cindex character set, input +Set the input character set, used for translation from the character +set of the input file to the source character set used by GCC@. If the +locale does not specify, or GCC cannot get this information from the +locale, the default is UTF-8. This can be overridden by either the locale +or this command-line option. Currently the command-line option takes +precedence if there's a conflict. @var{charset} can be any encoding +supported by the system's @code{iconv} library routine. + +@ifclear cppmanual +@item -fpch-deps +@opindex fpch-deps +When using precompiled headers (@pxref{Precompiled Headers}), this flag +causes the dependency-output flags to also list the files from the +precompiled header's dependencies. If not specified, only the +precompiled header is listed and not the files that were used to +create it, because those files are not consulted when a precompiled +header is used. + +@item -fpch-preprocess +@opindex fpch-preprocess +This option allows use of a precompiled header (@pxref{Precompiled +Headers}) together with @option{-E}. It inserts a special @code{#pragma}, +@code{#pragma GCC pch_preprocess "@var{filename}"} in the output to mark +the place where the precompiled header was found, and its @var{filename}. +When @option{-fpreprocessed} is in use, GCC recognizes this @code{#pragma} +and loads the PCH@. + +This option is off by default, because the resulting preprocessed output +is only really suitable as input to GCC@. It is switched on by +@option{-save-temps}. + +You should not write this @code{#pragma} in your own code, but it is +safe to edit the filename if the PCH file is available in a different +location.
The filename may be absolute or it may be relative to GCC's +current directory. +@end ifclear + +@item -fworking-directory +@opindex fworking-directory +@opindex fno-working-directory +Enable generation of linemarkers in the preprocessor output that +let the compiler know the current working directory at the time of +preprocessing. When this option is enabled, the preprocessor +emits, after the initial linemarker, a second linemarker with the +current working directory followed by two slashes. GCC uses this +directory, when it's present in the preprocessed input, as the +directory emitted as the current working directory in some debugging +information formats. This option is implicitly enabled if debugging +information is enabled, but this can be inhibited with the negated +form @option{-fno-working-directory}. If the @option{-P} flag is +present in the command line, this option has no effect, since no +@code{#line} directives are emitted whatsoever. + +@item -A @var{predicate}=@var{answer} +@opindex A +Make an assertion with the predicate @var{predicate} and answer +@var{answer}. This form is preferred to the older form @option{-A +@var{predicate}(@var{answer})}, which is still supported, because +it does not use shell special characters. +@ifset cppmanual +@xref{Obsolete Features}. +@end ifset + +@item -A -@var{predicate}=@var{answer} +Cancel an assertion with the predicate @var{predicate} and answer +@var{answer}. + +@item -C +@opindex C +Do not discard comments. All comments are passed through to the output +file, except for comments in processed directives, which are deleted +along with the directive. + +You should be prepared for side effects when using @option{-C}; it +causes the preprocessor to treat comments as tokens in their own right. +For example, comments appearing at the start of what would be a +directive line have the effect of turning that line into an ordinary +source line, since the first token on the line is no longer a @samp{#}. 
+ +@item -CC +@opindex CC +Do not discard comments, including during macro expansion. This is +like @option{-C}, except that comments contained within macros are +also passed through to the output file where the macro is expanded. + +In addition to the side effects of the @option{-C} option, the +@option{-CC} option causes all C++-style comments inside a macro +to be converted to C-style comments. This is to prevent later use +of that macro from inadvertently commenting out the remainder of +the source line. + +The @option{-CC} option is generally used to support lint comments. + +@item -P +@opindex P +Inhibit generation of linemarkers in the output from the preprocessor. +This might be useful when running the preprocessor on something that is +not C code, and will be sent to a program which might be confused by the +linemarkers. +@ifset cppmanual +@xref{Preprocessor Output}. +@end ifset + +@cindex traditional C language +@cindex C language, traditional +@item -traditional +@itemx -traditional-cpp +@opindex traditional-cpp +@opindex traditional + +Try to imitate the behavior of pre-standard C preprocessors, as +opposed to ISO C preprocessors. +@ifset cppmanual +@xref{Traditional Mode}. +@end ifset +@ifclear cppmanual +See the GNU CPP manual for details. +@end ifclear + +Note that GCC does not otherwise attempt to emulate a pre-standard +C compiler, and these options are only supported with the @option{-E} +switch, or when invoking CPP explicitly. + +@item -trigraphs +@opindex trigraphs +Support ISO C trigraphs. +These are three-character sequences, all starting with @samp{??}, that +are defined by ISO C to stand for single characters. For example, +@samp{??/} stands for @samp{\}, so @samp{'??/n'} is a character +constant for a newline. +@ifset cppmanual +@xref{Initial processing}. +@end ifset + +@ifclear cppmanual +The nine trigraphs and their replacements are + +@smallexample +Trigraph: ??( ??) ??< ??> ??= ??/ ??' ??! 
??- +Replacement: [ ] @{ @} # \ ^ | ~ +@end smallexample +@end ifclear + +By default, GCC ignores trigraphs, but in +standard-conforming modes it converts them. See the @option{-std} and +@option{-ansi} options. + +@item -remap +@opindex remap +Enable special code to work around file systems which only permit very +short file names, such as MS-DOS@. + +@item -H +@opindex H +Print the name of each header file used, in addition to other normal +activities. Each name is indented to show how deep in the +@samp{#include} stack it is. Precompiled header files are also +printed, even if they are found to be invalid; an invalid precompiled +header file is printed with @samp{...x} and a valid one with @samp{...!} . + +@item -d@var{letters} +@opindex d +Says to make debugging dumps during compilation as specified by +@var{letters}. The flags documented here are those relevant to the +preprocessor. Other @var{letters} are interpreted +by the compiler proper, or reserved for future versions of GCC, and so +are silently ignored. If you specify @var{letters} whose behavior +conflicts, the result is undefined. +@ifclear cppmanual +@xref{Developer Options}, for more information. +@end ifclear + +@table @gcctabopt +@item -dM +@opindex dM +Instead of the normal output, generate a list of @samp{#define} +directives for all the macros defined during the execution of the +preprocessor, including predefined macros. This gives you a way of +finding out what is predefined in your version of the preprocessor. +Assuming you have no file @file{foo.h}, the command + +@smallexample +touch foo.h; cpp -dM foo.h +@end smallexample + +@noindent +shows all the predefined macros. + +@ifclear cppmanual +If you use @option{-dM} without the @option{-E} option, @option{-dM} is +interpreted as a synonym for @option{-fdump-rtl-mach}. +@xref{Developer Options, , ,gcc}. 
+@end ifclear + +@item -dD +@opindex dD +Like @option{-dM} except in two respects: it does @emph{not} include the +predefined macros, and it outputs @emph{both} the @samp{#define} +directives and the result of preprocessing. Both kinds of output go to +the standard output file. + +@item -dN +@opindex dN +Like @option{-dD}, but emit only the macro names, not their expansions. + +@item -dI +@opindex dI +Output @samp{#include} directives in addition to the result of +preprocessing. + +@item -dU +@opindex dU +Like @option{-dD} except that only macros that are expanded, or whose +definedness is tested in preprocessor directives, are output; the +output is delayed until the use or test of the macro; and +@samp{#undef} directives are also output for macros tested but +undefined at the time. +@end table + +@item -fdebug-cpp +@opindex fdebug-cpp +This option is only useful for debugging GCC. When used from CPP or with +@option{-E}, it dumps debugging information about location maps. Every +token in the output is preceded by the dump of the map its location +belongs to. + +When used from GCC without @option{-E}, this option has no effect. diff --git a/gcc/doc/cppwarnopts.texi b/gcc/doc/cppwarnopts.texi new file mode 100644 index 00000000000..fa048249369 --- /dev/null +++ b/gcc/doc/cppwarnopts.texi @@ -0,0 +1,82 @@ +@c Copyright (C) 1999-2022 Free Software Foundation, Inc. +@c This is part of the CPP and GCC manuals. +@c For copying conditions, see the file gcc.texi. + +@c --------------------------------------------------------------------- +@c Options affecting preprocessor warnings +@c --------------------------------------------------------------------- + +@c If this file is included with the flag ``cppmanual'' set, it is +@c formatted for inclusion in the CPP manual; otherwise the main GCC manual. 
+ +@item -Wcomment +@itemx -Wcomments +@opindex Wcomment +@opindex Wcomments +Warn whenever a comment-start sequence @samp{/*} appears in a @samp{/*} +comment, or whenever a backslash-newline appears in a @samp{//} comment. +This warning is enabled by @option{-Wall}. + +@item -Wtrigraphs +@opindex Wtrigraphs +@anchor{Wtrigraphs} +Warn if any trigraphs are encountered that might change the meaning of +the program. Trigraphs within comments are not warned about, +except those that would form escaped newlines. + +This option is implied by @option{-Wall}. If @option{-Wall} is not +given, this option is still enabled unless trigraphs are enabled. To +get trigraph conversion without warnings, but get the other +@option{-Wall} warnings, use @samp{-trigraphs -Wall -Wno-trigraphs}. + +@item -Wundef +@opindex Wundef +@opindex Wno-undef +Warn if an undefined identifier is evaluated in an @code{#if} directive. +Such identifiers are replaced with zero. + +@item -Wexpansion-to-defined +@opindex Wexpansion-to-defined +Warn whenever @samp{defined} is encountered in the expansion of a macro +(including the case where the macro is expanded by an @samp{#if} directive). +Such usage is not portable. +This warning is also enabled by @option{-Wpedantic} and @option{-Wextra}. + +@item -Wunused-macros +@opindex Wunused-macros +Warn about macros defined in the main file that are unused. A macro +is @dfn{used} if it is expanded or tested for existence at least once. +The preprocessor also warns if the macro has not been used at the +time it is redefined or undefined. + +Built-in macros, macros defined on the command line, and macros +defined in include files are not warned about. + +@emph{Note:} If a macro is actually used, but only used in skipped +conditional blocks, then the preprocessor reports it as unused. To avoid the +warning in such a case, you might improve the scope of the macro's +definition by, for example, moving it into the first skipped block. 
+Alternatively, you could provide a dummy use with something like: + +@smallexample +#if defined the_macro_causing_the_warning +#endif +@end smallexample + +@item -Wno-endif-labels +@opindex Wno-endif-labels +@opindex Wendif-labels +Do not warn whenever an @code{#else} or an @code{#endif} are followed by text. +This sometimes happens in older programs with code of the form + +@smallexample +#if FOO +@dots{} +#else FOO +@dots{} +#endif FOO +@end smallexample + +@noindent +The second and third @code{FOO} should be in comments. +This warning is on by default. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi new file mode 100644 index 00000000000..8da0db9770d --- /dev/null +++ b/gcc/doc/extend.texi @@ -0,0 +1,25550 @@ +c Copyright (C) 1988-2022 Free Software Foundation, Inc. + +@c This is part of the GCC manual. +@c For copying conditions, see the file gcc.texi. + +@node C Extensions +@chapter Extensions to the C Language Family +@cindex extensions, C language +@cindex C language extensions + +@opindex pedantic +GNU C provides several language features not found in ISO standard C@. +(The @option{-pedantic} option directs GCC to print a warning message if +any of these features is used.) To test for the availability of these +features in conditional compilation, check for a predefined macro +@code{__GNUC__}, which is always defined under GCC@. + +These extensions are available in C and Objective-C@. Most of them are +also available in C++. @xref{C++ Extensions,,Extensions to the +C++ Language}, for extensions that apply @emph{only} to C++. + +Some features that are in ISO C99 but not C90 or C++ are also, as +extensions, accepted by GCC in C90 mode and in C++. + +@menu +* Statement Exprs:: Putting statements and declarations inside expressions. +* Local Labels:: Labels local to a block. +* Labels as Values:: Getting pointers to labels, and computed gotos. +* Nested Functions:: Nested function in GNU C. +* Nonlocal Gotos:: Nonlocal gotos. 
+* Constructing Calls:: Dispatching a call to another function. +* Typeof:: @code{typeof}: referring to the type of an expression. +* Conditionals:: Omitting the middle operand of a @samp{?:} expression. +* __int128:: 128-bit integers---@code{__int128}. +* Long Long:: Double-word integers---@code{long long int}. +* Complex:: Data types for complex numbers. +* Floating Types:: Additional Floating Types. +* Half-Precision:: Half-Precision Floating Point. +* Decimal Float:: Decimal Floating Types. +* Hex Floats:: Hexadecimal floating-point constants. +* Fixed-Point:: Fixed-Point Types. +* Named Address Spaces::Named address spaces. +* Zero Length:: Zero-length arrays. +* Empty Structures:: Structures with no members. +* Variable Length:: Arrays whose length is computed at run time. +* Variadic Macros:: Macros with a variable number of arguments. +* Escaped Newlines:: Slightly looser rules for escaped newlines. +* Subscripting:: Any array can be subscripted, even if not an lvalue. +* Pointer Arith:: Arithmetic on @code{void}-pointers and function pointers. +* Variadic Pointer Args:: Pointer arguments to variadic functions. +* Pointers to Arrays:: Pointers to arrays with qualifiers work as expected. +* Initializers:: Non-constant initializers. +* Compound Literals:: Compound literals give structures, unions + or arrays as values. +* Designated Inits:: Labeling elements of initializers. +* Case Ranges:: `case 1 ... 9' and such. +* Cast to Union:: Casting to union type from any member of the union. +* Mixed Labels and Declarations:: Mixing declarations, labels and code. +* Function Attributes:: Declaring that functions have no side effects, + or that they can never return. +* Variable Attributes:: Specifying attributes of variables. +* Type Attributes:: Specifying attributes of types. +* Label Attributes:: Specifying attributes on labels. +* Enumerator Attributes:: Specifying attributes on enumerators. +* Statement Attributes:: Specifying attributes on statements. 
+* Attribute Syntax:: Formal syntax for attributes. +* Function Prototypes:: Prototype declarations and old-style definitions. +* C++ Comments:: C++ comments are recognized. +* Dollar Signs:: Dollar sign is allowed in identifiers. +* Character Escapes:: @samp{\e} stands for the character @key{ESC}. +* Alignment:: Determining the alignment of a function, type or variable. +* Inline:: Defining inline functions (as fast as macros). +* Volatiles:: What constitutes an access to a volatile object. +* Using Assembly Language with C:: Instructions and extensions for interfacing C with assembler. +* Alternate Keywords:: @code{__const__}, @code{__asm__}, etc., for header files. +* Incomplete Enums:: @code{enum foo;}, with details to follow. +* Function Names:: Printable strings which are the name of the current + function. +* Return Address:: Getting the return or frame address of a function. +* Vector Extensions:: Using vector instructions through built-in functions. +* Offsetof:: Special syntax for implementing @code{offsetof}. +* __sync Builtins:: Legacy built-in functions for atomic memory access. +* __atomic Builtins:: Atomic built-in functions with memory model. +* Integer Overflow Builtins:: Built-in functions to perform arithmetics and + arithmetic overflow checking. +* x86 specific memory model extensions for transactional memory:: x86 memory models. +* Object Size Checking:: Built-in functions for limited buffer overflow + checking. +* Other Builtins:: Other built-in functions. +* Target Builtins:: Built-in functions specific to particular targets. +* Target Format Checks:: Format checks specific to particular targets. +* Pragmas:: Pragmas accepted by GCC. +* Unnamed Fields:: Unnamed struct/union fields within structs/unions. +* Thread-Local:: Per-thread variables. +* Binary constants:: Binary constants using the @samp{0b} prefix. 
+@end menu + +@node Statement Exprs +@section Statements and Declarations in Expressions +@cindex statements inside expressions +@cindex declarations inside expressions +@cindex expressions containing statements +@cindex macros, statements in expressions + +@c the above section title wrapped and causes an underfull hbox.. i +@c changed it from "within" to "in". --mew 4feb93 +A compound statement enclosed in parentheses may appear as an expression +in GNU C@. This allows you to use loops, switches, and local variables +within an expression. + +Recall that a compound statement is a sequence of statements surrounded +by braces; in this construct, parentheses go around the braces. For +example: + +@smallexample +(@{ int y = foo (); int z; + if (y > 0) z = y; + else z = - y; + z; @}) +@end smallexample + +@noindent +is a valid (though slightly more complex than necessary) expression +for the absolute value of @code{foo ()}. + +The last thing in the compound statement should be an expression +followed by a semicolon; the value of this subexpression serves as the +value of the entire construct. (If you use some other kind of statement +last within the braces, the construct has type @code{void}, and thus +effectively no value.) + +This feature is especially useful in making macro definitions ``safe'' (so +that they evaluate each operand exactly once). For example, the +``maximum'' function is commonly defined as a macro in standard C as +follows: + +@smallexample +#define max(a,b) ((a) > (b) ? (a) : (b)) +@end smallexample + +@noindent +@cindex side effects, macro argument +But this definition computes either @var{a} or @var{b} twice, with bad +results if the operand has side effects. In GNU C, if you know the +type of the operands (here taken as @code{int}), you can avoid this +problem by defining the macro as follows: + +@smallexample +#define maxint(a,b) \ + (@{int _a = (a), _b = (b); _a > _b ? 
_a : _b; @}) +@end smallexample + +Note that introducing variable declarations (as we do in @code{maxint}) can +cause variable shadowing, so while this example using the @code{max} macro +produces correct results: +@smallexample +int _a = 1, _b = 2, c; +c = max (_a, _b); +@end smallexample +@noindent +this example using maxint will not: +@smallexample +int _a = 1, _b = 2, c; +c = maxint (_a, _b); +@end smallexample + +This problem may for instance occur when we use this pattern recursively, like +so: + +@smallexample +#define maxint3(a, b, c) \ + (@{int _a = (a), _b = (b), _c = (c); maxint (maxint (_a, _b), _c); @}) +@end smallexample + +Embedded statements are not allowed in constant expressions, such as +the value of an enumeration constant, the width of a bit-field, or +the initial value of a static variable. + +If you don't know the type of the operand, you can still do this, but you +must use @code{typeof} or @code{__auto_type} (@pxref{Typeof}). + +In G++, the result value of a statement expression undergoes array and +function pointer decay, and is returned by value to the enclosing +expression. For instance, if @code{A} is a class, then + +@smallexample + A a; + + (@{a;@}).Foo () +@end smallexample + +@noindent +constructs a temporary @code{A} object to hold the result of the +statement expression, and that is used to invoke @code{Foo}. +Therefore the @code{this} pointer observed by @code{Foo} is not the +address of @code{a}. + +In a statement expression, any temporaries created within a statement +are destroyed at that statement's end. This makes statement +expressions inside macros slightly different from function calls. In +the latter case temporaries introduced during argument evaluation are +destroyed at the end of the statement that includes the function +call. In the statement expression case they are destroyed during +the statement expression. 
For instance, + +@smallexample +#define macro(a) (@{__typeof__(a) b = (a); b + 3; @}) +template<typename T> T function(T a) @{ T b = a; return b + 3; @} + +void foo () +@{ + macro (X ()); + function (X ()); +@} +@end smallexample + +@noindent +has different places where temporaries are destroyed. For the +@code{macro} case, the temporary @code{X} is destroyed just after +the initialization of @code{b}. In the @code{function} case that +temporary is destroyed when the function returns. + +These considerations mean that it is probably a bad idea to use +statement expressions of this form in header files that are designed to +work with C++. (Note that some versions of the GNU C Library contained +header files using statement expressions that lead to precisely this +bug.) + +Jumping into a statement expression with @code{goto} or using a +@code{switch} statement outside the statement expression with a +@code{case} or @code{default} label inside the statement expression is +not permitted. Jumping into a statement expression with a computed +@code{goto} (@pxref{Labels as Values}) has undefined behavior. +Jumping out of a statement expression is permitted, but if the +statement expression is part of a larger expression then it is +unspecified which other subexpressions of that expression have been +evaluated except where the language definition requires certain +subexpressions to be evaluated before or after the statement +expression. A @code{break} or @code{continue} statement inside of +a statement expression used in @code{while}, @code{do} or @code{for} +loop or @code{switch} statement condition +or @code{for} statement init or increment expressions jumps to an +outer loop or @code{switch} statement if any (otherwise it is an error), +rather than to the loop or @code{switch} statement in whose condition +or init or increment expression it appears.
+In any case, as with a function call, the evaluation of a +statement expression is not interleaved with the evaluation of other +parts of the containing expression. For example, + +@smallexample + foo (), ((@{ bar1 (); goto a; 0; @}) + bar2 ()), baz(); +@end smallexample + +@noindent +calls @code{foo} and @code{bar1} and does not call @code{baz} but +may or may not call @code{bar2}. If @code{bar2} is called, it is +called after @code{foo} and before @code{bar1}. + +@node Local Labels +@section Locally Declared Labels +@cindex local labels +@cindex macros, local labels + +GCC allows you to declare @dfn{local labels} in any nested block +scope. A local label is just like an ordinary label, but you can +only reference it (with a @code{goto} statement, or by taking its +address) within the block in which it is declared. + +A local label declaration looks like this: + +@smallexample +__label__ @var{label}; +@end smallexample + +@noindent +or + +@smallexample +__label__ @var{label1}, @var{label2}, /* @r{@dots{}} */; +@end smallexample + +Local label declarations must come at the beginning of the block, +before any ordinary declarations or statements. + +The label declaration defines the label @emph{name}, but does not define +the label itself. You must do this in the usual way, with +@code{@var{label}:}, within the statements of the statement expression. + +The local label feature is useful for complex macros. If a macro +contains nested loops, a @code{goto} can be useful for breaking out of +them. However, an ordinary label whose scope is the whole function +cannot be used: if the macro can be expanded several times in one +function, the label is multiply defined in that function. A +local label avoids this problem. 
For example: + +@smallexample +#define SEARCH(value, array, target) \ +do @{ \ + __label__ found; \ + typeof (target) _SEARCH_target = (target); \ + typeof (*(array)) *_SEARCH_array = (array); \ + int i, j; \ + int value; \ + for (i = 0; i < max; i++) \ + for (j = 0; j < max; j++) \ + if (_SEARCH_array[i][j] == _SEARCH_target) \ + @{ (value) = i; goto found; @} \ + (value) = -1; \ + found:; \ +@} while (0) +@end smallexample + +This could also be written using a statement expression: + +@smallexample +#define SEARCH(array, target) \ +(@{ \ + __label__ found; \ + typeof (target) _SEARCH_target = (target); \ + typeof (*(array)) *_SEARCH_array = (array); \ + int i, j; \ + int value; \ + for (i = 0; i < max; i++) \ + for (j = 0; j < max; j++) \ + if (_SEARCH_array[i][j] == _SEARCH_target) \ + @{ value = i; goto found; @} \ + value = -1; \ + found: \ + value; \ +@}) +@end smallexample + +Local label declarations also make the labels they declare visible to +nested functions, if there are any. @xref{Nested Functions}, for details. + +@node Labels as Values +@section Labels as Values +@cindex labels as values +@cindex computed gotos +@cindex goto with computed label +@cindex address of a label + +You can get the address of a label defined in the current function +(or a containing function) with the unary operator @samp{&&}. The +value has type @code{void *}. This value is a constant and can be used +wherever a constant of that type is valid. For example: + +@smallexample +void *ptr; +/* @r{@dots{}} */ +ptr = &&foo; +@end smallexample + +To use these values, you need to be able to jump to one. This is done +with the computed goto statement@footnote{The analogous feature in +Fortran is called an assigned goto, but that name seems inappropriate in +C, where one can do more than simply store label addresses in label +variables.}, @code{goto *@var{exp};}. For example, + +@smallexample +goto *ptr; +@end smallexample + +@noindent +Any expression of type @code{void *} is allowed. 
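As a minimal sketch of the computed goto described above, the following function stores one of three label addresses in a @code{void *} variable and then jumps through it. The function name @code{classify} and its labels are assumptions chosen for illustration; they are not part of GCC.

```c
/* Hypothetical example: return -1, 0 or 1 according to the sign of N.  */
int
classify (int n)
{
  void *target;                 /* Label addresses have type void *.  */

  if (n < 0)
    target = &&negative;
  else if (n == 0)
    target = &&zero;
  else
    target = &&positive;

  goto *target;                 /* Computed goto within the same function.  */

negative:
  return -1;
zero:
  return 0;
positive:
  return 1;
}
```

The labels and the @code{goto} all live in one function, which is the only arrangement the text above permits.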
+ +One way of using these constants is in initializing a static array that +serves as a jump table: + +@smallexample +static void *array[] = @{ &&foo, &&bar, &&hack @}; +@end smallexample + +@noindent +Then you can select a label with indexing, like this: + +@smallexample +goto *array[i]; +@end smallexample + +@noindent +Note that this does not check whether the subscript is in bounds---array +indexing in C never does that. + +Such an array of label values serves a purpose much like that of the +@code{switch} statement. The @code{switch} statement is cleaner, so +use that rather than an array unless the problem does not fit a +@code{switch} statement very well. + +Another use of label values is in an interpreter for threaded code. +The labels within the interpreter function can be stored in the +threaded code for super-fast dispatching. + +You may not use this mechanism to jump to code in a different function. +If you do that, totally unpredictable things happen. The best way to +avoid this is to store the label address only in automatic variables and +never pass it as an argument. + +An alternate way to write the above example is + +@smallexample +static const int array[] = @{ &&foo - &&foo, &&bar - &&foo, + &&hack - &&foo @}; +goto *(&&foo + array[i]); +@end smallexample + +@noindent +This is more friendly to code living in shared libraries, as it reduces +the number of dynamic relocations that are needed, and by consequence, +allows the data to be read-only. +This alternative with label differences is not supported for the AVR target, +please use the first approach for AVR programs. + +The @code{&&foo} expressions for the same label might have different +values if the containing function is inlined or cloned. If a program +relies on them being always the same, +@code{__attribute__((__noinline__,__noclone__))} should be used to +prevent inlining and cloning. If @code{&&foo} is used in a static +variable initializer, inlining and cloning is forbidden. 
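The jump-table and threaded-code uses of label values can be sketched with a small interpreter. The three-opcode machine, the function @code{run} and the @code{NEXT} macro are assumptions made for illustration only. As noted above, nothing checks that an opcode is a valid index into the table, so the caller must supply well-formed code.

```c
/* Hypothetical opcodes for a one-register machine.  */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Execute CODE until OP_HALT and return the accumulator.  */
int
run (const unsigned char *code)
{
  /* Jump table indexed by opcode, built from label addresses.  */
  static void *dispatch[] = { &&do_inc, &&do_double, &&do_halt };
  int acc = 0;

  /* Fetch the next opcode and jump straight to its handler.  */
#define NEXT() goto *dispatch[*code++]

  NEXT ();

do_inc:
  acc += 1;
  NEXT ();

do_double:
  acc *= 2;
  NEXT ();

do_halt:
  return acc;

#undef NEXT
}
```

Each handler ends by dispatching directly to the next one, so there is no central loop or @code{switch} between instructions.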
+ +@node Nested Functions +@section Nested Functions +@cindex nested functions +@cindex downward funargs +@cindex thunks + +A @dfn{nested function} is a function defined inside another function. +Nested functions are supported as an extension in GNU C, but are not +supported by GNU C++. + +The nested function's name is local to the block where it is defined. +For example, here we define a nested function named @code{square}, and +call it twice: + +@smallexample +@group +foo (double a, double b) +@{ + double square (double z) @{ return z * z; @} + + return square (a) + square (b); +@} +@end group +@end smallexample + +The nested function can access all the variables of the containing +function that are visible at the point of its definition. This is +called @dfn{lexical scoping}. For example, here we show a nested +function which uses an inherited variable named @code{offset}: + +@smallexample +@group +bar (int *array, int offset, int size) +@{ + int access (int *array, int index) + @{ return array[index + offset]; @} + int i; + /* @r{@dots{}} */ + for (i = 0; i < size; i++) + /* @r{@dots{}} */ access (array, i) /* @r{@dots{}} */ +@} +@end group +@end smallexample + +Nested function definitions are permitted within functions in the places +where variable definitions are allowed; that is, in any block, mixed +with the other declarations and statements in the block. + +It is possible to call the nested function from outside the scope of its +name by storing its address or passing the address to another function: + +@smallexample +hack (int *array, int size) +@{ + void store (int index, int value) + @{ array[index] = value; @} + + intermediate (store, size); +@} +@end smallexample + +Here, the function @code{intermediate} receives the address of +@code{store} as an argument. If @code{intermediate} calls @code{store}, +the arguments given to @code{store} are used to store into @code{array}. 
+But this technique works only so long as the containing function +(@code{hack}, in this example) does not exit. + +If you try to call the nested function through its address after the +containing function exits, all hell breaks loose. If you try +to call it after a containing scope level exits, and if it refers +to some of the variables that are no longer in scope, you may be lucky, +but it's not wise to take the risk. If, however, the nested function +does not refer to anything that has gone out of scope, you should be +safe. + +GCC implements taking the address of a nested function using a technique +called @dfn{trampolines}. This technique was described in +@cite{Lexical Closures for C++} (Thomas M. Breuel, USENIX +C++ Conference Proceedings, October 17-21, 1988). + +A nested function can jump to a label inherited from a containing +function, provided the label is explicitly declared in the containing +function (@pxref{Local Labels}). Such a jump returns instantly to the +containing function, exiting the nested function that did the +@code{goto} and any intermediate functions as well. Here is an example: + +@smallexample +@group +bar (int *array, int offset, int size) +@{ + __label__ failure; + int access (int *array, int index) + @{ + if (index > size) + goto failure; + return array[index + offset]; + @} + int i; + /* @r{@dots{}} */ + for (i = 0; i < size; i++) + /* @r{@dots{}} */ access (array, i) /* @r{@dots{}} */ + /* @r{@dots{}} */ + return 0; + + /* @r{Control comes here from @code{access} + if it detects an error.} */ + failure: + return -1; +@} +@end group +@end smallexample + +A nested function always has no linkage. Declaring one with +@code{extern} or @code{static} is erroneous. If you need to declare the nested function +before its definition, use @code{auto} (which is otherwise meaningless +for function declarations). 
+ +@smallexample +bar (int *array, int offset, int size) +@{ + __label__ failure; + auto int access (int *, int); + /* @r{@dots{}} */ + int access (int *array, int index) + @{ + if (index > size) + goto failure; + return array[index + offset]; + @} + /* @r{@dots{}} */ +@} +@end smallexample + +@node Nonlocal Gotos +@section Nonlocal Gotos +@cindex nonlocal gotos + +GCC provides the built-in functions @code{__builtin_setjmp} and +@code{__builtin_longjmp} which are similar to, but not interchangeable +with, the C library functions @code{setjmp} and @code{longjmp}. +The built-in versions are used internally by GCC's libraries +to implement exception handling on some targets. You should use the +standard C library functions declared in @code{<setjmp.h>} in user code +instead of the builtins. + +The built-in versions of these functions use GCC's normal +mechanisms to save and restore registers using the stack on function +entry and exit. The jump buffer argument @var{buf} holds only the +information needed to restore the stack frame, rather than the entire +set of saved register values. + +An important caveat is that GCC arranges to save and restore only +those registers known to the specific architecture variant being +compiled for. This can make @code{__builtin_setjmp} and +@code{__builtin_longjmp} more efficient than their library +counterparts in some cases, but it can also cause incorrect and +mysterious behavior when mixing with code that uses the full register +set. + +You should declare the jump buffer argument @var{buf} to the +built-in functions as: + +@smallexample +#include <stdint.h> +intptr_t @var{buf}[5]; +@end smallexample + +@deftypefn {Built-in Function} {int} __builtin_setjmp (intptr_t *@var{buf}) +This function saves the current stack context in @var{buf}. +@code{__builtin_setjmp} returns 0 when returning directly, +and 1 when returning from @code{__builtin_longjmp} using the same +@var{buf}.
+@end deftypefn + +@deftypefn {Built-in Function} {void} __builtin_longjmp (intptr_t *@var{buf}, int @var{val}) +This function restores the stack context in @var{buf}, +saved by a previous call to @code{__builtin_setjmp}. After +@code{__builtin_longjmp} is finished, the program resumes execution as +if the matching @code{__builtin_setjmp} returns the value @var{val}, +which must be 1. + +Because @code{__builtin_longjmp} depends on the function return +mechanism to restore the stack context, it cannot be called +from the same function calling @code{__builtin_setjmp} to +initialize @var{buf}. It can only be called from a function called +(directly or indirectly) from the function calling @code{__builtin_setjmp}. +@end deftypefn + +@node Constructing Calls +@section Constructing Function Calls +@cindex constructing calls +@cindex forwarding calls + +Using the built-in functions described below, you can record +the arguments a function received, and call another function +with the same arguments, without knowing the number or types +of the arguments. + +You can also record the return value of that function call, +and later return that value, without knowing what data type +the function tried to return (as long as your caller expects +that data type). + +However, these built-in functions may interact badly with some +sophisticated features or other extensions of the language. It +is, therefore, not recommended to use them outside very simple +functions acting as mere forwarders for their arguments. + +@deftypefn {Built-in Function} {void *} __builtin_apply_args () +This built-in function returns a pointer to data +describing how to perform a call with the same arguments as are passed +to the current function. + +The function saves the arg pointer register, structure value address, +and all registers that might be used to pass arguments to a function +into a block of memory allocated on the stack. Then it returns the +address of that block. 
+@end deftypefn + +@deftypefn {Built-in Function} {void *} __builtin_apply (void (*@var{function})(), void *@var{arguments}, size_t @var{size}) +This built-in function invokes @var{function} +with a copy of the parameters described by @var{arguments} +and @var{size}. + +The value of @var{arguments} should be the value returned by +@code{__builtin_apply_args}. The argument @var{size} specifies the size +of the stack argument data, in bytes. + +This function returns a pointer to data describing +how to return whatever value is returned by @var{function}. The data +is saved in a block of memory allocated on the stack. + +It is not always simple to compute the proper value for @var{size}. The +value is used by @code{__builtin_apply} to compute the amount of data +that should be pushed on the stack and copied from the incoming argument +area. +@end deftypefn + +@deftypefn {Built-in Function} {void} __builtin_return (void *@var{result}) +This built-in function returns the value described by @var{result} from +the containing function. You should specify, for @var{result}, a value +returned by @code{__builtin_apply}. +@end deftypefn + +@deftypefn {Built-in Function} {} __builtin_va_arg_pack () +This built-in function represents all anonymous arguments of an inline +function. It can be used only in inline functions that are always +inlined, never compiled as a separate function, such as those using +@code{__attribute__ ((__always_inline__))} or +@code{__attribute__ ((__gnu_inline__))} extern inline functions. +It must be only passed as last argument to some other function +with variable arguments. This is useful for writing small wrapper +inlines for variable argument functions, when using preprocessor +macros is undesirable. For example: +@smallexample +extern int myprintf (FILE *f, const char *format, ...); +extern inline __attribute__ ((__gnu_inline__)) int +myprintf (FILE *f, const char *format, ...) 
+@{ + int r = fprintf (f, "myprintf: "); + if (r < 0) + return r; + int s = fprintf (f, format, __builtin_va_arg_pack ()); + if (s < 0) + return s; + return r + s; +@} +@end smallexample +@end deftypefn + +@deftypefn {Built-in Function} {size_t} __builtin_va_arg_pack_len () +This built-in function returns the number of anonymous arguments of +an inline function. It can be used only in inline functions that +are always inlined, never compiled as a separate function, such +as those using @code{__attribute__ ((__always_inline__))} or +@code{__attribute__ ((__gnu_inline__))} extern inline functions. +For example following does link- or run-time checking of open +arguments for optimized code: +@smallexample +#ifdef __OPTIMIZE__ +extern inline __attribute__((__gnu_inline__)) int +myopen (const char *path, int oflag, ...) +@{ + if (__builtin_va_arg_pack_len () > 1) + warn_open_too_many_arguments (); + + if (__builtin_constant_p (oflag)) + @{ + if ((oflag & O_CREAT) != 0 && __builtin_va_arg_pack_len () < 1) + @{ + warn_open_missing_mode (); + return __open_2 (path, oflag); + @} + return open (path, oflag, __builtin_va_arg_pack ()); + @} + + if (__builtin_va_arg_pack_len () < 1) + return __open_2 (path, oflag); + + return open (path, oflag, __builtin_va_arg_pack ()); +@} +#endif +@end smallexample +@end deftypefn + +@node Typeof +@section Referring to a Type with @code{typeof} +@findex typeof +@findex sizeof +@cindex macros, types of arguments + +Another way to refer to the type of an expression is with @code{typeof}. +The syntax of using of this keyword looks like @code{sizeof}, but the +construct acts semantically like a type name defined with @code{typedef}. + +There are two ways of writing the argument to @code{typeof}: with an +expression or with a type. 
Here is an example with an expression: + +@smallexample +typeof (x[0](1)) +@end smallexample + +@noindent +This assumes that @code{x} is an array of pointers to functions; +the type described is that of the values of the functions. + +Here is an example with a typename as the argument: + +@smallexample +typeof (int *) +@end smallexample + +@noindent +Here the type described is that of pointers to @code{int}. + +If you are writing a header file that must work when included in ISO C +programs, write @code{__typeof__} instead of @code{typeof}. +@xref{Alternate Keywords}. + +A @code{typeof} construct can be used anywhere a typedef name can be +used. For example, you can use it in a declaration, in a cast, or inside +of @code{sizeof} or @code{typeof}. + +The operand of @code{typeof} is evaluated for its side effects if and +only if it is an expression of variably modified type or the name of +such a type. + +@code{typeof} is often useful in conjunction with +statement expressions (@pxref{Statement Exprs}). +Here is how the two together can +be used to define a safe ``maximum'' macro which operates on any +arithmetic type and evaluates each of its arguments exactly once: + +@smallexample +#define max(a,b) \ + (@{ typeof (a) _a = (a); \ + typeof (b) _b = (b); \ + _a > _b ? _a : _b; @}) +@end smallexample + +@cindex underscores in variables in macros +@cindex @samp{_} in variables in macros +@cindex local variables in macros +@cindex variables, local, in macros +@cindex macros, local variables in + +The reason for using names that start with underscores for the local +variables is to avoid conflicts with variable names that occur within the +expressions that are substituted for @code{a} and @code{b}. Eventually we +hope to design a new form of declaration syntax that allows you to declare +variables whose scopes start only after their initializers; this will be a +more reliable way to prevent such conflicts. 
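
As a rough illustration of the ``evaluates each of its arguments exactly once'' property, the following sketch compares the statement-expression macro with a conventional conditional macro. The names @code{safe_max}, @code{naive_max} and the two helper functions are hypothetical and chosen only for this example; @code{__typeof__} is used instead of @code{typeof} so that the code also compiles in ISO C modes.

```c
/* Sketch only: safe_max, naive_max and both helpers are illustrative
   names, not part of GCC.  */
#define safe_max(a,b) \
  ({ __typeof__ (a) _a = (a); \
     __typeof__ (b) _b = (b); \
     _a > _b ? _a : _b; })

#define naive_max(a,b) ((a) > (b) ? (a) : (b))

/* Returns the value of i after one use of safe_max (i++, j).  */
int
increments_with_safe_max (void)
{
  int i = 5, j = 3;
  (void) safe_max (i++, j);
  return i;   /* i++ was evaluated once, so i is 6.  */
}

/* Returns the value of i after one use of naive_max (i++, j).  */
int
increments_with_naive_max (void)
{
  int i = 5, j = 3;
  (void) naive_max (i++, j);
  return i;   /* i++ was evaluated twice, so i is 7.  */
}
```

The conventional macro expands its first argument both in the comparison and in the selected branch, which is why the side effect happens twice there.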
+ +@noindent +Some more examples of the use of @code{typeof}: + +@itemize @bullet +@item +This declares @code{y} with the type of what @code{x} points to. + +@smallexample +typeof (*x) y; +@end smallexample + +@item +This declares @code{y} as an array of such values. + +@smallexample +typeof (*x) y[4]; +@end smallexample + +@item +This declares @code{y} as an array of pointers to characters: + +@smallexample +typeof (typeof (char *)[4]) y; +@end smallexample + +@noindent +It is equivalent to the following traditional C declaration: + +@smallexample +char *y[4]; +@end smallexample + +To see the meaning of the declaration using @code{typeof}, and why it +might be a useful way to write, rewrite it with these macros: + +@smallexample +#define pointer(T) typeof(T *) +#define array(T, N) typeof(T [N]) +@end smallexample + +@noindent +Now the declaration can be rewritten this way: + +@smallexample +array (pointer (char), 4) y; +@end smallexample + +@noindent +Thus, @code{array (pointer (char), 4)} is the type of arrays of 4 +pointers to @code{char}. +@end itemize + +In GNU C, but not GNU C++, you may also declare the type of a variable +as @code{__auto_type}. In that case, the declaration must declare +only one variable, whose declarator must just be an identifier, the +declaration must be initialized, and the type of the variable is +determined by the initializer; the name of the variable is not in +scope until after the initializer. (In C++, you should use C++11 +@code{auto} for this purpose.) Using @code{__auto_type}, the +``maximum'' macro above could be written as: + +@smallexample +#define max(a,b) \ + (@{ __auto_type _a = (a); \ + __auto_type _b = (b); \ + _a > _b ? _a : _b; @}) +@end smallexample + +Using @code{__auto_type} instead of @code{typeof} has two advantages: + +@itemize @bullet +@item Each argument to the macro appears only once in the expansion of +the macro. 
This prevents the size of the macro expansion growing +exponentially when calls to such macros are nested inside arguments of +such macros. + +@item If the argument to the macro has variably modified type, it is +evaluated only once when using @code{__auto_type}, but twice if +@code{typeof} is used. +@end itemize + +@node Conditionals +@section Conditionals with Omitted Operands +@cindex conditional expressions, extensions +@cindex omitted middle-operands +@cindex middle-operands, omitted +@cindex extensions, @code{?:} +@cindex @code{?:} extensions + +The middle operand in a conditional expression may be omitted. Then +if the first operand is nonzero, its value is the value of the conditional +expression. + +Therefore, the expression + +@smallexample +x ? : y +@end smallexample + +@noindent +has the value of @code{x} if that is nonzero; otherwise, the value of +@code{y}. + +This example is perfectly equivalent to + +@smallexample +x ? x : y +@end smallexample + +@cindex side effect in @code{?:} +@cindex @code{?:} side effect +@noindent +In this simple case, the ability to omit the middle operand is not +especially useful. When it becomes useful is when the first operand does, +or may (if it is a macro argument), contain a side effect. Then repeating +the operand in the middle would perform the side effect twice. Omitting +the middle operand uses the value already computed without the undesirable +effects of recomputing it. + +@node __int128 +@section 128-bit Integers +@cindex @code{__int128} data types + +As an extension the integer scalar type @code{__int128} is supported for +targets which have an integer mode wide enough to hold 128 bits. +Simply write @code{__int128} for a signed 128-bit integer, or +@code{unsigned __int128} for an unsigned 128-bit integer. There is no +support in GCC for expressing an integer constant of type @code{__int128} +for targets with @code{long long} integer less than 128 bits wide. 
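
Because such constants cannot be written directly, a wide value is typically assembled from narrower pieces. The following is a minimal sketch, assuming a target on which @code{__int128} is available (for example x86_64); the names @code{u128}, @code{make_u128} and @code{mul_hi64} are hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Sketch only: u128, make_u128 and mul_hi64 are illustrative names.
   Assumes a target that provides __int128.  */
typedef unsigned __int128 u128;

/* Build a 128-bit value from two 64-bit halves, since no 128-bit
   integer constant can be written.  */
u128
make_u128 (uint64_t hi, uint64_t lo)
{
  return ((u128) hi << 64) | lo;
}

/* Return the upper 64 bits of the full 128-bit product a * b.  */
uint64_t
mul_hi64 (uint64_t a, uint64_t b)
{
  return (uint64_t) (((u128) a * b) >> 64);
}
```

Widening one operand to the 128-bit type before multiplying is what keeps the full product; multiplying two 64-bit operands first would truncate the result.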
+ +@node Long Long +@section Double-Word Integers +@cindex @code{long long} data types +@cindex double-word arithmetic +@cindex multiprecision arithmetic +@cindex @code{LL} integer suffix +@cindex @code{ULL} integer suffix + +ISO C99 and ISO C++11 support data types for integers that are at least +64 bits wide, and as an extension GCC supports them in C90 and C++98 modes. +Simply write @code{long long int} for a signed integer, or +@code{unsigned long long int} for an unsigned integer. To make an +integer constant of type @code{long long int}, add the suffix @samp{LL} +to the integer. To make an integer constant of type @code{unsigned long +long int}, add the suffix @samp{ULL} to the integer. + +You can use these types in arithmetic like any other integer types. +Addition, subtraction, and bitwise boolean operations on these types +are open-coded on all types of machines. Multiplication is open-coded +if the machine supports a fullword-to-doubleword widening multiply +instruction. Division and shifts are open-coded only on machines that +provide special support. The operations that are not open-coded use +special library routines that come with GCC@. + +There may be pitfalls when you use @code{long long} types for function +arguments without function prototypes. If a function +expects type @code{int} for its argument, and you pass a value of type +@code{long long int}, confusion results because the caller and the +subroutine disagree about the number of bytes for the argument. +Likewise, if the function expects @code{long long int} and you pass +@code{int}. The best way to avoid such problems is to use prototypes. + +@node Complex +@section Complex Numbers +@cindex complex numbers +@cindex @code{_Complex} keyword +@cindex @code{__complex__} keyword + +ISO C99 supports complex floating data types, and as an extension GCC +supports them in C90 mode and in C++. GCC also supports complex integer data +types which are not part of ISO C99. 
You can declare complex types +using the keyword @code{_Complex}. As an extension, the older GNU +keyword @code{__complex__} is also supported. + +For example, @samp{_Complex double x;} declares @code{x} as a +variable whose real part and imaginary part are both of type +@code{double}. @samp{_Complex short int y;} declares @code{y} to +have real and imaginary parts of type @code{short int}; this is not +likely to be useful, but it shows that the set of complex types is +complete. + +To write a constant with a complex data type, use the suffix @samp{i} or +@samp{j} (either one; they are equivalent). For example, @code{2.5fi} +has type @code{_Complex float} and @code{3i} has type +@code{_Complex int}. Such a constant always has a pure imaginary +value, but you can form any complex value you like by adding one to a +real constant. This is a GNU extension; if you have an ISO C99 +conforming C library (such as the GNU C Library), and want to construct complex +constants of floating type, you should include @code{<complex.h>} and +use the macros @code{I} or @code{_Complex_I} instead. + +The ISO C++14 library also defines the @samp{i} suffix, so C++14 code +that includes the @samp{<complex>} header cannot use @samp{i} for the +GNU extension. The @samp{j} suffix still has the GNU meaning. + +GCC can handle both implicit and explicit casts between the @code{_Complex} +types and other @code{_Complex} types as casting both the real and imaginary +parts to the scalar type. +GCC can handle implicit and explicit casts from a scalar type to a @code{_Complex} +type, where the imaginary part will be considered zero. +The C front-end can handle implicit and explicit casts from a @code{_Complex} type +to a scalar type where the imaginary part will be ignored. In C++ code, this cast +is considered ill-formed and G++ will issue an error. + +GCC provides a built-in function @code{__builtin_complex} that can be used to +construct a complex value.
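
The constant suffixes and conversions described above can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and the code relies on the GNU @samp{i} suffix, on complex integer types, and on the GNU @code{__real__} and @code{__imag__} operators, so it is not strictly conforming ISO C.

```c
/* Sketch only: all function names here are illustrative.  */

/* 4.0i is a pure imaginary constant; adding 3.0 gives 3 + 4i.  */
double
real_of_3_plus_4i (void)
{
  _Complex double z = 3.0 + 4.0i;
  return __real__ z;
}

/* 2i has type _Complex int, a type that is itself a GNU extension.  */
int
imag_of_1_plus_2i (void)
{
  _Complex int w = 1 + 2i;
  return __imag__ w;
}

/* In C, converting a complex value to a scalar type discards the
   imaginary part.  */
double
complex_to_scalar (void)
{
  _Complex double z = 3.0 + 4.0i;
  double d = z;
  return d;
}

/* __builtin_complex builds the value directly from its two parts.  */
double
imag_from_builtin (void)
{
  _Complex double z = __builtin_complex (1.0, 2.0);
  return __imag__ z;
}
```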
+ +@cindex @code{__real__} keyword +@cindex @code{__imag__} keyword + +GCC has a few extensions which can be used to extract the real +and the imaginary part of a complex-valued expression. Note that +these expressions are lvalues if the @var{exp} is an lvalue. +The operands of these expressions have complex type; an operand of +scalar type is promoted to the corresponding complex type. +E.g. @code{__real__ (int)@var{x}} is the same as casting to +@code{_Complex int} before @code{__real__} is done. + +@multitable @columnfractions .4 .6 +@headitem Expression @tab Description +@item @code{__real__ @var{exp}} +@tab Extract the real part of @var{exp}. +@item @code{__imag__ @var{exp}} +@tab Extract the imaginary part of @var{exp}. +@end multitable + +For values of floating point, you should use the ISO C99 +functions, declared in @code{<complex.h>} and also provided as +built-in functions by GCC@. + +@multitable @columnfractions .4 .2 .2 .2 +@headitem Expression @tab float @tab double @tab long double +@item @code{__real__ @var{exp}} +@tab @code{crealf} @tab @code{creal} @tab @code{creall} +@item @code{__imag__ @var{exp}} +@tab @code{cimagf} @tab @code{cimag} @tab @code{cimagl} +@end multitable + +@cindex complex conjugation +The operator @samp{~} performs complex conjugation when used on a value +with a complex type. This is a GNU extension; for values of +floating type, you should use the ISO C99 functions @code{conjf}, +@code{conj} and @code{conjl}, declared in @code{<complex.h>} and also +provided as built-in functions by GCC@. Note that, unlike the @code{__real__} +and @code{__imag__} operators, this operator will not do an implicit cast +to the complex type because the @samp{~} is already a normal operator. + +GCC can allocate complex automatic variables in a noncontiguous +fashion; it's even possible for the real part to be in a register while +the imaginary part is on the stack (or vice versa). Only the DWARF +debug info format can represent this, so use of DWARF is recommended.
+If you are using the stabs debug info format, GCC describes a noncontiguous +complex variable as if it were two separate variables of noncomplex type. +If the variable's actual name is @code{foo}, the two fictitious +variables are named @code{foo$real} and @code{foo$imag}. You can +examine and set these two fictitious variables with your debugger. + +@deftypefn {Built-in Function} @var{type} __builtin_complex (@var{real}, @var{imag}) + +The built-in function @code{__builtin_complex} is provided for use in +implementing the ISO C11 macros @code{CMPLXF}, @code{CMPLX} and +@code{CMPLXL}. @var{real} and @var{imag} must have the same type, a +real binary floating-point type, and the result has the corresponding +complex type with real and imaginary parts @var{real} and @var{imag}. +Unlike @samp{@var{real} + I * @var{imag}}, this works even when +infinities, NaNs and negative zeros are involved. + +@end deftypefn + +@node Floating Types +@section Additional Floating Types +@cindex additional floating types +@cindex @code{_Float@var{n}} data types +@cindex @code{_Float@var{n}x} data types +@cindex @code{__float80} data type +@cindex @code{__float128} data type +@cindex @code{__ibm128} data type +@cindex @code{w} floating point suffix +@cindex @code{q} floating point suffix +@cindex @code{W} floating point suffix +@cindex @code{Q} floating point suffix + +ISO/IEC TS 18661-3:2015 defines C support for additional floating +types @code{_Float@var{n}} and @code{_Float@var{n}x}, and GCC supports +these type names; the set of types supported depends on the target +architecture. These types are not supported when compiling C++. +Constants with these types use suffixes @code{f@var{n}} or +@code{F@var{n}} and @code{f@var{n}x} or @code{F@var{n}x}. These type +names can be used together with @code{_Complex} to declare complex +types. + +As an extension, GNU C and GNU C++ support additional floating +types, which are not supported by all targets. 
+@itemize @bullet +@item @code{__float128} is available on i386, x86_64, IA-64, and +hppa HP-UX, as well as on PowerPC GNU/Linux targets that enable +the vector scalar (VSX) instruction set. @code{__float128} supports +the 128-bit floating type. On i386, x86_64, PowerPC, and IA-64 +other than HP-UX, @code{__float128} is an alias for @code{_Float128}. +On hppa and IA-64 HP-UX, @code{__float128} is an alias for @code{long +double}. + +@item @code{__float80} is available on the i386, x86_64, and IA-64 +targets, and supports the 80-bit (@code{XFmode}) floating type. It is +an alias for the type name @code{_Float64x} on these targets. + +@item @code{__ibm128} is available on PowerPC targets, and provides +access to the IBM extended double format which is the current format +used for @code{long double}. When @code{long double} transitions to +@code{__float128} on PowerPC in the future, @code{__ibm128} will remain +for use in conversions between the two types. +@end itemize + +Support for these additional types includes the arithmetic operators: +add, subtract, multiply, divide; unary arithmetic operators; +relational operators; equality operators; and conversions to and from +integer and other floating types. Use a suffix @samp{w} or @samp{W} +in a literal constant of type @code{__float80} or type +@code{__ibm128}. Use a suffix @samp{q} or @samp{Q} for @code{__float128}. + +In order to use @code{_Float128}, @code{__float128}, and @code{__ibm128} +on PowerPC Linux systems, you must use the @option{-mfloat128} option. It is +expected in future versions of GCC that @code{_Float128} and @code{__float128} +will be enabled automatically. + +The @code{_Float128} type is supported on all systems where +@code{__float128} is supported or where @code{long double} has the +IEEE binary128 format. The @code{_Float64x} type is supported on all +systems where @code{__float128} is supported. 
The @code{_Float32} +type is supported on all systems supporting IEEE binary32; the +@code{_Float64} and @code{_Float32x} types are supported on all systems +supporting IEEE binary64. The @code{_Float16} type is supported on AArch64 +systems by default, on ARM systems when the IEEE format for 16-bit +floating-point types is selected with @option{-mfp16-format=ieee} and, +for both C and C++, on x86 systems with SSE2 enabled. GCC does not currently +support @code{_Float128x} on any system. + +On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex +types using the corresponding internal complex type, @code{XCmode} for +@code{__float80} type and @code{TCmode} for @code{__float128} type: + +@smallexample +typedef _Complex float __attribute__((mode(TC))) _Complex128; +typedef _Complex float __attribute__((mode(XC))) _Complex80; +@end smallexample + +On the PowerPC Linux VSX targets, you can declare complex types using +the corresponding internal complex type, @code{KCmode} for +@code{__float128} type and @code{ICmode} for @code{__ibm128} type: + +@smallexample +typedef _Complex float __attribute__((mode(KC))) _Complex_float128; +typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128; +@end smallexample + +@node Half-Precision +@section Half-Precision Floating Point +@cindex half-precision floating point +@cindex @code{__fp16} data type +@cindex @code{_Float16} data type + +On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating +point via the @code{__fp16} type defined in the ARM C Language Extensions. +On ARM systems, you must enable this type explicitly with the +@option{-mfp16-format} command-line option in order to use it. +On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit) +floating point via the @code{_Float16} type. For C++, x86 provides a builtin +type named @code{_Float16} which has the same data format as in C.
+ +ARM targets support two incompatible representations for half-precision +floating-point values. You must choose one of the representations and +use it consistently in your program. + +Specifying @option{-mfp16-format=ieee} selects the IEEE 754-2008 format. +This format can represent normalized values in the range of @math{2^{-14}} to 65504. +There are 11 bits of significand precision, approximately 3 +decimal digits. + +Specifying @option{-mfp16-format=alternative} selects the ARM +alternative format. This representation is similar to the IEEE +format, but does not support infinities or NaNs. Instead, the range +of exponents is extended, so that this format can represent normalized +values in the range of @math{2^{-14}} to 131008. + +The GCC port for AArch64 only supports the IEEE 754-2008 format, and does +not require use of the @option{-mfp16-format} command-line option. + +The @code{__fp16} type may only be used as an argument to intrinsics defined +in @code{<arm_fp16.h>}, or as a storage format. For purposes of +arithmetic and other operations, @code{__fp16} values in C or C++ +expressions are automatically promoted to @code{float}. + +The ARM target provides hardware support for conversions between +@code{__fp16} and @code{float} values +as an extension to VFP and NEON (Advanced SIMD), and from ARMv8-A provides +hardware support for conversions between @code{__fp16} and @code{double} +values. GCC generates code using these hardware instructions if you +compile with options to select an FPU that provides them; +for example, @option{-mfpu=neon-fp16 -mfloat-abi=softfp}, +in addition to the @option{-mfp16-format} option to select +a half-precision format. + +Language-level support for the @code{__fp16} data type is +independent of whether GCC generates code using hardware floating-point +instructions. In cases where hardware support is not specified, GCC +implements conversions between @code{__fp16} and other types as library +calls.
+ +It is recommended that portable code use the @code{_Float16} type defined +by ISO/IEC TS 18661-3:2015. @xref{Floating Types}. + +On x86 targets with SSE2 enabled, without @option{-mavx512fp16}, +all operations will be emulated by software emulation and the @code{float} +instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep the +intermediate result of the operation as 32-bit precision. This may lead to +inconsistent behavior between software emulation and AVX512-FP16 instructions. +Using @option{-fexcess-precision=16} forces rounding back to 16-bit precision +after each operation. + +Using @option{-mavx512fp16} will generate AVX512-FP16 instructions instead of +software emulation. The default behavior of @code{FLT_EVAL_METHOD} is to round +after each operation. The same is true with @option{-fexcess-precision=standard} +and @option{-mfpmath=sse}. If there is no @option{-mfpmath=sse}, +@option{-fexcess-precision=standard} alone does the same thing as before. +It is useful for code that does not have @code{_Float16} and runs on the x87 +FPU. + +@node Decimal Float +@section Decimal Floating Types +@cindex decimal floating types +@cindex @code{_Decimal32} data type +@cindex @code{_Decimal64} data type +@cindex @code{_Decimal128} data type +@cindex @code{df} integer suffix +@cindex @code{dd} integer suffix +@cindex @code{dl} integer suffix +@cindex @code{DF} integer suffix +@cindex @code{DD} integer suffix +@cindex @code{DL} integer suffix + +As an extension, GNU C supports decimal floating types as +defined in the N1312 draft of ISO/IEC WDTR24732. Support for decimal +floating types in GCC will evolve as the draft technical report changes. +Calling conventions for any target might also change. Not all targets +support decimal floating types. + +The decimal floating types are @code{_Decimal32}, @code{_Decimal64}, and +@code{_Decimal128}.
They use a radix of ten, unlike the floating types +@code{float}, @code{double}, and @code{long double} whose radix is not +specified by the C standard but is usually two. + +Support for decimal floating types includes the arithmetic operators +add, subtract, multiply, divide; unary arithmetic operators; +relational operators; equality operators; and conversions to and from +integer and other floating types. Use a suffix @samp{df} or +@samp{DF} in a literal constant of type @code{_Decimal32}, @samp{dd} +or @samp{DD} for @code{_Decimal64}, and @samp{dl} or @samp{DL} for +@code{_Decimal128}. + +GCC support of decimal float as specified by the draft technical report +is incomplete: + +@itemize @bullet +@item +When the value of a decimal floating type cannot be represented in the +integer type to which it is being converted, the result is undefined +rather than the result value specified by the draft technical report. + +@item +GCC does not provide the C library functionality associated with +@file{math.h}, @file{fenv.h}, @file{stdio.h}, @file{stdlib.h}, and +@file{wchar.h}, which must come from a separate C library implementation. +Because of this the GNU C compiler does not define macro +@code{__STDC_DEC_FP__} to indicate that the implementation conforms to +the technical report. +@end itemize + +Types @code{_Decimal32}, @code{_Decimal64}, and @code{_Decimal128} +are supported by the DWARF debug information format. + +@node Hex Floats +@section Hex Floats +@cindex hex floats + +ISO C99 and ISO C++17 support floating-point numbers written not only in +the usual decimal notation, such as @code{1.55e1}, but also numbers such as +@code{0x1.fp3} written in hexadecimal format. As a GNU extension, GCC +supports this in C90 mode (except in some cases when strictly +conforming) and in C++98, C++11 and C++14 modes. In that format the +@samp{0x} hex introducer and the @samp{p} or @samp{P} exponent field are +mandatory. 
The exponent is a decimal number that indicates the power of +2 by which the significant part is multiplied. Thus @samp{0x1.f} is +@tex +$1 {15\over16}$, +@end tex +@ifnottex +1 15/16, +@end ifnottex +@samp{p3} multiplies it by 8, and the value of @code{0x1.fp3} +is the same as @code{1.55e1}. + +Unlike for floating-point numbers in the decimal notation the exponent +is always required in the hexadecimal notation. Otherwise the compiler +would not be able to resolve the ambiguity of, e.g., @code{0x1.f}. This +could mean @code{1.0f} or @code{1.9375} since @samp{f} is also the +extension for floating-point constants of type @code{float}. + +@node Fixed-Point +@section Fixed-Point Types +@cindex fixed-point types +@cindex @code{_Fract} data type +@cindex @code{_Accum} data type +@cindex @code{_Sat} data type +@cindex @code{hr} fixed-suffix +@cindex @code{r} fixed-suffix +@cindex @code{lr} fixed-suffix +@cindex @code{llr} fixed-suffix +@cindex @code{uhr} fixed-suffix +@cindex @code{ur} fixed-suffix +@cindex @code{ulr} fixed-suffix +@cindex @code{ullr} fixed-suffix +@cindex @code{hk} fixed-suffix +@cindex @code{k} fixed-suffix +@cindex @code{lk} fixed-suffix +@cindex @code{llk} fixed-suffix +@cindex @code{uhk} fixed-suffix +@cindex @code{uk} fixed-suffix +@cindex @code{ulk} fixed-suffix +@cindex @code{ullk} fixed-suffix +@cindex @code{HR} fixed-suffix +@cindex @code{R} fixed-suffix +@cindex @code{LR} fixed-suffix +@cindex @code{LLR} fixed-suffix +@cindex @code{UHR} fixed-suffix +@cindex @code{UR} fixed-suffix +@cindex @code{ULR} fixed-suffix +@cindex @code{ULLR} fixed-suffix +@cindex @code{HK} fixed-suffix +@cindex @code{K} fixed-suffix +@cindex @code{LK} fixed-suffix +@cindex @code{LLK} fixed-suffix +@cindex @code{UHK} fixed-suffix +@cindex @code{UK} fixed-suffix +@cindex @code{ULK} fixed-suffix +@cindex @code{ULLK} fixed-suffix + +As an extension, GNU C supports fixed-point types as +defined in the N1169 draft of ISO/IEC DTR 18037. 
Support for fixed-point +types in GCC will evolve as the draft technical report changes. +Calling conventions for any target might also change. Not all targets +support fixed-point types. + +The fixed-point types are +@code{short _Fract}, +@code{_Fract}, +@code{long _Fract}, +@code{long long _Fract}, +@code{unsigned short _Fract}, +@code{unsigned _Fract}, +@code{unsigned long _Fract}, +@code{unsigned long long _Fract}, +@code{_Sat short _Fract}, +@code{_Sat _Fract}, +@code{_Sat long _Fract}, +@code{_Sat long long _Fract}, +@code{_Sat unsigned short _Fract}, +@code{_Sat unsigned _Fract}, +@code{_Sat unsigned long _Fract}, +@code{_Sat unsigned long long _Fract}, +@code{short _Accum}, +@code{_Accum}, +@code{long _Accum}, +@code{long long _Accum}, +@code{unsigned short _Accum}, +@code{unsigned _Accum}, +@code{unsigned long _Accum}, +@code{unsigned long long _Accum}, +@code{_Sat short _Accum}, +@code{_Sat _Accum}, +@code{_Sat long _Accum}, +@code{_Sat long long _Accum}, +@code{_Sat unsigned short _Accum}, +@code{_Sat unsigned _Accum}, +@code{_Sat unsigned long _Accum}, +@code{_Sat unsigned long long _Accum}. + +Fixed-point data values contain fractional and optional integral parts. +The format of fixed-point data varies and depends on the target machine. 
+ +Support for fixed-point types includes: +@itemize @bullet +@item +prefix and postfix increment and decrement operators (@code{++}, @code{--}) +@item +unary arithmetic operators (@code{+}, @code{-}, @code{!}) +@item +binary arithmetic operators (@code{+}, @code{-}, @code{*}, @code{/}) +@item +binary shift operators (@code{<<}, @code{>>}) +@item +relational operators (@code{<}, @code{<=}, @code{>=}, @code{>}) +@item +equality operators (@code{==}, @code{!=}) +@item +assignment operators (@code{+=}, @code{-=}, @code{*=}, @code{/=}, +@code{<<=}, @code{>>=}) +@item +conversions to and from integer, floating-point, or fixed-point types +@end itemize + +Use a suffix in a fixed-point literal constant: +@itemize +@item @samp{hr} or @samp{HR} for @code{short _Fract} and +@code{_Sat short _Fract} +@item @samp{r} or @samp{R} for @code{_Fract} and @code{_Sat _Fract} +@item @samp{lr} or @samp{LR} for @code{long _Fract} and +@code{_Sat long _Fract} +@item @samp{llr} or @samp{LLR} for @code{long long _Fract} and +@code{_Sat long long _Fract} +@item @samp{uhr} or @samp{UHR} for @code{unsigned short _Fract} and +@code{_Sat unsigned short _Fract} +@item @samp{ur} or @samp{UR} for @code{unsigned _Fract} and +@code{_Sat unsigned _Fract} +@item @samp{ulr} or @samp{ULR} for @code{unsigned long _Fract} and +@code{_Sat unsigned long _Fract} +@item @samp{ullr} or @samp{ULLR} for @code{unsigned long long _Fract} +and @code{_Sat unsigned long long _Fract} +@item @samp{hk} or @samp{HK} for @code{short _Accum} and +@code{_Sat short _Accum} +@item @samp{k} or @samp{K} for @code{_Accum} and @code{_Sat _Accum} +@item @samp{lk} or @samp{LK} for @code{long _Accum} and +@code{_Sat long _Accum} +@item @samp{llk} or @samp{LLK} for @code{long long _Accum} and +@code{_Sat long long _Accum} +@item @samp{uhk} or @samp{UHK} for @code{unsigned short _Accum} and +@code{_Sat unsigned short _Accum} +@item @samp{uk} or @samp{UK} for @code{unsigned _Accum} and +@code{_Sat unsigned _Accum} +@item @samp{ulk} or 
@samp{ULK} for @code{unsigned long _Accum} and +@code{_Sat unsigned long _Accum} +@item @samp{ullk} or @samp{ULLK} for @code{unsigned long long _Accum} +and @code{_Sat unsigned long long _Accum} +@end itemize + +GCC support of fixed-point types as specified by the draft technical report +is incomplete: + +@itemize @bullet +@item +Pragmas to control overflow and rounding behaviors are not implemented. +@end itemize + +Fixed-point types are supported by the DWARF debug information format. + +@node Named Address Spaces +@section Named Address Spaces +@cindex Named Address Spaces + +As an extension, GNU C supports named address spaces as +defined in the N1275 draft of ISO/IEC DTR 18037. Support for named +address spaces in GCC will evolve as the draft technical report +changes. Calling conventions for any target might also change. At +present, only the AVR, M32C, PRU, RL78, and x86 targets support +address spaces other than the generic address space. + +Address space identifiers may be used exactly like any other C type +qualifier (e.g., @code{const} or @code{volatile}). See the N1275 +document for more details. + +@anchor{AVR Named Address Spaces} +@subsection AVR Named Address Spaces + +On the AVR target, there are several address spaces that can be used +in order to put read-only data into the flash memory and access that +data by means of the special instructions @code{LPM} or @code{ELPM} +needed to read from flash. + +Devices belonging to @code{avrtiny} and @code{avrxmega3} can access +flash memory by means of @code{LD*} instructions because the flash +memory is mapped into the RAM address space. There is @emph{no need} +for language extensions like @code{__flash} or attribute +@ref{AVR Variable Attributes,,@code{progmem}}. 
+The default linker description files for these devices cater for that +feature and @code{.rodata} stays in flash: The compiler just generates +@code{LD*} instructions, and the linker script adds core specific +offsets to all @code{.rodata} symbols: @code{0x4000} in the case of +@code{avrtiny} and @code{0x8000} in the case of @code{avrxmega3}. +See @ref{AVR Options} for a list of respective devices. + +For devices not in @code{avrtiny} or @code{avrxmega3}, +any data including read-only data is located in RAM (the generic +address space) because flash memory is not visible in the RAM address +space. In order to locate read-only data in flash memory @emph{and} +to generate the right instructions to access this data without +using (inline) assembler code, special address spaces are needed. + +@table @code +@item __flash +@cindex @code{__flash} AVR Named Address Spaces +The @code{__flash} qualifier locates data in the +@code{.progmem.data} section. Data is read using the @code{LPM} +instruction. Pointers to this address space are 16 bits wide. + +@item __flash1 +@itemx __flash2 +@itemx __flash3 +@itemx __flash4 +@itemx __flash5 +@cindex @code{__flash1} AVR Named Address Spaces +@cindex @code{__flash2} AVR Named Address Spaces +@cindex @code{__flash3} AVR Named Address Spaces +@cindex @code{__flash4} AVR Named Address Spaces +@cindex @code{__flash5} AVR Named Address Spaces +These are 16-bit address spaces locating data in section +@code{.progmem@var{N}.data} where @var{N} refers to +address space @code{__flash@var{N}}. +The compiler sets the @code{RAMPZ} segment register appropriately +before reading data by means of the @code{ELPM} instruction. + +@item __memx +@cindex @code{__memx} AVR Named Address Spaces +This is a 24-bit address space that linearizes flash and RAM: +If the high bit of the address is set, data is read from +RAM using the lower two bytes as RAM address. 
+If the high bit of the address is clear, data is read from flash +with @code{RAMPZ} set according to the high byte of the address. +@xref{AVR Built-in Functions,,@code{__builtin_avr_flash_segment}}. + +Objects in this address space are located in @code{.progmemx.data}. +@end table + +@b{Example} + +@smallexample +char my_read (const __flash char ** p) +@{ + /* p is a pointer to RAM that points to a pointer to flash. + The first indirection of p reads that flash pointer + from RAM and the second indirection reads a char from this + flash address. */ + + return **p; +@} + +/* Locate array[] in flash memory */ +const __flash int array[] = @{ 3, 5, 7, 11, 13, 17, 19 @}; + +int i = 1; + +int main (void) +@{ + /* Return 17 by reading from flash memory */ + return array[array[i]]; +@} +@end smallexample + +@noindent +For each named address space supported by avr-gcc there is an equally +named but uppercase built-in macro defined. +The purpose is to facilitate testing if respective address space +support is available or not: + +@smallexample +#ifdef __FLASH +const __flash int var = 1; + +int read_var (void) +@{ + return var; +@} +#else +#include <avr/pgmspace.h> /* From AVR-LibC */ + +const int var PROGMEM = 1; + +int read_var (void) +@{ + return (int) pgm_read_word (&var); +@} +#endif /* __FLASH */ +@end smallexample + +@noindent +Notice that attribute @ref{AVR Variable Attributes,,@code{progmem}} +locates data in flash but +accesses to these data read from generic address space, i.e.@: +from RAM, +so that you need special accessors like @code{pgm_read_byte} +from @w{@uref{http://nongnu.org/avr-libc/user-manual/,AVR-LibC}} +together with attribute @code{progmem}. + +@noindent +@b{Limitations and caveats} + +@itemize +@item +Reading across the 64@tie{}KiB section boundary of +the @code{__flash} or @code{__flash@var{N}} address spaces +shows undefined behavior. The only address space that +supports reading across the 64@tie{}KiB flash segment boundaries is +@code{__memx}. 
+ +@item +If you use one of the @code{__flash@var{N}} address spaces +you must arrange your linker script to locate the +@code{.progmem@var{N}.data} sections according to your needs. + +@item +Any data or pointers to the non-generic address spaces must +be qualified as @code{const}, i.e.@: as read-only data. +This still applies if the data in one of these address +spaces like software version number or calibration lookup table are intended to +be changed after load time by, say, a boot loader. In this case +the right qualification is @code{const} @code{volatile} so that the compiler +must not optimize away known values or insert them +as immediates into operands of instructions. + +@item +The following code initializes a variable @code{pfoo} +located in static storage with a 24-bit address: +@smallexample +extern const __memx char foo; +const __memx void *pfoo = &foo; +@end smallexample + +@item +On the reduced Tiny devices like ATtiny40, no address spaces are supported. +Just use vanilla C / C++ code without overhead as outlined above. +Attribute @code{progmem} is supported but works differently, +see @ref{AVR Variable Attributes}. + +@end itemize + +@subsection M32C Named Address Spaces +@cindex @code{__far} M32C Named Address Spaces + +On the M32C target, with the R8C and M16C CPU variants, variables +qualified with @code{__far} are accessed using 32-bit addresses in +order to access memory beyond the first 64@tie{}Ki bytes. If +@code{__far} is used with the M32CM or M32C CPU variants, it has no +effect. + +@subsection PRU Named Address Spaces +@cindex @code{__regio_symbol} PRU Named Address Spaces + +On the PRU target, variables qualified with @code{__regio_symbol} are +aliases used to access the special I/O CPU registers. They must be +declared as @code{extern} because such variables will not be allocated in +any data memory. They must also be marked as @code{volatile}, and can +only be 32-bit integer types. 
The only names those variables can have +are @code{__R30} and @code{__R31}, representing respectively the +@code{R30} and @code{R31} special I/O CPU registers. Hence the following +example is the only valid usage of @code{__regio_symbol}: + +@smallexample +extern volatile __regio_symbol uint32_t __R30; +extern volatile __regio_symbol uint32_t __R31; +@end smallexample + +@subsection RL78 Named Address Spaces +@cindex @code{__far} RL78 Named Address Spaces + +On the RL78 target, variables qualified with @code{__far} are accessed +with 32-bit pointers (20-bit addresses) rather than the default 16-bit +addresses. Non-far variables are assumed to appear in the topmost +64@tie{}KiB of the address space. + +@subsection x86 Named Address Spaces +@cindex x86 named address spaces + +On the x86 target, variables may be declared as being relative +to the @code{%fs} or @code{%gs} segments. + +@table @code +@item __seg_fs +@itemx __seg_gs +@cindex @code{__seg_fs} x86 named address space +@cindex @code{__seg_gs} x86 named address space +The object is accessed with the respective segment override prefix. + +The respective segment base must be set via some method specific to +the operating system. Rather than require an expensive system call +to retrieve the segment base, these address spaces are not considered +to be subspaces of the generic (flat) address space. This means that +explicit casts are required to convert pointers between these address +spaces and the generic address space. In practice the application +should cast to @code{uintptr_t} and apply the segment base offset +that it installed previously. + +The preprocessor symbols @code{__SEG_FS} and @code{__SEG_GS} are +defined when these address spaces are supported. +@end table + +@node Zero Length +@section Arrays of Length Zero +@cindex arrays of length zero +@cindex zero-length arrays +@cindex length-zero arrays +@cindex flexible array members + +Declaring zero-length arrays is allowed in GNU C as an extension. 
+A zero-length array can be useful as the last element of a structure +that is really a header for a variable-length object: + +@smallexample +struct line @{ + int length; + char contents[0]; +@}; + +struct line *thisline = (struct line *) + malloc (sizeof (struct line) + this_length); +thisline->length = this_length; +@end smallexample + +Although the size of a zero-length array is zero, an array member of +this kind may increase the size of the enclosing type as a result of tail +padding. The offset of a zero-length array member from the beginning +of the enclosing structure is the same as the offset of an array with +one or more elements of the same type. The alignment of a zero-length +array is the same as the alignment of its elements. + +Declaring zero-length arrays in other contexts, including as interior +members of structure objects or as non-member objects, is discouraged. +Accessing elements of zero-length arrays declared in such contexts is +undefined and may be diagnosed. + +In the absence of the zero-length array extension, in ISO C90 +the @code{contents} array in the example above would typically be declared +to have a single element. Unlike a zero-length array which only contributes +to the size of the enclosing structure for the purposes of alignment, +a one-element array always occupies at least as much space as a single +object of the type. Although using one-element arrays this way is +discouraged, GCC handles accesses to trailing one-element array members +analogously to zero-length arrays. + +The preferred mechanism to declare variable-length types like +@code{struct line} above is the ISO C99 @dfn{flexible array member}, +with slightly different syntax and semantics: + +@itemize @bullet +@item +Flexible array members are written as @code{contents[]} without +the @code{0}. + +@item +Flexible array members have incomplete type, and so the @code{sizeof} +operator may not be applied. 
As a quirk of the original implementation +of zero-length arrays, @code{sizeof} evaluates to zero. + +@item +Flexible array members may only appear as the last member of a +@code{struct} that is otherwise non-empty. + +@item +A structure containing a flexible array member, or a union containing +such a structure (possibly recursively), may not be a member of a +structure or an element of an array. (However, these uses are +permitted by GCC as extensions.) +@end itemize + +Non-empty initialization of zero-length +arrays is treated like any case where there are more initializer +elements than the array holds, in that a suitable warning about ``excess +elements in array'' is given, and the excess elements (all of them, in +this case) are ignored. + +GCC allows static initialization of flexible array members. +This is equivalent to defining a new structure containing the original +structure followed by an array of sufficient size to contain the data. +E.g.@: in the following, @code{f1} is constructed as if it were declared +like @code{f2}. + +@smallexample +struct f1 @{ + int x; int y[]; +@} f1 = @{ 1, @{ 2, 3, 4 @} @}; + +struct f2 @{ + struct f1 f1; int data[3]; +@} f2 = @{ @{ 1 @}, @{ 2, 3, 4 @} @}; +@end smallexample + +@noindent +The convenience of this extension is that @code{f1} has the desired +type, eliminating the need to consistently refer to @code{f2.f1}. + +This has symmetry with normal static arrays, in that an array of +unknown size is also written with @code{[]}. + +Of course, this extension only makes sense if the extra data comes at +the end of a top-level object, as otherwise we would be overwriting +data at subsequent offsets. To avoid undue complication and confusion +with initialization of deeply nested arrays, we simply disallow any +non-empty initialization except when the structure is the top-level +object. 
For example: + +@smallexample +struct foo @{ int x; int y[]; @}; +struct bar @{ struct foo z; @}; + +struct foo a = @{ 1, @{ 2, 3, 4 @} @}; // @r{Valid.} +struct bar b = @{ @{ 1, @{ 2, 3, 4 @} @} @}; // @r{Invalid.} +struct bar c = @{ @{ 1, @{ @} @} @}; // @r{Valid.} +struct foo d[1] = @{ @{ 1, @{ 2, 3, 4 @} @} @}; // @r{Invalid.} +@end smallexample + +@node Empty Structures +@section Structures with No Members +@cindex empty structures +@cindex zero-size structures + +GCC permits a C structure to have no members: + +@smallexample +struct empty @{ +@}; +@end smallexample + +The structure has size zero. In C++, empty structures are part +of the language. G++ treats empty structures as if they had a single +member of type @code{char}. + +@node Variable Length +@section Arrays of Variable Length +@cindex variable-length arrays +@cindex arrays of variable length +@cindex VLAs + +Variable-length automatic arrays are allowed in ISO C99, and as an +extension GCC accepts them in C90 mode and in C++. These arrays are +declared like any other automatic arrays, but with a length that is not +a constant expression. The storage is allocated at the point of +declaration and deallocated when the block scope containing the declaration +exits. For +example: + +@smallexample +FILE * +concat_fopen (char *s1, char *s2, char *mode) +@{ + char str[strlen (s1) + strlen (s2) + 1]; + strcpy (str, s1); + strcat (str, s2); + return fopen (str, mode); +@} +@end smallexample + +@cindex scope of a variable length array +@cindex variable-length array scope +@cindex deallocating variable length arrays +Jumping or breaking out of the scope of the array name deallocates the +storage. Jumping into the scope is not allowed; you get an error +message for it. + +@cindex variable-length array in a structure +As an extension, GCC accepts variable-length arrays as a member of +a structure or a union. 
For example: + +@smallexample +void +foo (int n) +@{ + struct S @{ int x[n]; @}; +@} +@end smallexample + +@cindex @code{alloca} vs variable-length arrays +You can use the function @code{alloca} to get an effect much like +variable-length arrays. The function @code{alloca} is available in +many other C implementations (but not in all). On the other hand, +variable-length arrays are more elegant. + +There are other differences between these two methods. Space allocated +with @code{alloca} exists until the containing @emph{function} returns. +The space for a variable-length array is deallocated as soon as the array +name's scope ends, unless you also use @code{alloca} in this scope. + +You can also use variable-length arrays as arguments to functions: + +@smallexample +struct entry +tester (int len, char data[len][len]) +@{ + /* @r{@dots{}} */ +@} +@end smallexample + +The length of an array is computed once when the storage is allocated +and is remembered for the scope of the array in case you access it with +@code{sizeof}. + +If you want to pass the array first and the length afterward, you can +use a forward declaration in the parameter list---another GNU extension. + +@smallexample +struct entry +tester (int len; char data[len][len], int len) +@{ + /* @r{@dots{}} */ +@} +@end smallexample + +@cindex parameter forward declaration +The @samp{int len} before the semicolon is a @dfn{parameter forward +declaration}, and it serves the purpose of making the name @code{len} +known when the declaration of @code{data} is parsed. + +You can write any number of such parameter forward declarations in the +parameter list. They can be separated by commas or semicolons, but the +last one must end with a semicolon, which is followed by the ``real'' +parameter declarations. Each forward declaration must match a ``real'' +declaration in parameter name and data type. ISO C99 does not support +parameter forward declarations. 
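As a rough illustration of the points above, the following sketch passes a variable-length array as a function argument and applies @code{sizeof} to an automatic variable-length array. It sticks to ISO C99 syntax with the length parameter first, so it does not rely on a parameter forward declaration. The function names @code{trace} and @code{vla_bytes} are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the diagonal of a len x len matrix passed as a variable-length
   array parameter.  len is declared before data so that its name is
   known when the declaration of data is parsed.  */
int
trace (int len, int data[len][len])
{
  int sum = 0;
  for (int i = 0; i < len; i++)
    sum += data[i][i];
  return sum;
}

/* Return the size in bytes of an automatic array of n ints.  The
   length is computed once, when buf is allocated, and sizeof reports
   it at run time.  The storage is released when the block exits.  */
size_t
vla_bytes (int n)
{
  assert (n > 0);   /* Hypothetical precondition for this sketch.  */
  int buf[n];
  return sizeof buf;
}
```

Calling @code{trace (2, m)} on a 2 x 2 matrix adds @code{m[0][0]} and @code{m[1][1]}; @code{vla_bytes (3)} yields three times @code{sizeof (int)}.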
+ +@node Variadic Macros +@section Macros with a Variable Number of Arguments +@cindex variable number of arguments +@cindex macro with variable arguments +@cindex rest argument (in macro) +@cindex variadic macros + +In the ISO C standard of 1999, a macro can be declared to accept a +variable number of arguments much as a function can. The syntax for +defining the macro is similar to that of a function. Here is an +example: + +@smallexample +#define debug(format, ...) fprintf (stderr, format, __VA_ARGS__) +@end smallexample + +@noindent +Here @samp{@dots{}} is a @dfn{variable argument}. In the invocation of +such a macro, it represents the zero or more tokens until the closing +parenthesis that ends the invocation, including any commas. This set of +tokens replaces the identifier @code{__VA_ARGS__} in the macro body +wherever it appears. See the CPP manual for more information. + +GCC has long supported variadic macros, and used a different syntax that +allowed you to give a name to the variable arguments just like any other +argument. Here is an example: + +@smallexample +#define debug(format, args...) fprintf (stderr, format, args) +@end smallexample + +@noindent +This is in all ways equivalent to the ISO C example above, but arguably +more readable and descriptive. + +GNU CPP has two further variadic macro extensions, and permits them to +be used with either of the above forms of macro definition. + +In standard C, you are not allowed to leave the variable argument out +entirely; but you are allowed to pass an empty argument. For example, +this invocation is invalid in ISO C, because there is no comma after +the string: + +@smallexample +debug ("A message") +@end smallexample + +GNU CPP permits you to completely omit the variable arguments in this +way. In the above examples, though, the compiler would complain, since +the expansion of the macro still has the extra comma after the format +string. 
+ +To help solve this problem, CPP behaves specially for variable arguments +used with the token paste operator, @samp{##}. If instead you write + +@smallexample +#define debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__) +@end smallexample + +@noindent +and if the variable arguments are omitted or empty, the @samp{##} +operator causes the preprocessor to remove the comma before it. If you +do provide some variable arguments in your macro invocation, GNU CPP +does not complain about the paste operation and instead places the +variable arguments after the comma. Just like any other pasted macro +argument, these arguments are not macro expanded. + +@node Escaped Newlines +@section Slightly Looser Rules for Escaped Newlines +@cindex escaped newlines +@cindex newlines (escaped) + +The preprocessor treatment of escaped newlines is more relaxed +than that specified by the C90 standard, which requires the newline +to immediately follow a backslash. +GCC's implementation allows whitespace in the form +of spaces, horizontal and vertical tabs, and form feeds between the +backslash and the subsequent newline. The preprocessor issues a +warning, but treats it as a valid escaped newline and combines the two +lines to form a single logical line. This works within comments and +tokens, as well as between tokens. Comments are @emph{not} treated as +whitespace for the purposes of this relaxation, since they have not +yet been replaced with spaces. + +@node Subscripting +@section Non-Lvalue Arrays May Have Subscripts +@cindex subscripting +@cindex arrays, non-lvalue + +@cindex subscripting and function values +In ISO C99, arrays that are not lvalues still decay to pointers, and +may be subscripted, although they may not be modified or used after +the next sequence point and the unary @samp{&} operator may not be +applied to them. As an extension, GNU C allows such arrays to be +subscripted in C90 mode, though otherwise they do not decay to +pointers outside C99 mode. 
For example, +this is valid in GNU C though not valid in C90: + +@smallexample +@group +struct foo @{int a[4];@}; + +struct foo f(); + +bar (int index) +@{ + return f().a[index]; +@} +@end group +@end smallexample + +@node Pointer Arith +@section Arithmetic on @code{void}- and Function-Pointers +@cindex void pointers, arithmetic +@cindex void, size of pointer to +@cindex function pointers, arithmetic +@cindex function, size of pointer to + +In GNU C, addition and subtraction operations are supported on pointers to +@code{void} and on pointers to functions. This is done by treating the +size of a @code{void} or of a function as 1. + +A consequence of this is that @code{sizeof} is also allowed on @code{void} +and on function types, and returns 1. + +@opindex Wpointer-arith +The option @option{-Wpointer-arith} requests a warning if these extensions +are used. + +@node Variadic Pointer Args +@section Pointer Arguments in Variadic Functions +@cindex pointer arguments in variadic functions +@cindex variadic functions, pointer arguments + +Standard C requires that pointer types used with @code{va_arg} in +functions with variable argument lists either must be compatible with +that of the actual argument, or that one type must be a pointer to +@code{void} and the other a pointer to a character type. GNU C +implements the POSIX XSI extension that additionally permits the use +of @code{va_arg} with a pointer type to receive arguments of any other +pointer type. + +In particular, in GNU C @samp{va_arg (ap, void *)} can safely be used +to consume an argument of any pointer type. + +@node Pointers to Arrays +@section Pointers to Arrays with Qualifiers Work as Expected +@cindex pointers to arrays +@cindex const qualifier + +In GNU C, pointers to arrays with qualifiers work similarly to pointers +to other qualified types. For example, a value of type @code{int (*)[5]} +can be used to initialize a variable of type @code{const int (*)[5]}. 
+These types are incompatible in ISO C because the @code{const} qualifier +is formally attached to the element type of the array and not the +array itself. + +@smallexample +extern void +transpose (int N, int M, double out[M][N], const double in[N][M]); +double x[3][2]; +double y[2][3]; +@r{@dots{}} +transpose(3, 2, y, x); +@end smallexample + +@node Initializers +@section Non-Constant Initializers +@cindex initializers, non-constant +@cindex non-constant initializers + +As in standard C++ and ISO C99, the elements of an aggregate initializer for an +automatic variable are not required to be constant expressions in GNU C@. +Here is an example of an initializer with run-time varying elements: + +@smallexample +foo (float f, float g) +@{ + float beat_freqs[2] = @{ f-g, f+g @}; + /* @r{@dots{}} */ +@} +@end smallexample + +@node Compound Literals +@section Compound Literals +@cindex constructor expressions +@cindex initializations in expressions +@cindex structures, constructor expression +@cindex expressions, constructor +@cindex compound literals +@c The GNU C name for what C99 calls compound literals was "constructor expressions". + +A compound literal looks like a cast of a brace-enclosed aggregate +initializer list. Its value is an object of the type specified in +the cast, containing the elements specified in the initializer. +Unlike the result of a cast, a compound literal is an lvalue. ISO +C99 and later support compound literals. As an extension, GCC +supports compound literals also in C90 mode and in C++, although +as explained below, the C++ semantics are somewhat different. + +Usually, the specified type of a compound literal is a structure. 
Assume +that @code{struct foo} and @code{structure} are declared as shown: + +@smallexample +struct foo @{int a; char b[2];@} structure; +@end smallexample + +@noindent +Here is an example of constructing a @code{struct foo} with a compound literal: + +@smallexample +structure = ((struct foo) @{x + y, 'a', 0@}); +@end smallexample + +@noindent +This is equivalent to writing the following: + +@smallexample +@{ + struct foo temp = @{x + y, 'a', 0@}; + structure = temp; +@} +@end smallexample + +You can also construct an array, though this is dangerous in C++, as +explained below. If all the elements of the compound literal are +(made up of) simple constant expressions suitable for use in +initializers of objects of static storage duration, then the compound +literal can be coerced to a pointer to its first element and used in +such an initializer, as shown here: + +@smallexample +char **foo = (char *[]) @{ "x", "y", "z" @}; +@end smallexample + +Compound literals for scalar types and union types are also allowed. In +the following example the variable @code{i} is initialized to the value +@code{2}, the result of incrementing the unnamed object created by +the compound literal. + +@smallexample +int i = ++(int) @{ 1 @}; +@end smallexample + +As a GNU extension, GCC allows initialization of objects with static storage +duration by compound literals (which is not possible in ISO C99 because +the initializer is not a constant). +It is handled as if the object were initialized only with the brace-enclosed +list if the types of the compound literal and the object match. +The elements of the compound literal must be constant. +If the object being initialized has array type of unknown size, the size is +determined by the size of the compound literal. 
+ +@smallexample +static struct foo x = (struct foo) @{1, 'a', 'b'@}; +static int y[] = (int []) @{1, 2, 3@}; +static int z[] = (int [3]) @{1@}; +@end smallexample + +@noindent +The above lines are equivalent to the following: +@smallexample +static struct foo x = @{1, 'a', 'b'@}; +static int y[] = @{1, 2, 3@}; +static int z[] = @{1, 0, 0@}; +@end smallexample + +In C, a compound literal designates an unnamed object with static or +automatic storage duration. In C++, a compound literal designates a +temporary object that only lives until the end of its full-expression. +As a result, well-defined C code that takes the address of a subobject +of a compound literal can be undefined in C++, so G++ rejects +the conversion of a temporary array to a pointer. For instance, if +the array compound literal example above appeared inside a function, +any subsequent use of @code{foo} in C++ would have undefined behavior +because the lifetime of the array ends after the declaration of @code{foo}. + +As an optimization, G++ sometimes gives array compound literals longer +lifetimes: when the array either appears outside a function or has +a @code{const}-qualified type. If @code{foo} and its initializer had +elements of type @code{char *const} rather than @code{char *}, or if +@code{foo} were a global variable, the array would have static storage +duration. But it is probably safest just to avoid the use of array +compound literals in C++ code. + +@node Designated Inits +@section Designated Initializers +@cindex initializers with labeled elements +@cindex labeled elements in initializers +@cindex case labels in initializers +@cindex designated initializers + +Standard C90 requires the elements of an initializer to appear in a fixed +order, the same as the order of the elements in the array or structure +being initialized. 
+ +In ISO C99 you can give the elements in any order, specifying the array +indices or structure field names they apply to, and GNU C allows this as +an extension in C90 mode as well. This extension is not +implemented in GNU C++. + +To specify an array index, write +@samp{[@var{index}] =} before the element value. For example, + +@smallexample +int a[6] = @{ [4] = 29, [2] = 15 @}; +@end smallexample + +@noindent +is equivalent to + +@smallexample +int a[6] = @{ 0, 0, 15, 0, 29, 0 @}; +@end smallexample + +@noindent +The index values must be constant expressions, even if the array being +initialized is automatic. + +An alternative syntax for this that has been obsolete since GCC 2.5 but +GCC still accepts is to write @samp{[@var{index}]} before the element +value, with no @samp{=}. + +To initialize a range of elements to the same value, write +@samp{[@var{first} ... @var{last}] = @var{value}}. This is a GNU +extension. For example, + +@smallexample +int widths[] = @{ [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 @}; +@end smallexample + +@noindent +If the value in it has side effects, the side effects happen only once, +not for each initialized field by the range initializer. + +@noindent +Note that the length of the array is the highest value specified +plus one. + +In a structure initializer, specify the name of a field to initialize +with @samp{.@var{fieldname} =} before the element value. 
For example, +given the following structure, + +@smallexample +struct point @{ int x, y; @}; +@end smallexample + +@noindent +the following initialization + +@smallexample +struct point p = @{ .y = yvalue, .x = xvalue @}; +@end smallexample + +@noindent +is equivalent to + +@smallexample +struct point p = @{ xvalue, yvalue @}; +@end smallexample + +Another syntax that has the same meaning, obsolete since GCC 2.5, is +@samp{@var{fieldname}:}, as shown here: + +@smallexample +struct point p = @{ y: yvalue, x: xvalue @}; +@end smallexample + +Omitted fields are implicitly initialized the same as for objects +that have static storage duration. + +@cindex designators +The @samp{[@var{index}]} or @samp{.@var{fieldname}} is known as a +@dfn{designator}. You can also use a designator (or the obsolete colon +syntax) when initializing a union, to specify which element of the union +should be used. For example, + +@smallexample +union foo @{ int i; double d; @}; + +union foo f = @{ .d = 4 @}; +@end smallexample + +@noindent +converts 4 to a @code{double} to store it in the union using +the second element. By contrast, casting 4 to type @code{union foo} +stores it into the union as the integer @code{i}, since it is +an integer. @xref{Cast to Union}. + +You can combine this technique of naming elements with ordinary C +initialization of successive elements. Each initializer element that +does not have a designator applies to the next consecutive element of the +array or structure. For example, + +@smallexample +int a[6] = @{ [1] = v1, v2, [4] = v4 @}; +@end smallexample + +@noindent +is equivalent to + +@smallexample +int a[6] = @{ 0, v1, v2, 0, v4, 0 @}; +@end smallexample + +Labeling the elements of an array initializer is especially useful +when the indices are characters or belong to an @code{enum} type. 
+For example: + +@smallexample +int whitespace[256] + = @{ [' '] = 1, ['\t'] = 1, ['\v'] = 1, + ['\f'] = 1, ['\n'] = 1, ['\r'] = 1 @}; +@end smallexample + +@cindex designator lists +You can also write a series of @samp{.@var{fieldname}} and +@samp{[@var{index}]} designators before an @samp{=} to specify a +nested subobject to initialize; the list is taken relative to the +subobject corresponding to the closest surrounding brace pair. For +example, with the @samp{struct point} declaration above: + +@smallexample +struct point ptarray[10] = @{ [2].y = yv2, [2].x = xv2, [0].x = xv0 @}; +@end smallexample + +If the same field is initialized multiple times, or overlapping +fields of a union are initialized, the value from the last +initialization is used. When a field of a union is itself a structure, +the entire structure from the last field initialized is used. If any previous +initializer has side effects, it is unspecified whether those side effects +happen or not. Currently, GCC discards the side-effecting +initializer expressions and issues a warning. + +@node Case Ranges +@section Case Ranges +@cindex case ranges +@cindex ranges in case statements + +You can specify a range of consecutive values in a single @code{case} label, +like this: + +@smallexample +case @var{low} ... @var{high}: +@end smallexample + +@noindent +This has the same effect as the proper number of individual @code{case} +labels, one for each integer value from @var{low} to @var{high}, inclusive. + +This feature is especially useful for ranges of ASCII character codes: + +@smallexample +case 'A' ... 'Z': +@end smallexample + +@strong{Be careful:} Write spaces around the @code{...}, for otherwise +it may be parsed incorrectly when you use it with integer values. For example, +write this: + +@smallexample +case 1 ... 
5: +@end smallexample + +@noindent +rather than this: + +@smallexample +case 1...5: +@end smallexample + +@node Cast to Union +@section Cast to a Union Type +@cindex cast to a union +@cindex union, casting to a + +A cast to a union type is a C extension not available in C++. It looks +just like ordinary casts with the constraint that the type specified is +a union type. You can specify the type either with the @code{union} +keyword or with a @code{typedef} name that refers to a union. The result +of a cast to a union is a temporary rvalue of the union type with a member +whose type matches that of the operand initialized to the value of +the operand. The effect of a cast to a union is similar to a compound +literal except that it yields an rvalue like standard casts do. +@xref{Compound Literals}. + +Expressions that may be cast to the union type are those whose type matches +at least one of the members of the union. Thus, given the following union +and variables: + +@smallexample +union foo @{ int i; double d; @}; +int x; +double y; +union foo z; +@end smallexample + +@noindent +both @code{x} and @code{y} can be cast to type @code{union foo} and +the following assignments +@smallexample + z = (union foo) x; + z = (union foo) y; +@end smallexample +are shorthand equivalents of these +@smallexample + z = (union foo) @{ .i = x @}; + z = (union foo) @{ .d = y @}; +@end smallexample + +However, @code{(union foo) FLT_MAX;} is not a valid cast because the union +has no member of type @code{float}. 
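A minimal sketch of the matching rule above, assuming the @code{union foo} declaration from this section; the helper names are hypothetical. Because the union has no @code{float} member, a @code{float} operand such as @code{FLT_MAX} can be cast to the union only after an explicit conversion to a member type, here @code{double}.

```c
#include <assert.h>
#include <float.h>

union foo { int i; double d; };

/* The operand has type int, which matches member i, so the cast
   yields a union whose i member holds x.  */
union foo
foo_from_int (int x)
{
  return (union foo) x;
}

/* float matches no member of union foo.  Converting to double first
   makes the operand match member d.  */
union foo
foo_from_float (float f)
{
  return (union foo) (double) f;
}

/* Usage: each helper stores its argument in the member named above.  */
void
foo_demo (void)
{
  assert (foo_from_int (7).i == 7);
  assert (foo_from_float (0.5f).d == 0.5);
}
```

The cast to a union type is a GNU C extension, so this sketch is not expected to compile as C++ or under a strictly conforming ISO C compiler.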
+
+Using the cast as the right-hand side of an assignment to a variable of
+union type is equivalent to storing in a member of the union with
+the same type
+
+@smallexample
+union foo u;
+/* @r{@dots{}} */
+u = (union foo) x @equiv{} u.i = x
+u = (union foo) y @equiv{} u.d = y
+@end smallexample
+
+You can also use the union cast as a function argument:
+
+@smallexample
+void hack (union foo);
+/* @r{@dots{}} */
+hack ((union foo) x);
+@end smallexample
+
+@node Mixed Labels and Declarations
+@section Mixed Declarations, Labels and Code
+@cindex mixed declarations and code
+@cindex declarations, mixed with code
+@cindex code, mixed with declarations
+
+ISO C99 and ISO C++ allow declarations and code to be freely mixed
+within compound statements. ISO C2X allows labels to be
+placed before declarations and at the end of a compound statement.
+As an extension, GNU C also allows all this in C90 mode. For example,
+you could do:
+
+@smallexample
+int i;
+/* @r{@dots{}} */
+i++;
+int j = i + 2;
+@end smallexample
+
+Each identifier is visible from where it is declared until the end of
+the enclosing block.
+
+@node Function Attributes
+@section Declaring Attributes of Functions
+@cindex function attributes
+@cindex declaring attributes of functions
+@cindex @code{volatile} applied to function
+@cindex @code{const} applied to function
+
+In GNU C and C++, you can use function attributes to specify certain
+function properties that may help the compiler optimize calls or
+check code more carefully for correctness. For example, you
+can use attributes to specify that a function never returns
+(@code{noreturn}), returns a value depending only on the values of
+its arguments (@code{const}), or has @code{printf}-style arguments
+(@code{format}).
+
+You can also use attributes to control memory placement, code
+generation options or call/return conventions within the function
+being annotated. Many of these attributes are target-specific. For
+example, many targets support attributes for defining interrupt
+handler functions, which typically must follow special register usage
+and return conventions. Such attributes are described in the subsection
+for each target. However, a considerable number of attributes are
+supported by most, if not all targets. Those are described in
+the @ref{Common Function Attributes} section.
+
+Function attributes are introduced by the @code{__attribute__} keyword
+in the declaration of a function, followed by an attribute specification
+enclosed in double parentheses. You can specify multiple attributes in
+a declaration by separating them by commas within the double parentheses
+or by immediately following one attribute specification with another.
+@xref{Attribute Syntax}, for the exact rules on attribute syntax and
+placement. Compatible attribute specifications on distinct declarations
+of the same function are merged. An attribute specification that is not
+compatible with attributes already applied to a declaration of the same
+function is ignored with a warning.
+
+Some function attributes take one or more arguments that refer to
+the function's parameters by their positions within the function parameter
+list. Such attribute arguments are referred to as @dfn{positional arguments}.
+Unless specified otherwise, positional arguments that specify properties
+of parameters with pointer types can also specify the same properties of
+the implicit C++ @code{this} argument in non-static member functions, and
+of parameters of reference to a pointer type. For ordinary functions,
+position one refers to the first parameter on the list. In C++ non-static
+member functions, position one refers to the implicit @code{this} pointer.
+The same restrictions and effects apply to function attributes used with
+ordinary functions or C++ member functions.
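The attribute syntax and positional arguments described above can be sketched in C as follows. This is a minimal illustration, assuming GCC or a compatible compiler; @code{format_into}, @code{square} and @code{attribute_demo} are hypothetical names, and @code{nonnull} and @code{noinline} are used only as further examples of attributes, they are not discussed in the text above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* One specification holding two attributes separated by a comma.
   The numbers are positional arguments: for this ordinary function,
   position 1 is buf, position 3 is fmt, and the variadic arguments
   checked against the format start at position 4.  */
static int format_into (char *buf, size_t size, const char *fmt, ...)
  __attribute__ ((format (printf, 3, 4), nonnull (1, 3)));

/* Two specifications, one immediately following the other.  */
static int square (int v) __attribute__ ((const)) __attribute__ ((noinline));

static int format_into (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int n = vsnprintf (buf, size, fmt, ap);
  va_end (ap);
  return n;
}

/* The result depends only on the argument value, as const promises.  */
static int square (int v)
{
  return v * v;
}

/* Returns 1 when the annotated functions behave as expected.  */
static int attribute_demo (void)
{
  char buf[32];
  int n = format_into (buf, sizeof buf, "%d-%s", 7, "ok");
  assert (n == 4);
  assert (strcmp (buf, "7-ok") == 0);
  assert (square (9) == 81);
  return 1;
}
```

The attributes are attached to separate declarations because GCC expects attributes that follow the declarator to appear on a declaration rather than on the function definition itself.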
+
+GCC also supports attributes on
+variable declarations (@pxref{Variable Attributes}),
+labels (@pxref{Label Attributes}),
+enumerators (@pxref{Enumerator Attributes}),
+statements (@pxref{Statement Attributes}),
+types (@pxref{Type Attributes}),
+and on field declarations (for @code{tainted_args}).
+
+There is some overlap between the purposes of attributes and pragmas
+(@pxref{Pragmas,,Pragmas Accepted by GCC}). It has been
+found convenient to use @code{__attribute__} to achieve a natural
+attachment of attributes to their corresponding declarations, whereas
+@code{#pragma} is of use for compatibility with other compilers
+or constructs that do not naturally form part of the grammar.
+
+In addition to the attributes documented here,
+GCC plugins may provide their own attributes.
+
+@menu
+* Common Function Attributes::
+* AArch64 Function Attributes::
+* AMD GCN Function Attributes::
+* ARC Function Attributes::
+* ARM Function Attributes::
+* AVR Function Attributes::
+* Blackfin Function Attributes::
+* BPF Function Attributes::
+* C-SKY Function Attributes::
+* Epiphany Function Attributes::
+* H8/300 Function Attributes::
+* IA-64 Function Attributes::
+* M32C Function Attributes::
+* M32R/D Function Attributes::
+* m68k Function Attributes::
+* MCORE Function Attributes::
+* MeP Function Attributes::
+* MicroBlaze Function Attributes::
+* Microsoft Windows Function Attributes::
+* MIPS Function Attributes::
+* MSP430 Function Attributes::
+* NDS32 Function Attributes::
+* Nios II Function Attributes::
+* Nvidia PTX Function Attributes::
+* PowerPC Function Attributes::
+* RISC-V Function Attributes::
+* RL78 Function Attributes::
+* RX Function Attributes::
+* S/390 Function Attributes::
+* SH Function Attributes::
+* Symbian OS Function Attributes::
+* V850 Function Attributes::
+* Visium Function Attributes::
+* x86 Function Attributes::
+* Xstormy16 Function Attributes::
+@end menu
+
+@node Common Function Attributes
+@subsection Common Function Attributes
+
+The following attributes are supported on most targets.
+
+@table @code
+@c Keep this table alphabetized by attribute name. Treat _ as space.
+
+@item access (@var{access-mode}, @var{ref-index})
+@itemx access (@var{access-mode[...]

[diff truncated at 524288 bytes]