public inbox for gcc@gcc.gnu.org
* Unused GCC builtins
@ 2018-01-22 15:47 Manuel Rigger
  2018-01-22 15:55 ` David Brown
  2018-01-22 18:30 ` Florian Weimer
  0 siblings, 2 replies; 10+ messages in thread
From: Manuel Rigger @ 2018-01-22 15:47 UTC (permalink / raw)
  To: gcc; +Cc: s.marr, bram.adams

Hi everyone,

As part of my research, we have been analyzing the usage of GCC builtins
in 5,000 C GitHub projects. One of our findings is that many of these
builtins are unused, even though they are described in the documentation
(see https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html#C-Extensions)
and obviously took time to develop and maintain. I’ve uploaded a CSV
file with the unused builtins to
http://ssw.jku.at/General/Staff/ManuelRigger/unused-builtins.csv.

Details: We downloaded all C projects from GitHub that had more than 80
GitHub stars, which yielded almost 5,000 projects with a total of more
than one billion lines of C code. We filtered out GCC, forks of GCC, and
other compilers as we did not want to incorporate internal usages of GCC
builtins or test cases. We extracted all builtin names from the GCC
docs, and also tried to find such names in the source code, which we
considered as builtin usages. We excluded subdirectories with GCC or
Clang, and removed other false positives. In total, we found 320k
builtin usages in these projects, and 3030 unused builtins out of a
total of 6039 builtins.
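
To make the counting step concrete, here is a minimal sketch of a purely
textual scan for one builtin name. It is not the tooling used for the
survey; count_identifier and is_ident_char are hypothetical helpers, and
the whole-identifier matching rule is an assumption made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns nonzero if c can be part of a C identifier. */
static int is_ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Counts whole-identifier occurrences of `name` in `text`.
   Hypothetical helper; not the tooling used for the survey. */
size_t count_identifier(const char *text, const char *name)
{
    size_t count = 0;
    size_t len = strlen(name);

    if (len == 0)
        return 0;

    for (const char *p = strstr(text, name); p != NULL;
         p = strstr(p + 1, name)) {
        /* Reject matches that are only part of a longer identifier. */
        int starts_clean = (p == text) || !is_ident_char(p[-1]);
        int ends_clean = !is_ident_char(p[len]);
        if (starts_clean && ends_clean)
            count++;
    }
    return count;
}
```

A scan like this only sees names that are spelled out literally in the
source text, which is the limitation the replies in this thread focus on.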

What is your take on this? Do you believe that some of these unused
builtins could be removed from the GCC docs or deprecated? Or are they
used in special "niche" domains that we did not consider? If yes, do you
think it is worth maintaining them? Are some of them only used in C++
projects? Might it be possible to remove their implementations (which
has already happened for the Cilk Plus builtins)?

We would be glad for any feedback.

- Manuel



* Re: Unused GCC builtins
  2018-01-22 15:47 Unused GCC builtins Manuel Rigger
@ 2018-01-22 15:55 ` David Brown
  2018-01-22 16:02   ` Joel Sherrill
                     ` (2 more replies)
  2018-01-22 18:30 ` Florian Weimer
  1 sibling, 3 replies; 10+ messages in thread
From: David Brown @ 2018-01-22 15:55 UTC (permalink / raw)
  To: Manuel Rigger, gcc; +Cc: s.marr, bram.adams

On 22/01/18 16:46, Manuel Rigger wrote:
> Hi everyone,
> 
> As part of my research, we have been analyzing the usage of GCC builtins
> in 5,000 C GitHub projects. One of our findings is that many of these
> builtins are unused, even though they are described in the documentation
> (see https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html#C-Extensions)
> and obviously took time to develop and maintain. I’ve uploaded a CSV
> file with the unused builtins to
> http://ssw.jku.at/General/Staff/ManuelRigger/unused-builtins.csv.
> 
> Details: We downloaded all C projects from GitHub that had more than 80
> GitHub stars, which yielded almost 5,000 projects with a total of more
> than one billion lines of C code. We filtered GCC, forks of GCC, and
> other compilers as we did not want to incorporate internal usages of GCC
> builtins or test cases. We extracted all builtin names from the GCC
> docs, and also tried to find such names in the source code, which we
> considered as builtin usages. We excluded subdirectories with GCC or
> Clang, and removed other false positives. In total, we found 320k
> builtin usages in these projects, and 3030 unused builtins out of a
> total of 6039 builtins.
> 
> What is your take on this? Do you believe that some of these unused
> builtins could be removed from the GCC docs or deprecated? Or are they
> used in special "niche" domains that we did not consider? If yes, do you
> think it is worth to maintain them? Are some of them only used in C++
> projects? Might it be possible to remove their implementations (which
> has already happened for the Cilk Plus builtins)?
> 
> We would be glad for any feedback.
> 
> - Manuel
> 

Many of these are going to be used automatically by the compiler.  You
write "strdup" in your code, and the compiler treats it as
"__builtin_strdup".  I don't know that such functions need to be
documented as extensions, but they are certainly in use.
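
As a small illustration of this point, the sketch below uses strlen
rather than strdup (my choice, only to keep the example short). The
function names are hypothetical. Unless -fno-builtin is given, GCC may
treat the plain library call as the corresponding builtin, so the text
"__builtin_" never appears in the first function.

```c
#include <stddef.h>
#include <string.h>

/* Plain library call: GCC may treat this as __builtin_strlen
   (unless -fno-builtin is given), so no "__builtin_" text appears. */
size_t length_via_library_name(const char *s)
{
    return strlen(s);
}

/* Explicit spelling of the same builtin. */
size_t length_via_builtin_name(const char *s)
{
    return __builtin_strlen(s);
}
```

Both functions return the same result; a textual search for the builtin
name would count only the second one.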

You will also find that a large number of the builtins are for specific
target processors, and projects using them are not going to turn up on
GitHub.  They will be used in embedded software that is not open source.

I am sure there are builtins that are rarely or never used - but I doubt
if it is anything like as many as you have identified from this survey.




* Re: Unused GCC builtins
  2018-01-22 15:55 ` David Brown
@ 2018-01-22 16:02   ` Joel Sherrill
  2018-01-22 16:07   ` Jakub Jelinek
  2018-01-22 16:10   ` Andrew Pinski
  2 siblings, 0 replies; 10+ messages in thread
From: Joel Sherrill @ 2018-01-22 16:02 UTC (permalink / raw)
  To: David Brown, Manuel Rigger, gcc; +Cc: s.marr, bram.adams



On 1/22/2018 9:55 AM, David Brown wrote:
> On 22/01/18 16:46, Manuel Rigger wrote:
>> Hi everyone,
>>
>> As part of my research, we have been analyzing the usage of GCC builtins
>> in 5,000 C GitHub projects. One of our findings is that many of these
>> builtins are unused, even though they are described in the documentation
>> (see https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html#C-Extensions)
>> and obviously took time to develop and maintain. I’ve uploaded a CSV
>> file with the unused builtins to
>> http://ssw.jku.at/General/Staff/ManuelRigger/unused-builtins.csv.
>>
>> Details: We downloaded all C projects from GitHub that had more than 80
>> GitHub stars, which yielded almost 5,000 projects with a total of more
>> than one billion lines of C code. We filtered GCC, forks of GCC, and
>> other compilers as we did not want to incorporate internal usages of GCC
>> builtins or test cases. We extracted all builtin names from the GCC
>> docs, and also tried to find such names in the source code, which we
>> considered as builtin usages. We excluded subdirectories with GCC or
>> Clang, and removed other false positives. In total, we found 320k
>> builtin usages in these projects, and 3030 unused builtins out of a
>> total of 6039 builtins.
>>
>> What is your take on this? Do you believe that some of these unused
>> builtins could be removed from the GCC docs or deprecated? Or are they
>> used in special "niche" domains that we did not consider? If yes, do you
>> think it is worth to maintain them? Are some of them only used in C++
>> projects? Might it be possible to remove their implementations (which
>> has already happened for the Cilk Plus builtins)?
>>
>> We would be glad for any feedback.
>>
>> - Manuel
>>
> 
> Many of these are going to be used automatically by the compiler.  You
> write "strdup" in your code, and the compiler treats it as
> "__builtin_strdup".  I don't know that such functions need to be
> documented as extensions, but they are certainly in use.
> 
> You will also find that a large number of the builtins are for specific
> target processors, and projects using them are not going to turn up on
> GitHub.  They will be used in embedded software that is not open source.
> 
> I am sure there are builtins that are rarely or never used - but I doubt
> if it is anything like as many as you have identified from this survey.
> 

My first thought was that there is a lot of free and open source
software that is not hosted on GitHub. Larger projects are often
self-hosted. Does this list cover all GNU, Savannah, sourceware.org,
Apache, KDE, *BSD, Mozilla, etc. projects?

You might get lucky, since some projects like RTEMS and FreeBSD (I think)
have a GitHub mirror. But GitHub is not the entire universe of free and
open source software.

--joel sherrill
RTEMS


* Re: Unused GCC builtins
  2018-01-22 15:55 ` David Brown
  2018-01-22 16:02   ` Joel Sherrill
@ 2018-01-22 16:07   ` Jakub Jelinek
  2018-01-22 16:10   ` Andrew Pinski
  2 siblings, 0 replies; 10+ messages in thread
From: Jakub Jelinek @ 2018-01-22 16:07 UTC (permalink / raw)
  To: David Brown; +Cc: Manuel Rigger, gcc, s.marr, bram.adams

On Mon, Jan 22, 2018 at 04:55:42PM +0100, David Brown wrote:
> Many of these are going to be used automatically by the compiler.  You
> write "strdup" in your code, and the compiler treats it as
> "__builtin_strdup".  I don't know that such functions need to be
> documented as extensions, but they are certainly in use.
> 
> You will also find that a large number of the builtins are for specific
> target processors, and projects using them are not going to turn up on
> GitHub.  They will be used in embedded software that is not open source.

Not just that.  If the statistics e.g. ignored GCC headers, then obviously
they will miss most of the target builtins, because the normal and only
supported way to use the target builtins is through the intrinsic
inline functions or macros provided by those headers.
So, take those out (usually a vendor ABI says which intrinsics are
provided, so even if you made statistics on which intrinsics are used in
the 5000 most popular projects, we still couldn't remove them), also take
out the above category, where the builtins are just an alternative
for a standard function and, depending on prototype and chosen standard,
some functions are treated like builtins, and pretty much nothing remains
in your survey.
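
A minimal sketch of the intrinsic-header case, assuming an x86 target
with SSE2. The function high_bit_mask is hypothetical, and the claim
that the header implements _mm_movemask_epi8 with a __builtin_ia32_*
builtin is my understanding of GCC's <emmintrin.h>, not something
stated in this thread. The fallback branch exists only so that the
sketch also builds on other targets.

```c
#if defined(__SSE2__)
#include <emmintrin.h>

/* Returns a 16-bit mask with one bit per byte whose top bit is set.
   The source names only the intrinsic; the target builtin stays
   hidden inside the compiler-provided header. */
int high_bit_mask(const unsigned char bytes[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)bytes);
    return _mm_movemask_epi8(v);
}
#else
/* Portable fallback so the sketch builds on non-x86 targets. */
int high_bit_mask(const unsigned char bytes[16])
{
    int mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (bytes[i] >> 7) << i;
    return mask;
}
#endif
```

User code written this way never mentions a builtin by name, so a scan
that skips the compiler's own headers would report the underlying
builtin as unused.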

	Jakub


* Re: Unused GCC builtins
  2018-01-22 15:55 ` David Brown
  2018-01-22 16:02   ` Joel Sherrill
  2018-01-22 16:07   ` Jakub Jelinek
@ 2018-01-22 16:10   ` Andrew Pinski
  2 siblings, 0 replies; 10+ messages in thread
From: Andrew Pinski @ 2018-01-22 16:10 UTC (permalink / raw)
  To: David Brown; +Cc: Manuel Rigger, GCC Mailing List, s.marr, bram.adams

On Mon, Jan 22, 2018 at 7:55 AM, David Brown <david@westcontrol.com> wrote:
> On 22/01/18 16:46, Manuel Rigger wrote:
>> Hi everyone,
>>
>> As part of my research, we have been analyzing the usage of GCC builtins
>> in 5,000 C GitHub projects. One of our findings is that many of these
>> builtins are unused, even though they are described in the documentation
>> (see https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html#C-Extensions)
>> and obviously took time to develop and maintain. I’ve uploaded a CSV
>> file with the unused builtins to
>> http://ssw.jku.at/General/Staff/ManuelRigger/unused-builtins.csv.
>>
>> Details: We downloaded all C projects from GitHub that had more than 80
>> GitHub stars, which yielded almost 5,000 projects with a total of more
>> than one billion lines of C code. We filtered GCC, forks of GCC, and
>> other compilers as we did not want to incorporate internal usages of GCC
>> builtins or test cases. We extracted all builtin names from the GCC
>> docs, and also tried to find such names in the source code, which we
>> considered as builtin usages. We excluded subdirectories with GCC or
>> Clang, and removed other false positives. In total, we found 320k
>> builtin usages in these projects, and 3030 unused builtins out of a
>> total of 6039 builtins.
>>
>> What is your take on this? Do you believe that some of these unused
>> builtins could be removed from the GCC docs or deprecated? Or are they
>> used in special "niche" domains that we did not consider? If yes, do you
>> think it is worth to maintain them? Are some of them only used in C++
>> projects? Might it be possible to remove their implementations (which
>> has already happened for the Cilk Plus builtins)?
>>
>> We would be glad for any feedback.
>>
>> - Manuel
>>
>
> Many of these are going to be used automatically by the compiler.  You
> write "strdup" in your code, and the compiler treats it as
> "__builtin_strdup".  I don't know that such functions need to be
> documented as extensions, but they are certainly in use.
>
> You will also find that a large number of the builtins are for specific
> target processors, and projects using them are not going to turn up on
> GitHub.  They will be used in embedded software that is not open source.

And many of the target ones are used indirectly via another
function/macro (e.g. __builtin_ia32_ptestc256).  The function/macro is
defined in a header that GCC provides too.

Thanks,
Andrew

>
> I am sure there are builtins that are rarely or never used - but I doubt
> if it is anything like as many as you have identified from this survey.
>
>
>


* Re: Unused GCC builtins
  2018-01-22 15:47 Unused GCC builtins Manuel Rigger
  2018-01-22 15:55 ` David Brown
@ 2018-01-22 18:30 ` Florian Weimer
  2018-01-24 14:05   ` Manuel Rigger
  1 sibling, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2018-01-22 18:30 UTC (permalink / raw)
  To: Manuel Rigger; +Cc: gcc, s.marr, bram.adams

* Manuel Rigger:

> Details: We downloaded all C projects from GitHub that had more than 80
> GitHub stars, which yielded almost 5,000 projects with a total of more
> than one billion lines of C code. We filtered GCC, forks of GCC, and
> other compilers as we did not want to incorporate internal usages of GCC
> builtins or test cases. We extracted all builtin names from the GCC
> docs, and also tried to find such names in the source code, which we
> considered as builtin usages.

You actually need to compile the sources with an instrumented compiler
to discover uses of built-ins.  Not all uses appear verbatim in the
source code; some names are constructed using preprocessor macros.
This happens for the majority of the floating-point-related built-ins
you listed, I think.
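
A minimal sketch of how a builtin name can be assembled by the
preprocessor. CALL_BUILTIN, bits_set and leading_zeros are hypothetical
names, and integer builtins are used here instead of floating-point ones
only to keep the example short.

```c
/* Hypothetical macro: builds a builtin name by token pasting. */
#define CALL_BUILTIN(name, arg) __builtin_##name(arg)

/* Expands to __builtin_popcount(x) after preprocessing. */
int bits_set(unsigned int x)
{
    return CALL_BUILTIN(popcount, x);
}

/* Expands to __builtin_clz(x); the result is undefined for x == 0. */
int leading_zeros(unsigned int x)
{
    return CALL_BUILTIN(clz, x);
}
```

The strings "__builtin_popcount" and "__builtin_clz" never appear in
this source file, so a textual search misses both uses even though the
compiler sees them.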


* Re: Unused GCC builtins
  2018-01-22 18:30 ` Florian Weimer
@ 2018-01-24 14:05   ` Manuel Rigger
  2018-01-24 14:10     ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Manuel Rigger @ 2018-01-24 14:05 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Manuel Rigger, gcc, Stefan Marr, Bram Adams

Thank you for all the answers, which are very useful for us!

As you pointed out, we only considered GitHub projects. If I understood
correctly, builtins would still not be deprecated even if we considered all
other open-source hosting sites because closed-source projects could still
rely on them, right? Additionally, target-specific builtins could not be
deprecated or removed because of vendor ABIs.

Several of you noted that we did not consider internal builtins that are
used in the implementation of GCC headers or directly by the compiler. The
documentation also mentions that GCC provides "a large number of built-in
functions other than the ones mentioned" for "internal use" which "are not
documented here because they may change from time to time" (see
https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Other-Builtins.html#Other-Builtins).
We deliberately looked only at public builtins (and not internal
ones), as we are mainly interested in the effort needed to support GCC
builtins in other tools that process C code (e.g., other compilers or
analysis tools). We want to spare the developers of such tools from having
to implement internal or unused builtins. So even if we cannot remove the
implementation of a builtin, removing it from the documentation could
already be a win.

In a second step, we also considered internal builtins and found that the
vararg handling builtins (__builtin_va_start, __builtin_va_end,
__builtin_va_arg, and __builtin_va_copy) are relied upon by many projects,
even though they are undocumented in GCC's builtins API. Could they be
added to the documentation?

Thanks,
Manuel

2018-01-22 19:29 GMT+01:00 Florian Weimer <fw@deneb.enyo.de>:

> * Manuel Rigger:
>
> > Details: We downloaded all C projects from GitHub that had more than 80
> > GitHub stars, which yielded almost 5,000 projects with a total of more
> > than one billion lines of C code. We filtered GCC, forks of GCC, and
> > other compilers as we did not want to incorporate internal usages of GCC
> > builtins or test cases. We extracted all builtin names from the GCC
> > docs, and also tried to find such names in the source code, which we
> > considered as builtin usages.
>
> You actually need to compile the sources with an instrumented compiler
> to discover uses of built-ins.  Not all references will have verbatim,
> textual references in source code, but their names are constructed
> using preprocessor macros.  This happens for the majority of the
> floating-point-related built-ins you listed, I think.
>


* Re: Unused GCC builtins
  2018-01-24 14:05   ` Manuel Rigger
@ 2018-01-24 14:10     ` Jakub Jelinek
  2018-01-24 18:15       ` Florian Weimer
  2018-01-27 19:11       ` Martin Sebor
  0 siblings, 2 replies; 10+ messages in thread
From: Jakub Jelinek @ 2018-01-24 14:10 UTC (permalink / raw)
  To: Manuel Rigger; +Cc: Florian Weimer, Manuel Rigger, gcc, Stefan Marr, Bram Adams

On Wed, Jan 24, 2018 at 03:04:55PM +0100, Manuel Rigger wrote:
> In a second step, we also considered internal builtins and found that the
> vararg handling builtins (__builtin_va_start, __builtin_va_end,
> __builtin_va_arg, and __builtin_va_copy) are relied upon by many projects,
> even though they are undocumented in GCC's builtins API. Could they be
> added to the documentation?

Why?  What is documented is va_start/va_end/va_arg/va_copy; that is
what people should use.  The builtins are just the internal implementation
of those macros.
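
For reference, a minimal sketch of a variadic function that uses only
the documented <stdarg.h> macros. The function sum_ints is a
hypothetical example.

```c
#include <stdarg.h>

/* Sums `count` int arguments using only the documented macros. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) never spells out a __builtin_va_*
name; those names only show up in code that bypasses the header.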

	Jakub


* Re: Unused GCC builtins
  2018-01-24 14:10     ` Jakub Jelinek
@ 2018-01-24 18:15       ` Florian Weimer
  2018-01-27 19:11       ` Martin Sebor
  1 sibling, 0 replies; 10+ messages in thread
From: Florian Weimer @ 2018-01-24 18:15 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Manuel Rigger, Manuel Rigger, gcc, Stefan Marr, Bram Adams

* Jakub Jelinek:

> On Wed, Jan 24, 2018 at 03:04:55PM +0100, Manuel Rigger wrote:
>> In a second step, we also considered internal builtins and found that the
>> vararg handling builtins (__builtin_va_start, __builtin_va_end,
>> __builtin_va_arg, and __builtin_va_copy) are relied upon by many projects,
>> even though they are undocumented in GCC's builtins API. Could they be
>> added to the documentation?
>
> Why?  What is documented is va_start/va_end/va_arg/va_copy, that is
> what people should use, the builtins are just internal implementation of
> those macros.

And these builtins differ from the math builtins because <stdarg.h> is
provided by GCC, but <math.h> is not, and there are many different
implementations of it.


* Re: Unused GCC builtins
  2018-01-24 14:10     ` Jakub Jelinek
  2018-01-24 18:15       ` Florian Weimer
@ 2018-01-27 19:11       ` Martin Sebor
  1 sibling, 0 replies; 10+ messages in thread
From: Martin Sebor @ 2018-01-27 19:11 UTC (permalink / raw)
  To: Jakub Jelinek, Manuel Rigger
  Cc: Florian Weimer, Manuel Rigger, gcc, Stefan Marr, Bram Adams

On 01/24/2018 07:09 AM, Jakub Jelinek wrote:
> On Wed, Jan 24, 2018 at 03:04:55PM +0100, Manuel Rigger wrote:
>> In a second step, we also considered internal builtins and found that the
>> vararg handling builtins (__builtin_va_start, __builtin_va_end,
>> __builtin_va_arg, and __builtin_va_copy) are relied upon by many projects,
>> even though they are undocumented in GCC's builtins API. Could they be
>> added to the documentation?
>
> Why?  What is documented is va_start/va_end/va_arg/va_copy, that is
> what people should use, the builtins are just internal implementation of
> those macros.

There are a number of reasons why documenting visible APIs is
helpful whether or not they are meant to be used by end users.

Features that are not meant to be used should be documented
as such.  Mentioning that they are meant only for internal use
makes their purpose clear and sets the right expectation about
the level of support and portability between GCC versions.  It
also makes it clear that we didn't forget to document them by
accident.

The manual isn't just a reference for GCC users.  It's also
a helpful reference for developers of GCC-compatible compilers
who are not allowed to read GCC source code due to copyright or
licensing constraints, or for people maintaining or supporting
their own GCC-based operating environments.  Finally, it is also
a reference for GCC developers.

For all these reasons I think every built-in that can be used
(intentionally or otherwise) deserves to be documented in
the manual.

Martin
