GCC memory starvation caused by flatten attribute with LTO
Hello,
I've been testing GCC 4.9 on a virtual Gentoo machine and I noticed that you use the flatten attribute in the source code. For the flatten functions in src/sna/sna_glyphs.c, the inliner inlines about 3.3M functions and then crashes because it runs out of memory (I have 8GB of memory).
Please note that LTO is able to optimize the whole program. As a result, it sees almost all function bodies, and that leads to enormous inlining.
The suggested patch removes these flatten attributes for selected functions.
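To illustrate the idea behind the suggested patch, here is a minimal sketch of how a flatten attribute could be made conditional. It is not the patch itself: the names `BUILDING_WITH_LTO`, `maybe_flatten`, `add_one`, `twice` and `glyph_cost` are hypothetical, and since GCC does not define any macro for -flto, the sketch assumes the build system passes -DBUILDING_WITH_LTO together with -flto.

```c
/* BUILDING_WITH_LTO is a hypothetical macro for this sketch. GCC does not
 * define one for -flto, so the build would have to pass it explicitly,
 * e.g. CFLAGS="-O2 -flto -DBUILDING_WITH_LTO". */
#if defined(__GNUC__) && !defined(BUILDING_WITH_LTO)
#define maybe_flatten __attribute__((flatten))
#else
#define maybe_flatten /* expands to nothing, the inliner uses its own heuristics */
#endif

static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }

/* With the attribute active, GCC tries to inline every call inside this
 * body, and the calls inside those callees, as far as it can see them.
 * Under LTO it can see bodies from the whole program, which is where the
 * inlining explosion described above comes from. */
maybe_flatten int glyph_cost(int n)
{
    return twice(add_one(n));
}
```

The function returns the same result either way; only the amount of inlining the compiler attempts changes.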
I have wanted for a long time to be able to use LTO on xf86-video-intel, so I was very pleased when I found this patch submission. I tried it on the latest development version of the Intel driver from git. It applied cleanly, and I was able to compile the Intel driver using LTO instead of running into the seemingly infinite compile time that LTO otherwise causes.
However, although compilation succeeded, my tests (glxgears, monitoring CPU usage, and watching videos on YouTube) showed significantly poorer video performance with the LTO-compiled driver than without LTO. Though glxgears showed no discernible difference, YouTube playback was so slow that the audio continued at normal speed while the video lagged progressively further behind in slow motion.
I have no way of knowing whether LTO is directly responsible for the poor performance or whether the patch somehow led to poor optimization by LTO, but this should be investigated further. I used GCC 4.9.3 for the test. My Linux distro currently doesn't offer the 5.x branch of GCC, so I was unable to test with GCC 5.3.
Hi Patrick.
Well, it looks like the xf86-video-intel driver needs flattened functions to produce optimal code. It would be interesting if you rebuilt the driver with the suggested patch applied (or is it part of mainline?) and tried to generate a perf report that compares the LTO and non-LTO builds.
I know it's been a while since I followed up. First of all, I don't know how to generate a perf report, so that's partly why I didn't follow up sooner. Perhaps you can point me to a tutorial?
Also, in retrospect, I suppose there's a possibility that I compiled incorrectly. The -flto compiler flag, as well as the same optimization flags used during compilation, must also be used during the linking phase. I use Funtoo, a variant of Gentoo, and I had -flto in my CFLAGS/CXXFLAGS but not in my LDFLAGS. This is handled differently by different programs' makefiles. Some automatically add CXXFLAGS to LDFLAGS during the linking phase; some don't. I'm not sure about xf86-video-intel's linking phase. So it's possible that, in my earlier attempt, it wasn't linked and optimized properly, and I should probably revisit that.
In my earlier attempt, I was using GCC 4.9.3. Since that time, I've been using GCC 5.3.0 and, more recently, 6.1.0. GCC 5 and 6 fail in the compilation phase before even getting to the linking phase, though that can be resolved by removing a few force_inline directives in the source code. In fact, I tried removing every instance of force_inline, and it not only allowed compilation to complete, it also slightly reduced the memory used during linking. Alas, linking still failed, apparently due to using too much memory.
Anyway, I'll give your patch another go just in case what I described above is what caused poorer performance with the LTO build. I really would like to get LTO working with this driver. LTO has really improved a lot in GCC with the 6.1.0 release.
I'm attaching a patch that takes a somewhat simpler approach to disabling "flatten" and "force_inline" which cause problems with LTO.
I used glxgears to measure FPS. The system was a Pentium 4 with an integrated 845G graphics chip, so it's pretty underpowered hardware, hence the low FPS. Without LTO, I got ~19.5 FPS running glxgears fullscreen at 1680x1050 resolution. With LTO, I got about 22.5 FPS. That may not seem like much, but it's roughly a 15% performance improvement! Except for the LTO-related flags, I used the same compiler and linker flags during compilation.
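As a quick check of the figures above, the relative change between the two glxgears readings can be computed directly. This is only a sketch: `percent_change` is a hypothetical helper, and it assumes the two approximate readings (~19.5 and ~22.5 FPS) are representative.

```c
/* Relative change from `before` to `after`, expressed in percent.
 * percent_change is a hypothetical helper written for this note. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Called as percent_change(19.5, 22.5), this gives a gain of 3 FPS over a baseline of 19.5 FPS, which is a little over 15%.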
Created attachment 125089
Patch to allow LTO optimization of xf86-video-intel
Nice work! This allows building with LTO on gcc-4.9. gcc-5.3.0, however, fails to build it, throwing a bunch of these around:
/usr/include/bits/string3.h:50:1: error: inlining failed in call to always_inline ‘memcpy’: target specific option mismatch
Also, unlike your P4, I cannot detect any measurable performance improvements (Intel Sandy Bridge). Unfortunate.
All tests done with xf86-video-intel from today's git.
Hmm... You shouldn't be getting that error. That's the error I was getting before I removed the "always_inline" definitions. If the patch worked properly for you, the compiler shouldn't be running into an "always_inline" directive.
I don't think I actually tested it with 5.3.0. I only verified that it builds with 4.9.3, and all my tests with performance involved 6.1.0.
I did do numerous tests involving various compiler flags such as -O2, -O3, graphite compiler flags, etc, and I found that I actually got reduced performance with -O3 compared to -O2, and I got reduced performance with using additional optimizations like graphite. The biggest improvement I got was with simple -O2 and a proper -march setting.
(In reply to Martin Liška from comment 7)
I can confirm that the attached patch helps to build the project with LTO enabled. I tried both 5.3.1 and latest trunk (7.0.0) and both work fine.
Can you please attach pre-processed source code for the issue:
/usr/include/bits/string3.h:50:1: error: inlining failed in call to always_inline ‘memcpy’: target specific option mismatch
and also command line arguments for GCC.
Thanks
Here are the command line arguments and the full error log.
In file included from /usr/include/features.h:365:0,
from /usr/include/stdint.h:25,
from /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/stdint.h:9,
from /var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna.h:40,
from /var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/blt.c:32:
/var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/blt.c: In function ‘to_memcpy’:
/usr/include/bits/string3.h:50:1: error: inlining failed in call to always_inline ‘memcpy’: target specific option mismatch
__NTH (memcpy (void *__restrict __dest, const void *__restrict __src,
^
/var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/blt.c:490:4: error: called from here
memcpy(dst, src, len);
^
In file included from /usr/include/features.h:365:0,
from /usr/include/stdint.h:25,
from /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/stdint.h:9,
from /var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna.h:40,
from /var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/blt.c:32:
/usr/include/bits/string3.h:50:1: error: inlining failed in call to always_inline ‘memcpy’: target specific option mismatch
__NTH (memcpy (void *__restrict __dest, const void *__restrict __src,
^
/var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/blt.c:555:2: error: called from here
memcpy(dst, src, len & 3);
^
In file included from /usr/include/features.h:365:0,
from /usr/include/stdint.h:25,
from /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/stdint.h:9,
from /var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna.h:40,
from /var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/blt.c:32:
/usr/include/bits/string3.h:50:1: error: inlining failed in call to always_inline ‘memcpy’: target specific option mismatch
__NTH (memcpy (void *__restrict __dest, const void *__restrict __src,
^
/var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/blt.c:490:4: error: called from here
memcpy(dst, src, len);
^
In file included from /usr/include/features.h:365:0,
from /usr/include/stdint.h:25,
from /usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/stdint.h:9,
from /var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna.h:40,
from /var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/blt.c:32:
/usr/include/bits/string3.h:50:1: error: inlining failed in call to always_inline ‘memcpy’: target specific option mismatch
__NTH (memcpy (void *__restrict __dest, const void *__restrict __src,
^
/var/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/blt.c:555:2: error: called from here
memcpy(dst, src, len & 3);
^
Makefile:655: recipe for target 'blt.lo' failed
-----------------------------------
Attached is what gcc generates with -E, I think that's what you needed, correct?
Created attachment 125281
pre-processed sources
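The thread does not establish the cause of the "target specific option mismatch" error. GCC emits this diagnostic when an always_inline function (here glibc's inline memcpy wrapper from string3.h) would have to be inlined into a caller that was compiled with different target options. The sketch below shows one possible way to sidestep that, under the assumption that the caller carries a per-function target attribute: it calls the compiler builtin directly instead of going through the header's wrapper. The names `sse2_target`, `raw_memcpy` and `copy_bytes` are hypothetical, and this is not a diagnosis of blt.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-function target attribute, only applied on x86 with a
 * GCC-compatible compiler. The real attribute used by the caller may differ. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define sse2_target __attribute__((target("sse2")))
#else
#define sse2_target
#endif

/* Calling the builtin directly avoids the always_inline wrapper that the
 * C library header may provide around memcpy. */
#if defined(__GNUC__)
#define raw_memcpy __builtin_memcpy
#else
#define raw_memcpy memcpy
#endif

/* copy_bytes is a hypothetical stand-in for a function that copies a
 * buffer while being compiled with its own target options. */
sse2_target void copy_bytes(void *dst, const void *src, size_t len)
{
    raw_memcpy(dst, src, len);
}
```

Whether this is appropriate for the driver depends on what the pre-processed source and the GCC command line actually show.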