[BOLT][BTI] Patch ignored functions in place when targeting them with indirect branches (#177165)
When applying BTI fixups to indirect branch targets, ignored functions
are treated as a special case:
- these hold no instructions,
- have no CFG,
- and are not emitted in the new text section.
The solution is to patch the entry points in the original location.
If such a situation occurs in a binary, recompilation using the
-fpatchable-function-entry flag is required. This will place a nop at
all function starts, which BOLT can use to patch the original section.
Without the extra nop, BOLT cannot safely patch the original .text
section.
[4 lines not shown]
[X86] getFauxShuffleMask - OR(SHUF(),SHUF()) - treat undemanded elements as undef (#182678)
We have to be careful when attempting to decode OR() patterns as
shuffles - we can't forward demanded undef elements in both sources as
an undef result as it can lead to infinite loops during widening
(#49393).
But if we don't demand the element in the first place (based off
demanded elts masks during recursive shuffle combines), then it doesn't
matter what the elements contain and we can treat it as a
SM_SentinelUndef shuffle element.
Noticed while working on #137422
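The rule described above can be modelled outside of LLVM. The following is a minimal sketch, not the actual getFauxShuffleMask code: `mergeOrOfShuffles`, `kUndef` and `kZero` are hypothetical names, the sentinel values are assumed for illustration, and each mask entry is assumed to be either a sentinel or an index into that source's own input. An element of OR(A, B) can be forwarded from one source only when the other source is known zero there; a demanded undef makes the decode fail, while an undemanded element is simply reported as undef.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Stand-ins for SM_SentinelUndef / SM_SentinelZero (values assumed here).
constexpr int kUndef = -1;
constexpr int kZero = -2;

// Hypothetical helper: merge the element masks of OR(SHUF0, SHUF1).
// Result indices < N select from source 0, indices >= N from source 1.
// Returns nullopt when the OR cannot be expressed as a single shuffle.
std::optional<std::vector<int>>
mergeOrOfShuffles(const std::vector<int> &Mask0, const std::vector<int> &Mask1,
                  const std::vector<bool> &Demanded) {
  const std::size_t N = Mask0.size();
  if (Mask1.size() != N || Demanded.size() != N)
    return std::nullopt;

  std::vector<int> Result;
  Result.reserve(N);
  for (std::size_t I = 0; I != N; ++I) {
    const int M0 = Mask0[I], M1 = Mask1[I];
    if (!Demanded[I]) {
      // Nobody reads this element, so its contents do not matter.
      Result.push_back(kUndef);
    } else if (M0 == kZero && M1 == kZero) {
      Result.push_back(kZero); // 0 | 0 == 0
    } else if (M0 == kUndef || M1 == kUndef) {
      // A demanded undef must not be forwarded as an undef result.
      return std::nullopt;
    } else if (M1 == kZero) {
      Result.push_back(M0); // x | 0 == x, taken from source 0
    } else if (M0 == kZero) {
      Result.push_back(M1 + static_cast<int>(N)); // 0 | y == y, from source 1
    } else {
      return std::nullopt; // both sides contribute bits: not a shuffle
    }
  }
  return Result;
}
```

With element 2 undef in both sources, the merge succeeds only when element 2 is not demanded.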
[lldb][TypeSystemClang] Unconditionally set access control to AS_public (#182956)
This patch removes all our manual adjustments to the access control
specifiers of Clang decls we create from DWARF.
This has led to occasional subtle bugs in the past (the latest being
https://github.com/llvm/llvm-project/issues/171913) and it's ultimately
redundant because Clang already has provisions for LLDB to bypass access
control for C++ and Objective-C. Access control doesn't affect name
lookup so really we're doing a lot of bookkeeping for not much benefit.
The only "feature" that relied on this was that `type lookup <foo>`
would print the access specifier in the output structure layout. I'm not
convinced that's worth keeping the infrastructure in place for (but
happy to be convinced otherwise).
I'd rather lean fully into the Clang access control bypass instead.
Note, I still kept the `AccessType` parameters to the various
`TypeSystemClang` APIs to reduce the size of the diff. A follow-up NFC
change will remove those parameters and adjust all the call-sites.
[compiler-rt][ARM] Enable strict mode in divsf3/mulsf3 tests (#179918)
Commit 5efce7392f3f6cc added optimized AArch32 assembly versions of
mulsf3 and divsf3, with more thorough tests. The new tests included test
cases specific to Arm's particular NaN handling rules, which are
disabled on most platforms, but were intended to be enabled for Arm.
Unfortunately, they were not enabled under any circumstances, because I
made a mistake in `test/builtins/CMakeLists.txt`: the command-line `-D`
option that should have enabled them was added to the cflags list too
early, before the list was reinitialized from scratch. So it never ended
up on the command line.
Also, the test file mulsf3.S only even _tried_ to enable strict mode in
Thumb1, even though the Arm/Thumb2 implementation would also have met
its requirements.
Because the strict-mode tests weren't enabled, I didn't notice that they
would also have failed absolutely everything, because they checked the
[8 lines not shown]
[mlir][SPIRV] Add sub-element-byte lowering support for atomic_rmw ori/andi ops (#179831)
When the memref element type (e.g., i8) is narrower than the SPIR-V
storage type (e.g., i32 on Vulkan), ori and andi can be lowered with a
single wide atomic instruction because OR with 0 bits and AND with 1
bits are identity operations on the neighbouring elements.
The revision follows `IntStoreOpPattern` to compute offsets/sizes via
the `adjustAccessChainForBitwidth` and `getOffsetForBitwidth` methods.
Additionally, it handles the returned value (which is the old value by
definition), which `IntStoreOpPattern` does not need to do. Other parts,
e.g., the check of `spirv::Capability::Kernel`, are the same:
https://github.com/llvm/llvm-project/blob/07ebb18e07fb9e009b1f738d6214a49c7bbe8fee/mlir/lib/Conversion/MemRefToSPIRV/MemRefToSPIRV.cpp#L847-L867
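The identity trick can be illustrated with plain C++ atomics. This is a sketch of the idea only, not the MLIR lowering: `atomicOrByte` and `atomicAndByte` are hypothetical helpers, and the lane numbering (lane 0 in the least significant byte) is an assumption. The narrow operand is widened so that the other lanes see the identity value (0 for OR, all ones for AND), and the old narrow value is recovered by shifting the old wide value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically OR `value` into one i8 lane of an i32 word
// and return the old contents of that lane.
inline std::uint8_t atomicOrByte(std::atomic<std::uint32_t> &word,
                                 unsigned byteIndex, std::uint8_t value) {
  const unsigned shift = byteIndex * 8;
  // All other lanes receive 0 bits, the identity for OR.
  const std::uint32_t wide = static_cast<std::uint32_t>(value) << shift;
  const std::uint32_t old = word.fetch_or(wide);
  return static_cast<std::uint8_t>(old >> shift);
}

// Hypothetical helper: atomically AND `value` into one i8 lane of an i32 word
// and return the old contents of that lane.
inline std::uint8_t atomicAndByte(std::atomic<std::uint32_t> &word,
                                  unsigned byteIndex, std::uint8_t value) {
  const unsigned shift = byteIndex * 8;
  const std::uint32_t laneMask = std::uint32_t{0xFF} << shift;
  // All other lanes receive 1 bits, the identity for AND.
  const std::uint32_t wide =
      (static_cast<std::uint32_t>(value) << shift) | ~laneMask;
  const std::uint32_t old = word.fetch_and(wide);
  return static_cast<std::uint8_t>(old >> shift);
}
```

In both helpers a single wide read-modify-write suffices, and no compare-exchange loop is needed, because the neighbouring lanes are left unchanged by construction.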
There are refactoring opportunities, but they are not pursued in this
revision because the current implementation is already complicated. The
refactoring can happen in a follow-up with its own patch, so
[6 lines not shown]
[AArch64] Match CTPOP combine without zero extend (#182859)
Helps improve: https://github.com/llvm/llvm-project/issues/182625.
This does not fully solve the issues with using `ctpop`, as the vector
type chosen for the reduction is not ideal in all cases. This results in
extra extends, which can be seen in a few test cases.
[AMDGPU][GlobalISel] Add COPY_SCC_VCC combine for VCC-SGPR-VGPR pattern
Eliminate VCC->SGPR->VGPR bounce created by UniInVcc when the uniform boolean
result is consumed by a VALU instruction that requires the input in VGPRs.
Reapply "RuntimeLibcalls: Fix adding __safestack_pointer_address by default" (#182949) (#183005)
This reverts commit 6d37110e091569509f54e2b1f3ef35e8a50e5b70.
Now with aarch64 test.
Reapply "RuntimeLibcalls: Fix adding __safestack_pointer_address by default" (#182949)
This reverts commit 6d37110e091569509f54e2b1f3ef35e8a50e5b70.
Now with aarch64 test.
[Clang] Fix the normalization of fold constraints (#177531)
Fold constraints can contain packs expanded from different locations.
For `C<Ps...>`, where the ellipsis immediately follows the argument, the
pack should be expanded in place regardless of the fold expression. For
`C<Ps> && ...`, the fold expression itself is responsible for expanding
Ps.
Previously, both kinds of packs were expanded by the fold expression,
which broke assumptions within concept caching. This patch fixes that by
preserving PackExpansionTypes for the first kind of pack while rewriting
them to non-packs for the second kind.
This patch also removes an unused function and performs some cleanup of
the evaluation contexts. Hopefully it is viable for backporting.
No release note, as this issue was a regression.
Fixes https://github.com/llvm/llvm-project/issues/177245
[2 lines not shown]
[MC] Fix crash in x=0; .section x (#183001)
When an equated symbol (e.g. `x=0`) is followed by `.section x`,
getOrCreateSectionSymbol reports an "invalid symbol redefinition"
error but continues to reuse the equated symbol as a section symbol.
This causes an assertion failure in MCObjectStreamer::changeSection
when `setFragment` is called on the equated symbol.
Fix this by clearing `Sym`.
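The shape of the fix can be shown with a toy symbol table. This is a sketch under stated assumptions, not the MC code: `Symbol`, `Context` and their members are hypothetical stand-ins, and what happens after `Sym` is cleared (here, a fresh section symbol is created) is assumed for illustration. The point is only that, after the error is reported, the equated symbol is no longer reused as the section symbol.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a symbol; not the MCSymbol interface.
struct Symbol {
  std::string name;
  bool equated = false;   // defined via `x = 0`
  bool isSection = false; // used as a section symbol
};

struct Context {
  std::map<std::string, Symbol *> table;
  std::vector<std::unique_ptr<Symbol>> storage;
  std::vector<std::string> errors;

  Symbol *allocate(const std::string &name) {
    storage.push_back(std::make_unique<Symbol>());
    storage.back()->name = name;
    return storage.back().get();
  }

  // Models `x = 0`.
  Symbol *defineEquated(const std::string &name) {
    Symbol *sym = allocate(name);
    sym->equated = true;
    table[name] = sym;
    return sym;
  }

  // Models `.section x`.
  Symbol *getOrCreateSectionSymbol(const std::string &name) {
    auto it = table.find(name);
    Symbol *sym = it == table.end() ? nullptr : it->second;
    if (sym && sym->equated) {
      errors.push_back("invalid symbol redefinition: " + name);
      sym = nullptr; // the fix: do not reuse the equated symbol
    }
    if (!sym) {
      sym = allocate(name);
      table.emplace(name, sym); // no-op when the name is already taken
    }
    sym->isSection = true;
    return sym;
  }
};
```

Without the `sym = nullptr` line, the equated symbol would be returned and later treated as a section symbol, which corresponds to the situation the assertion in the real code rejects.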