[clang-format][NFC] Don't always rebuild clang-format-check-format (#203828)
Instead, check the format of the clang-format sources only if the built
clang-format binary or one of the source files is newer.
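
As a rough illustration of the "only if newer" rule, the staleness test can be sketched as a timestamp comparison. This is a minimal sketch, not the build-system change itself: the stamp file and the helper name `needs_format_check` are hypothetical and only model the idea.

```python
import os


def needs_format_check(stamp_path, dependency_paths):
    """Return True when the (hypothetical) stamp file is missing or older
    than any dependency, e.g. the clang-format binary or a source file."""
    if not os.path.exists(stamp_path):
        return True  # never checked before
    stamp_mtime = os.path.getmtime(stamp_path)
    # Re-run the check only if at least one dependency changed afterwards.
    return any(os.path.getmtime(p) > stamp_mtime for p in dependency_paths)
```

With this shape, an up-to-date stamp makes the check a no-op, which is the behavior the commit title describes.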
[llvm] Replace unordered_set<std::string> with StringSet (#204048)
std::unordered_set<std::string> without a pointer-stability requirement
can use StringSet: it avoids per-TU hashtable instantiations and the
std::string temporary at StringRef lookup sites (~3-4 KiB smaller .text
for llc/opt).
[ELF] Support multiple PT_GNU_RELRO when SECTIONS is used without PHDRS (#203675)
When a SECTIONS command interleaves relro and non-relro sections, the
relro region is split into discontiguous runs. Since
https://reviews.llvm.org/D40359, lld has emitted an error:
error: section: <name> is not contiguous with other relro sections
This is overly strict: while glibc only honors the first PT_GNU_RELRO,
other loaders (e.g. Bionic and FreeBSD rtld-elf) protect every
PT_GNU_RELRO segment.
Emit one PT_GNU_RELRO segment for each contiguous run of relro sections.
Track the boundary section so that `createPhdrs` starts a fresh PT_LOAD
at each relro->non-relro transition, as before.
Consumers that don't expect multiple PT_GNU_RELRO should check the
output themselves.
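
The "one segment per contiguous run" rule can be sketched as follows. This is a minimal model under stated assumptions, not lld's code: sections are plain (name, is_relro) pairs in output order, and the helper `relro_runs` and the example layout are hypothetical.

```python
from itertools import groupby


def relro_runs(sections):
    """Group consecutive relro sections into runs.

    sections: list of (name, is_relro) pairs in output order.
    Each returned run would correspond to one PT_GNU_RELRO segment.
    """
    runs = []
    for is_relro, group in groupby(sections, key=lambda s: s[1]):
        if is_relro:
            runs.append([name for name, _ in group])
    return runs


# Hypothetical layout where a non-relro section splits the relro region.
layout = [
    (".data.rel.ro", True),
    (".data", False),
    (".got", True),
    (".init_array", True),
    (".bss", False),
]
runs = relro_runs(layout)  # two runs, so two PT_GNU_RELRO segments
```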
[llvm] Replace unordered_set<T *> with SmallPtrSet<T *, 0> (#204051)
std::unordered_set is slow. For pointer sets without a pointer-stability
or iterator-stability requirement, use SmallPtrSet<T *, 0> for smaller
code size.
[TTI] Add missing no-cost coroutine intrinsics (#203816)
These intrinsics are lowered in the CoroCleanup pass and don't represent
actual code. This patch adds them to the no-cost list so they do not
contribute to the cost of inlining and optimization.
[flang][mlir] Add flang to mlir lowering for groupprivate (#180934)
This PR implements the Flang frontend lowering for the OpenMP
`groupprivate` directive.
Changes:
- Update the genOMP handler for OpenMPGroupprivate in OpenMP.cpp to
generate the `omp.groupprivate` MLIR operation.
- Add clause processing for groupprivate variables
- Add test cases for groupprivate lowering
Co-Authored-By: Claude <noreply at anthropic.com>
[RISCV] Consider known leading zeros in narrowIndex for gather/scatter. (#203970)
If there are enough leading zeros for the shift amount, then
we can do the shift in the narrow type.
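
The arithmetic behind this can be sketched on concrete values. The compiler reasons about known leading zeros statically, whereas this sketch just counts them on a specific number; the helper names are hypothetical.

```python
def leading_zeros(value, width):
    """Number of leading zero bits of value when viewed as a width-bit integer."""
    return width - value.bit_length()


def shl(value, shift, width):
    """Left shift with wraparound at the given bit width."""
    return (value << shift) & ((1 << width) - 1)


def can_shift_in_narrow_type(value, shift, narrow_width):
    """True when shifting in the narrow type cannot drop any set bits."""
    return leading_zeros(value, narrow_width) >= shift


index = 0x0FFF  # 4 leading zeros as a 16-bit value
safe = can_shift_in_narrow_type(index, 3, 16)    # shift fits in 16 bits
unsafe = can_shift_in_narrow_type(index, 5, 16)  # bits would be shifted out
```

When the check holds, shifting in the narrow type and then zero-extending gives the same result as zero-extending first and shifting in the wide type.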
[AMDGPU] Track VALU instructions separately for WMMA coexecution hazards
WMMA coexecution hazards can only be resolved by VALU instructions, not
S_NOPs. Track VALU/WMMA instructions separately so the scheduler can
accurately determine stall cycles.
[AMDGPU] Set WMMA source-operand reuse bits in SIPreEmitPeephole
gfx1250 WMMA instructions can set matrix_a_reuse / matrix_b_reuse bits
that keep the A or B source operand in a high-temporality state in the
VALU source-operand cache, so a later WMMA reusing the same registers
hits in the cache instead of re-reading the register file.
Add a late, post-RA peephole in the existing pre-emit peephole pass that
scans each basic block and, for every WMMA, sets the A/B reuse bit when
one of the next few WMMAs reuses the same physical registers as its A or B
operand and those registers are not redefined in between.
Stale sticky entries in the cache are cleared when a register is used in
an instruction without a reuse bit being set. Therefore, the final WMMA
use of the same source should not set the bit.
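
The scan described above can be sketched on a toy instruction model. This is a sketch under assumptions, not the pass itself: the `Inst` record, the `LOOKAHEAD` window of two WMMAs, and the rule that a later WMMA must reuse the registers in the same operand position are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

LOOKAHEAD = 2  # hypothetical value for "the next few WMMAs"


@dataclass
class Inst:
    is_wmma: bool = False
    a: tuple = ()      # physical registers of the A operand
    b: tuple = ()      # physical registers of the B operand
    defs: tuple = ()   # registers written by this instruction
    reuse_a: bool = False
    reuse_b: bool = False


def _reused_later(block, start, operand):
    regs = getattr(block[start], operand)
    if set(block[start].defs) & set(regs):
        return False  # the WMMA overwrites its own source
    seen_wmma = 0
    for inst in block[start + 1:]:
        if inst.is_wmma:
            # Sources are read before the destination is written.
            if getattr(inst, operand) == regs:
                return True
            seen_wmma += 1
            if seen_wmma >= LOOKAHEAD:
                return False
        if set(inst.defs) & set(regs):
            return False  # registers redefined before any reuse
    return False


def set_reuse_bits(block):
    """Set reuse_a / reuse_b on each WMMA whose operand is reused soon after."""
    for i, inst in enumerate(block):
        if inst.is_wmma:
            inst.reuse_a = _reused_later(block, i, "a")
            inst.reuse_b = _reused_later(block, i, "b")
```

Because the last WMMA using a given source never finds a later reuse, its bit stays clear, which matches the requirement that the final use not set the bit.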
[AMDGPU] Update f8f6f4-wmma hazard tests regarding matrix format, NFC (#204037)
The MIR tests need to map the matrix format suffix to the register size
correctly. For example, the 'F4' format needs the v8i32 register class.
[llvm-shlib] Fix parallel build failure in gen-msvc-exports.py on Windows (#197190)
The script was failing with OSError [Errno 22] Invalid argument when
opening a temp file during parallel Windows builds. The root cause is
that mkstemp() creates and closes a file descriptor, then the script
re-opens it by path in a loop. On Windows, between the close and
re-open, antivirus software or filesystem contention from parallel
build processes can briefly lock the file, causing the subsequent
open() to fail with EINVAL.
Fix by replacing the temp-file-based approach with
subprocess.check_output(), which captures nm's stdout directly in
memory. This eliminates the temp file entirely, removing the race
condition and simplifying the code (removing the unused mkstemp,
contextmanager, and os imports along with the helper functions).
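
The in-memory capture pattern looks roughly like this. It is a minimal sketch, not the script's actual code: the helper name `tool_output_lines` is hypothetical, and the Python interpreter stands in for nm so the example runs anywhere.

```python
import subprocess
import sys


def tool_output_lines(command):
    """Run command and return its stdout as a list of text lines.

    No temp file is created, so there is nothing on disk for another
    process to lock between a close and a re-open.
    """
    output = subprocess.check_output(command, universal_newlines=True)
    return output.splitlines()


# Stand-in for an nm invocation; the printed symbols are made up.
lines = tool_output_lines(
    [sys.executable, "-c", "print('T foo'); print('D bar')"]
)
```

`subprocess.check_output()` raises `CalledProcessError` on a non-zero exit status, so a failing tool still surfaces as an error.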
[lldb] Make DYNAMIC_SCRIPTINTERPRETERS the default on Darwin & FreeBSD. (#204015)
Make `LLDB_ENABLE_DYNAMIC_SCRIPTINTERPRETERS` the default on Darwin and
FreeBSD. We can opt in more platforms later, or even make this the
global default (except on Windows, which doesn't need it).
Fixes #183791
[Clang][Sema][NFCI] Simplify `resolveAllocationOverload()`
`resolveAllocationOverload()` performs multiple rounds of overload
resolution (typed and untyped, aligned and unaligned), each requiring a
slightly different argument list. Previously, the argument vector was
mutated in-place, which made the flow hard to follow.
This refactor prepares the list of arguments before calling
`resolveAllocationOverload()`. The preferred argument list is passed in
`PrefArgs`, while the fallback arguments are passed in `FallbackArgs`.
If the fallback resolution is not required, `FallbackArgs` is empty.
When making a nested call to perform the resolution with the fallback
arguments, the current set of candidates is passed in `PrefCandidates`
(formerly, `AlignedCandidates`). This argument also serves as a flag
used to distinguish the top-level call from nested fallback calls.