[HLSL] Codegen for handling global resource array initialization (#198891)
When a global resource array is accessed - whether it is declared at a
global scope or as part of a global struct instance - all of its
resource elements should be initialized from binding into a temporary
local resource array. This change intercepts the Clang codegen at the
relevant places to allow `CGHLSLRuntime` to handle this special global
resource array initialization.
Fixes #187087
Fixes #198888
[AMDGPU] Remove redundant s_wait_xcnt after implicit XCNT drains (#198823)
On gfx1250, several instructions implicitly drain XCNT in hardware:
`s_barrier_wait`/`signal`/`signal_isfirst`, `s_sendmsg`, PC changes, etc.
This patch removes any `s_wait_xcnt` that such an implicit XCNT drain
makes redundant.
Pre-commit tests on #198772
Fix: LCOMPILER-1665
[clang-tidy] Fix false positive of parentheses removal for overloaded operator (#192254)
Fixes #189217
Don't remove necessary parentheses around an overloaded operator when
the parenthesized expression is an operand of a binary operation.
E.g. (E1 & E2) != E3 // the brackets aren't redundant here
E.g. (E1 & E2) // the brackets are redundant here
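The brackets in the first case matter because `!=` binds tighter than `&`, for overloaded operators as much as for built-in ones. A minimal sketch of that, assuming a hypothetical `Mask` type and overload set that are not taken from the patch:

```C++
#include <type_traits>

// Hypothetical flag type, for illustration only.
struct Mask {
  unsigned bits;
};

constexpr Mask operator&(Mask a, Mask b) { return Mask{a.bits & b.bits}; }
constexpr bool operator!=(Mask a, Mask b) { return a.bits != b.bits; }
// Extra overload so that the unparenthesized form also compiles.
constexpr Mask operator&(Mask a, bool b) { return Mask{b ? a.bits : 0u}; }

constexpr Mask e1{0x6}, e2{0x3}, e3{0x2};

// With the brackets, two Mask values are compared and the result is bool.
static_assert(std::is_same<decltype((e1 & e2) != e3), bool>::value,
              "(e1 & e2) != e3 is a comparison");
// Without them, the expression parses as e1 & (e2 != e3) and yields a Mask.
static_assert(std::is_same<decltype(e1 & e2 != e3), Mask>::value,
              "e1 & e2 != e3 is a mask operation");

constexpr bool with_parens() { return (e1 & e2) != e3; }
constexpr unsigned without_parens() { return (e1 & e2 != e3).bits; }
```

Dropping the brackets here changes both the type and the value of the expression, which is why they should not be reported as redundant.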
[LifetimeSafety] Avoid assert on variadic placement new (#199588)
Avoid assuming that a placement allocation function has a second
`ParmVarDecl` before checking whether that parameter is `void*`.
Variadic `operator new(size_t, ...)` can have a placement argument
matched by the ellipsis instead.
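A sketch of the kind of source that has this shape, assuming a hypothetical `Widget` class (the reproducer in the linked issue may look different). The allocation function has only one named parameter, so there is no second `ParmVarDecl` to inspect:

```C++
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical type, for illustration only.
struct Widget {
  int value;

  // Variadic placement allocation function: any placement arguments are
  // matched by the ellipsis, not by a named second parameter.
  static void* operator new(std::size_t size, ...) {
    void* p = std::malloc(size);
    if (!p)
      throw std::bad_alloc();
    return p;
  }
  static void operator delete(void* p) { std::free(p); }
};

int make_and_read(int v) {
  Widget* w = new (42) Widget{v}; // the placement argument 42 goes to `...`
  int result = w->value;
  delete w; // uses Widget::operator delete(void*)
  return result;
}
```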
AI usage: Codex was used to help rephrase part of the new comments.
Closes https://github.com/llvm/llvm-project/issues/199584
[HIP][Driver] Forward -fcoverage-mapping flags to device compiler (#198872)
Add `-fcoverage-mapping`, `-fno-coverage-mapping`,
`-fcoverage-compilation-dir=`, `-ffile-compilation-dir=`, and
`-fcoverage-prefix-map=` to the LinkerWrapper `CompilerOptions`
forwarding list. Without this, passing `-fprofile-instr-generate
-fcoverage-mapping` to clang for a HIP program silently omits the
coverage mapping flags from the embedded device recompilation, so
`__llvm_covmap`/`__llvm_covfun` sections are never emitted for device
code.
[LoopFusion] reject unsafe scalar flow dependences (#195895)
`loop-fusion` treats any loop-invariant scalar non-anti dependence as
safe to fuse. In the linked issue, it incorrectly allows scalar flow
dependences where the first loop writes a loop-invariant location and
the second loop later reads that same location. Fusion interleaves the
producer and consumer and this changes the value observed by the second
loop.
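A runnable sketch of the changed value, with the fusion done by hand. The helper names `run_unfused` and `run_fused` and the local `slot` array are hypothetical, not from the patch:

```C++
#include <vector>

// Two separate loops: the consumer loop starts only after the producer loop
// has finished, so every read of slot[0] sees the final value n - 1.
std::vector<int> run_unfused(int n) {
  int slot[1] = {0};
  std::vector<int> out(n);
  for (int i = 0; i < n; i++)
    slot[0] = i;
  for (int j = 0; j < n; j++)
    out[j] = slot[0];
  return out;
}

// Hand-fused version: producer and consumer are interleaved, so iteration i
// reads the value written in that same iteration.
std::vector<int> run_fused(int n) {
  int slot[1] = {0};
  std::vector<int> out(n);
  for (int i = 0; i < n; i++) {
    slot[0] = i;
    out[i] = slot[0];
  }
  return out;
}
```

For `n = 4` the unfused loops produce `3, 3, 3, 3` while the fused loop produces `0, 1, 2, 3`.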
Example C source would look like:
```C
for (int i = 0; i < N; i++) {
  ptr[0] = i;
}
for (int j = 0; j < N; j++) {
  out[j] = ptr[0];
}
=>
for (int i = 0; i < N; i++) {
[14 lines not shown]
[Driver] Honor /Fo when deriving the split-dwarf .dwo path (#199613)
SplitDebugName checked -o and /o but not /Fo, so clang-cl /Fo<path> /c
fell through to the cwd-relative fallback and every .dwo landed in cwd
under <source-stem>.dwo regardless of the .obj location.
[PGO][HIP] Stop pulling ROCm.o into every PGO host link (#200101)
PR #177665 added an unconditional `extern` reference to
`__llvm_profile_hip_collect_device_data` from `InstrProfilingFile.c`,
which forces `InstrProfilingPlatformROCm.o` (and its sanitizer_common /
interception dependencies) out of `libclang_rt.profile.a` in every PGO
binary. That breaks bots without `-lpthread` and races dlsym/PLT state
in non-HIP programs via the interceptor constructor.
Fix:
- Declare the hook `COMPILER_RT_WEAK` and gate the call on its address.
No `COMPILER_RT_VISIBILITY`: a hidden weak-undef function would be
non-preemptible and the address test would fold to true.
- Gate `installHipModuleInterceptors` on `dlsym(hipModuleLoad)` so the
constructor is a no-op if `ROCm.o` is still pulled in.
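The weak-hook pattern from the first bullet can be sketched as follows, assuming GCC or Clang on an ELF target. `optional_collect_hook` and `collect_if_available` are hypothetical stand-ins, not the names used in compiler-rt:

```C++
// Hypothetical optional hook. It is declared weak and keeps default
// visibility, so the reference alone does not pull an archive member into
// the link, and its address is null when nothing defines it.
extern "C" int optional_collect_hook(void) __attribute__((weak));

// Returns true if the hook was linked in and called, false otherwise.
bool collect_if_available() {
  if (!optional_collect_hook)
    return false; // hook not linked in: skip the call
  optional_collect_hook();
  return true;
}
```

In a program that never defines `optional_collect_hook`, `collect_if_available()` returns `false` and the link still succeeds.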
Fixes:
- https://lab.llvm.org/buildbot/#/builders/66/builds/31311
- https://lab.llvm.org/buildbot/#/builders/174/builds/36180
[7 lines not shown]
[clang][AMDGPU] Fix -ast-print crash on expanded predicate builtins (#199963)
ExpandAMDGPUPredicateBuiltIn synthesized an IntegerLiteral typed
_Bool/bool — a shape no other producer creates, and one that
StmtPrinter::VisitIntegerLiteral has no case for. -ast-print on the
resulting if-condition hit llvm_unreachable.
Emit the canonical boolean literal instead:
- C++, C23, OpenCL, HIP: CXXBoolLiteralExpr 'bool'
- pre-C23 C: IntegerLiteral 'int'
In the C case this matches what <stdbool.h>'s true/false macros expand
to.
Fixes #199563
[clang] fix getTemplateInstantiationArgs
This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.
This greatly simplifies the signature of that function, by removing a bunch
of workarounds, and simplifying a couple that weren't removed yet.
Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.
Also stops the explicit specialization AST nodes from abusing the template
parameter lists to store their own template parameter list, by creating a
dedicated field for it, similar to partial specializations.
[SelectionDAG] Widen <2 x T> vector types for atomic store
Vector types with two elements must be widened. This change does so
for the vector types of atomic stores in SelectionDAG, so that it can
translate aligned vectors with more than one element.
[RISCV] Fix incorrect CM.MVSA01/QC_CM_MVSA01 generation with Zdinx (#200000)
The `RISCVMoveMerger` pass was incorrectly forming
`CM_MVSA01`/`QC_CM_MVSA01` when `Zdinx` was enabled. The pass attempted a
CM merge for copy pairs even when the first copy was not an
`a0`/`a1`-based CM candidate.
Fix by only running `findMatchingInst` when the current copy is a valid
CM candidate.
[RISCV][P-ext] Split v4i16/v8i8 INSERT/EXTRACT_VECTOR_ELT on RV32. (#199917)
With a constant lane index, split the vector and recurse on the
single-GPR half containing Idx (already Custom-lowered).
[AMDGPU] This reverts patches to use fp16 inline constants for i16 (#200091)
Patches reverted:
commit c315c662cd2d33e0c7f962fed742ee53626d8005
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin at amd.com>
Date: Wed May 27 12:51:13 2026
[AMDGPU] Fix codesize estimate after #198005 (#200033)
This fixes a failure in libc tests which check the exact encoding
size. The encoding is now shorter, but the estimate did not recognize
fp16 immediates as inlinable constants and assumed a literal encoding.
Shorter encodings were created here:
https://github.com/llvm/llvm-project/pull/198005
commit 2b3bc03b5ef00e7eaa245420ca981c700e1c05c4
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin at amd.com>
[15 lines not shown]
[alpha.webkit.UncountedLocalVarsChecker] Detect an assignment to a guardian argument (#198695)
A function parameter of type RefPtr<T>& should not be used as a guardian
variable of a raw pointer/reference variable if the function body
contains an assignment to it since such an assignment can shorten the
lifetime of the guarded object.
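A sketch of why such an assignment is unsafe, assuming heavily simplified, hypothetical stand-ins for WebKit's `RefPtr` and a ref-counted `Node` (the real types differ):

```C++
static int g_live_nodes = 0; // number of Node objects currently alive

// Hypothetical intrusive ref-counted object.
struct Node {
  int refs = 0;
  Node() { ++g_live_nodes; }
  ~Node() { --g_live_nodes; }
  void ref() { ++refs; }
  void deref() {
    if (--refs == 0)
      delete this;
  }
};

// Hypothetical minimal smart pointer, just enough for the example.
template <typename T> class RefPtr {
public:
  RefPtr() = default;
  explicit RefPtr(T* p) : ptr_(p) {
    if (ptr_)
      ptr_->ref();
  }
  RefPtr(const RefPtr& o) : ptr_(o.ptr_) {
    if (ptr_)
      ptr_->ref();
  }
  RefPtr& operator=(const RefPtr& o) {
    T* old = ptr_;
    ptr_ = o.ptr_;
    if (ptr_)
      ptr_->ref();
    if (old)
      old->deref(); // may destroy the previously held object
    return *this;
  }
  ~RefPtr() {
    if (ptr_)
      ptr_->deref();
  }
  T* get() const { return ptr_; }

private:
  T* ptr_ = nullptr;
};

// `guard` is assigned to in the body, so it cannot keep `raw` alive.
// Returns the number of live Node objects right after the assignment.
int reassign_guardian(RefPtr<Node>& guard) {
  Node* raw = guard.get(); // raw pointer relying on the parameter as guardian
  (void)raw;
  guard = RefPtr<Node>(); // drops the reference held through the parameter
  // `raw` may dangle from here on; using it would be a use-after-free.
  return g_live_nodes;
}

int demo() {
  RefPtr<Node> owner(new Node);
  return reassign_guardian(owner);
}
```

Here `demo()` returns 0: the object that `raw` points to is already destroyed when the assignment completes.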
test(llvm-symbolizer): fix Wasm layering violation by using YAML (#200080)
Avoid using wasm-ld in LLVM tests by prebuilding the test binary
as a YAML file and using yaml2obj at test time.
This matches the approach taken in
4bce216e6b550c770f2e536422c3d95333f65ba3.
Because yaml2obj always uses 5-byte LEBs, the CODE section offset
shifted from 0x37 to 0x4b, so the file offsets passed to llvm-symbolizer
were updated accordingly.
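To illustrate the size effect, here is a sketch of minimal versus padded 5-byte ULEB128 encoding of a 32-bit value. The function names are hypothetical and this is not yaml2obj's code:

```C++
#include <cstdint>
#include <vector>

// Minimal ULEB128: 7 payload bits per byte, with the high bit set on every
// byte except the last one.
std::vector<uint8_t> encode_uleb128(uint32_t value) {
  std::vector<uint8_t> out;
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
  return out;
}

// Padded ULEB128: always 5 bytes; it decodes to the same value.
std::vector<uint8_t> encode_uleb128_padded5(uint32_t value) {
  std::vector<uint8_t> out;
  for (int i = 0; i < 5; ++i) {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (i < 4)
      byte |= 0x80; // continuation bit on the first four bytes
    out.push_back(byte);
  }
  return out;
}

// Decoder that accepts both forms.
uint32_t decode_uleb128(const std::vector<uint8_t>& bytes) {
  uint32_t result = 0;
  int shift = 0;
  for (uint8_t b : bytes) {
    result |= static_cast<uint32_t>(b & 0x7f) << shift;
    shift += 7;
    if (!(b & 0x80))
      break;
  }
  return result;
}
```

A field that fits in one byte in minimal form takes four extra bytes when padded. The shift from 0x37 to 0x4b is 0x14 (20) bytes, which is consistent with several small fields growing this way. The commit message does not give the exact breakdown.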
Replaces #200046
Assisted-by: Gemini
[clang] fix member specializations of class and variable partial specializations
A partial specialization may be a member specialization even if it is not
an instantiation of a member partial specialization.
For example:
```C++
template<class> struct X {
template<class> struct Inner;
};
template<> template<class T>
struct X<int>::Inner<T*> {};
```
Make sure this state is represented, so that [temp.spec.partial.member]p2
can be applied.
Split off from #199528
[coro] Use C calling convention for C++20 coroutines (#198943)
Change the calling convention for resume / destroy functions of C++
coroutines from `fastcc` to the C calling convention.
The resume / destroy functions are exposed as part of the coroutine ABI
and must be compatible with other compilers and other versions of LLVM.
`fastcc`, however, is an LLVM-internal, unstable calling convention.
In practice, `fastcc` and the C calling convention are in sync for `void
func(void*)` function signatures on almost all platforms. Therefore, I
think we can still make this change without widespread ABI breakage.
`fastcc` and `ccc` do differ for i686 (x86-32), MIPS O32, PowerPC64
ELFv1 and Lanai. AFAIK, those are all legacy ABIs, and a recent feature
like C++20 coroutines is unlikely to be used by projects still targeting
legacy ABIs.
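A hand-written sketch of why the convention is ABI-visible: with LLVM's switched-resume lowering, a coroutine frame starts with the resume and destroy function pointers, and a caller built by another compiler invokes them knowing only that header. All names below (`FrameHeader`, `ToyFrame`, and the helpers) are hypothetical, and no real coroutine is involved:

```C++
// Header at the start of the frame: two `void(void*)` function pointers.
struct FrameHeader {
  void (*resume)(void*);
  void (*destroy)(void*);
};

// Toy frame standing in for a compiler-generated coroutine frame.
struct ToyFrame {
  FrameHeader header; // must be the first member
  int resume_calls;
  bool destroyed;
};

// Stand-ins for the compiler-generated resume / destroy functions. They use
// the platform's default C calling convention.
void toy_resume(void* frame) { static_cast<ToyFrame*>(frame)->resume_calls++; }
void toy_destroy(void* frame) { static_cast<ToyFrame*>(frame)->destroyed = true; }

// What a foreign caller does: it sees only the header layout, so it must
// agree with the callee on the calling convention of `void(void*)`.
void resume_via_header(void* frame) {
  static_cast<FrameHeader*>(frame)->resume(frame);
}
void destroy_via_header(void* frame) {
  static_cast<FrameHeader*>(frame)->destroy(frame);
}

ToyFrame make_toy_frame() {
  return ToyFrame{{toy_resume, toy_destroy}, 0, false};
}
```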
Historical context: I tried to figure out why `fastcc` was used. It is
[6 lines not shown]