[IR] Remove HasMetadata bit from instructions (#190651)
We were reserving one bit for a HasMetadata flag. However, this flag has
long since been moved into Value itself, prior to being removed from
there as well recently.
[Inline] Allow inlining with null_pointer_is_valid mismatch (#190510)
If the callee has null_pointer_is_valid but the caller does not, we
should still inline and add null_pointer_is_valid to the caller (which
is handled by an already existing inline adjustment rule).
This does mean that optimizations in the caller may be reduced by
unnecessarily preserving null checks, but that's still better than not
inlining at all. In particular, this check causes issues with LTO in the
Linux kernel, as the C portions are compiled with null_pointer_is_valid,
but the Rust portions are not.
The test is modified to show that the previous alwaysinline behavior now
always holds.
[CIR][AArch64] Implement vget_lane_bf16 and vgetq_lane_bf16 builtins (#186866)
Implements vget_lane_bf16 and vgetq_lane_bf16 builtins. Updates test
with CIR test.
Part of #185382
[C++20] [Modules] Don't profiling the callee of CXXFoldExpr (#190732)
Close https://github.com/llvm/llvm-project/issues/190333
For the test case, the root cause of the problem is, the compiler
thought the declaration of `operator &&` in consumer.cpp may change the
meaning of '&&' in the requrie clause of `F::operator()`. But it doesn't
make sense. Here we skip profiling the callee to solve the problem. Note
that we've already record the kind of the operator. So '&&' and '||'
won't be confused.
[clang-tidy] [Modules] Skip checking decls in clang-tidy (#145630) (#190733)
Close https://github.com/llvm/llvm-project/issues/145628
Note that I am not sure if this is the proper fix. On the one hand, the
fix lives in ASTMachers instead of clang-tidy. On the other hand, I feel
this may be a more general fix.
[clang] Fix crash on invalid out-of-line enum definition with template parameters (#188246)
clang crashes when an invalid out-of-line enum definition is provided
with template parameters. In these cases, clang produces a dependent
type within a non-dependent context, violating internal invariants.
The fix is to fallback the underlying type of the enum to `int` during
error recovery in `Sema::ActOnTag` when `Invalid` is true, making it
safe for downstream processing while still preserving the invalid
declaration in the AST.
Fixes #187909
[clang][Rewriter] Adjust end offset before RewriteBuffer::getMappedOffset() call (#187374)
Without this patch, only cases when a token length increased were
supported.
If a token length decreased, we returned a larger string than expected
(e.g. in the added tests, "xretur " would be returned instead of
"xretur")
[AMDGPU][SILowerSGPRSpills] Correct insertion of IMPLICIT_DEF in cycles (#186348)
si-lower-sgpr spills was observed inserting IMPLICIT_DEF for lane VGPR
restores in the cycle header. The virtual VGPR is therefore not live-in
to the header and wwm regallocfast does not insert a restore. This
results in the vgpr being clobbered after each backedge.
Correct this by inserting the IMPLICIT_DEF in a block that dominates
all entries.
Assisted by Claude.
[lldb][AIX] Extract CPU type and set up process architecture accordingly (#189910)
This PR is in reference to porting LLDB on AIX. Ref discusssions: [llvm
discourse](https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640) and
[#101657](https://github.com/llvm/llvm-project/issues/101657).
Complete changes together in this draft:
- [Extending LLDB to work on AIX
#102601](https://github.com/llvm/llvm-project/pull/102601)
Description:
The process architecture was previously initialized using a hardcoded
TCPU_PPC64 CPU type.
The logic has been updated to determine the CPU type dynamically by
inspecting the magic bytes and the XCOFF header. Based on this
information, the appropriate CPU type (TCPU_PPC or TCPU_PPC64) is
selected and used when constructing and setting the ArchSpec.
This change ensures that the process architecture correctly reflects the
underlying binary format.
[RISCV] Fix address type in Zacas seq_cst atomic pattern (#190729)
The seq_cst pattern in AMOCASPat used (vt GPR:$addr) for the address
operand, while all other patterns (monotonic, acquire, release, acq_rel)
consistently use (XLenVT GPR:$addr). This would produce a wrong type for
the address when vt differs from XLenVT (e.g., amocas.d on RV32 where
vt=i64).
Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[mlir][reducer] Repalce module.emitWarning with module.emitError in ReductionTree pass (#190584)
This PR fixes the diagnostic message for mlir-reduce's reduction-tree
pass when the input module is not "interesting". Previously, running
with the warning pass would fail silently, and enabling debug options
would only show a generic "pass manager run failed" message without any
useful diagnostic information.
[Flang][Driver] Add support for '-fprofile-sample-use' option (#188697)
When the `-fprofile-sample-use=sample.prof` option is passed, the
compiler records the profile file path in `SampleProfileFile` . This
value is later used by the `SampleProfileLoaderPass`, which loads the
sample profile and injects the corresponding profiling metadata in the
LLVM IR.
Revert "Reland "[mlir][reducer] Add eraseRedundantBlocksInRegion and getSuccessorForwardOperands API to BranchOpInterface"" (#190727)
To decouple the BranchOpInterface implementation from the reduction-tree
changes. Reverts llvm/llvm-project#189253,
[mlir][CSE] Fix CSE markAnalysesPreserved<DominanceInfo, PostDominanceInfo> comment (#190471)
The original comment claimed that DominanceInfo and PostDominanceInfo
could be preserved because region operations are not removed. However,
the real reason was that the original CSE only deleted redundant
operations without moving any operation to a different block, leaving
the dominance tree structure unchanged. Part of
https://github.com/llvm/llvm-project/pull/180556.
[libc++][docs] Update paper and LWG issue lists after 2026-03 meeting (#189901)
[P3726R2](https://wg21.link/P3726R2) is a Core paper but adds
`std::start_lifetime`, so it needs to be listed in libc++'s
documentation.
For LWG issues, see [P4145R0](https://wg21.link/P4145R0) and
[P4146R0](https://wg21.link/P4146R0).
[CodeGenPrepare] Use Instruction::comesBefore instead of manual ordering (#190485)
After #172329, we noticed that some sources compiled with MSan take
1000x longer to compile. This is caused by quadratic complexity in
tryToSinkFreeOperands, which can be called on a significant number
of instructions within huge basic blocks.
This inefficiency was introduced in 9cfa9b4, which manually iterates
and creates a DenseMap of entire basic blocks for each interesting
instruction.
This patch avoids the manual ordering by using
Instruction::comesBefore(), which provides the exact same
ordering much more efficiently.
[clang-tidy] Fix performance-trivially-destructible with C++20 modules (#178471)
When a class definition is seen through both a header include and a
C++20 module import, destructors may appear multiple times in the AST's
redeclaration chain. The original matcher used `isFirstDecl()` which
fails in this scenario because the same declaration can appear as both
first and non-first depending on the view.
Replace `unless(isFirstDecl())` with `isOutOfLine()` which correctly
identifies out-of-line definitions by checking whether the lexical
context differs from the semantic context.
Also update clang-tools-extra's lit.cfg.py to call `use_clang()` instead
of `clang_setup()` to make the `%clang` substitution available for
tests.
Fixes #178102
Co-authored-by: Chuanqi Xu <yedeng.yd at linux.alibaba.com>
[flang][cuda] Lower unified variables as cuf.alloc in main program scope (#190713)
Remove the unified exception from CanCUDASymbolBeGlobal so unified
variables follow the same cuf.alloc lowering path as other CUDA data
attributes.
[AMDGPU] asyncmark support for ASYNC_CNT (#185813)
The ASYNC_CNT is used to track the progress of asynchronous copies
between global and LDS memories. By including it in asyncmark, the
compiler can now assist the programmer in generating waits for
ASYNC_CNT.
Assisted-By: Claude Sonnet 4.5
This is part of a stack:
- #185813
- #185810
Fixes: LCOMPILER-332
[AMDGPU] Fix setreg handling in the VGPR MSB lowering
There are multiple issues with it:
1. It can skip inserting S_SET_VGPR_MSB if we set the mode via
piggybacking. We are now relying on the HW bug for correct
behavior. If/when the bug is fixed lowering will be incorrect.
2. We should just unconditionally update MSBs if immediate allows it.
We shall set correct bits and keep the rest of the immediate
(that is done). There is no reasonable way for an user to change
MSBs nor does it do anything good to set it with SETREG and then
immediately overwrite with S_SET_VGPR_MSB.
3. We can always update immediate if Offset is zero.
4. Redundant mode changes created as seen in the
hazard-setreg-vgpr-msb-gfx1250.mir.
With unconditional immediate update most of time and not relying on
the SETREG for setting MSBs there is no good reason to complicate
handling by supporting SETREG as a piggybacking target. Moreover,
[10 lines not shown]