clang/AMDGPU: Refactor triple adjustments (#190343)
Factor this similarly to the ARM case for future
expansion. The difference is that -march is treated as
an alias for -mcpu instead of something separately
useful.
I don't understand this mutation of the triple into
spirv64. The only test where this appears to matter
does not use -mcpu. Previously this would only match
for -mcpu; this change makes it prefer -march before
falling back to -mcpu.
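A minimal sketch of the selection order described above, assuming the GPU name comes from `-march` when it is given and from `-mcpu` otherwise. `selectGpuName` is a hypothetical helper, not the clang driver code, and the spirv64 triple mutation is not modelled.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: -march acts as an alias for -mcpu and wins when both
// are present; -mcpu is the fallback.
std::optional<std::string> selectGpuName(const std::optional<std::string> &March,
                                         const std::optional<std::string> &Mcpu) {
  if (March && !March->empty())
    return March;        // prefer -march
  if (Mcpu && !Mcpu->empty())
    return Mcpu;         // fall back to -mcpu
  return std::nullopt;   // neither flag given
}
```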
[MLIR][Arith] Fix index_cast/index_castui chain folding to check intermediate width (#189042)
The patterns `IndexCastOfIndexCast` and `IndexCastUIOfIndexCastUI` in
ArithCanonicalization.td incorrectly eliminated a pair of index casts
whenever the outer result type equalled the original source type,
without verifying that the intermediate cast was lossless.
For example, the following was wrongly folded to `%arg0`:
%0 = index_castui %arg0 : i64 to index
%1 = index_castui %0 : index to i8 ← truncates to 8 bits
%2 = index_castui %1 : i8 to index ← incorrectly removed
The pattern matched `%1`/`%2` because `i8.to(index)` has the same result
type as `i64.to(index)`, even though the i8 intermediate silently drops
56 bits. The same bug existed for the signed `index_cast` variant.
Fix: move the optimization into the `fold` methods of `IndexCastOp` and
`IndexCastUIOp` with an explicit check that the intermediate type is at
least as wide as the source type (using
[8 lines not shown]
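A minimal sketch of the width rule, assuming `index` is 64 bits (in MLIR its width is target-dependent) and modelling values as `uint64_t`. `castUI` and `canFoldCastPair` are hypothetical helpers, not the actual `fold` code. The signed variant follows the same rule, since sign-extending and then truncating back to the source width is also lossless.

```cpp
#include <cassert>
#include <cstdint>

// Models index_castui between fixed widths (1..64 bits): keep the low bits
// that fit in both the source and the destination width.
uint64_t castUI(uint64_t V, unsigned From, unsigned To) {
  assert(From >= 1 && From <= 64 && To >= 1 && To <= 64);
  auto mask = [](unsigned W) { return W == 64 ? ~0ULL : ((1ULL << W) - 1); };
  return V & mask(From) & mask(To);
}

// Hypothetical legality check: cast(cast(x : Src -> Mid) : Mid -> Src) may
// fold to x only if the intermediate width can hold every Src value.
bool canFoldCastPair(unsigned SrcWidth, unsigned MidWidth) {
  return MidWidth >= SrcWidth;
}
```

In the example above, the `%1`/`%2` pair has `Src = index` (64 bits here) and `Mid = i8`, so `canFoldCastPair(64, 8)` is false and the pair must stay.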
[clang-tidy] Add `AllowExplicitObjectParameters` option to `avoid-capturing-lambda-coroutines` (#182916)
Add an off-by-default `AllowExplicitObjectParameters` option to the
existing `cppcoreguidelines-avoid-capturing-lambda-coroutines` check.
When enabled, lambda coroutines that use C++23 "deducing this" (explicit
object parameter) are not flagged, since captures are moved into the
coroutine frame ([1], [2], [3]). In C++23 mode, the check also provides
fix-it hints to add `this auto` as the first parameter for lambdas that
don't use it.
The option is off by default to match the current C++ Core Guidelines,
which do not yet recognize explicit object parameters as a solution
([4]). Once the guidelines adopt the proposal, the default can be
flipped.
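A minimal sketch of the flagging decision as described above. `LambdaInfo` and `shouldFlag` are hypothetical and stand in for the check's AST matching; the default argument mirrors the option being off by default.

```cpp
#include <cassert>

// Hypothetical summary of a lambda as seen by the check.
struct LambdaInfo {
  bool IsCoroutine;
  bool HasCaptures;
  bool HasExplicitObjectParameter;  // e.g. a `this auto` first parameter
};

// Capturing lambda coroutines are flagged unless the option is enabled and
// the lambda uses an explicit object parameter (captures are then moved
// into the coroutine frame).
bool shouldFlag(const LambdaInfo &L, bool AllowExplicitObjectParameters = false) {
  if (!L.IsCoroutine || !L.HasCaptures)
    return false;
  if (AllowExplicitObjectParameters && L.HasExplicitObjectParameter)
    return false;
  return true;
}
```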
[1]:
https://github.com/scylladb/seastar/blob/master/doc/lambda-coroutine-fiasco.md#solution-c23-and-up
[5 lines not shown]
[GlobalISel] Fix UMR in `SwiftErrorValueTracking` (#190273)
Fix issue reported on
https://github.com/llvm/llvm-project/pull/188296#issuecomment-4179103756
`SwiftErrorValueTracking` holds per-function state used by
`IRTranslator`.
On targets where `TargetLowering::supportSwiftError()` is false (e.g.
wasm), `SwiftErrorValueTracking::setFunction()` exits early.
Historically, that early return happened before clearing per-function
containers, and pointer members (including `SwiftErrorArg`) had no
in-class initialization.
The bad case is a function with a swifterror argument on such a target:
`IRTranslator` uses `SwiftError.getFunctionArg()` without checking
`supportSwiftError()`, so it could read an uninitialized
`SwiftErrorArg` value. (SelectionDAG gates the `getFunctionArg` usages
behind `supportSwiftError()`, so this is specific to GlobalISel.)
[10 lines not shown]
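The visible text describes the hazard rather than the whole fix, so the following is only a sketch assuming the two obvious defenses: in-class initialization of pointer members and resetting per-function state before the early return. `Value` and `SwiftErrorTrackerSketch` are hypothetical stand-ins, not the LLVM classes.

```cpp
#include <cassert>
#include <vector>

struct Value {};  // hypothetical stand-in for an IR value

class SwiftErrorTrackerSketch {
  const Value *SwiftErrorArg = nullptr;  // in-class init: never indeterminate
  std::vector<const Value *> SwiftErrorVals;
  bool TargetSupportsSwiftError;

public:
  explicit SwiftErrorTrackerSketch(bool Supports)
      : TargetSupportsSwiftError(Supports) {}

  void setFunction(const Value *ArgOrNull) {
    // Reset per-function state first, so the early exit below leaves a
    // well-defined (null) argument instead of stale or uninitialized data.
    SwiftErrorArg = nullptr;
    SwiftErrorVals.clear();
    if (!TargetSupportsSwiftError)
      return;
    SwiftErrorArg = ArgOrNull;
    if (ArgOrNull)
      SwiftErrorVals.push_back(ArgOrNull);
  }

  const Value *getFunctionArg() const { return SwiftErrorArg; }
};
```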
[VPlan] Mark VPCanonicalIVPHI as not reading memory (NFCI). (#190338)
The canonical IV does not access any memory. Mark accordingly. This
should be NFC end-to-end.
PR: https://github.com/llvm/llvm-project/pull/190338
[CIR] Handle vtable-lowering-with-incomplete types (#190216)
The NYI diagnostic in getFunctionTypeForVTable showed up a few times in
testing, so this patch is attempting to fix that up.
The reproducer here is a function type for a vtable that has an
incomplete type in it (as a return or parameter type). Classic codegen
chooses to represent this as an opaque type.
This patch instead removes the special vtable handling here, so that we
can simply represent the types as incomplete record types.
For now, this patch ends up lowering incomplete types as 'empty'
types in LLVM IR. We may find we need to change that in the future,
but at the moment it seems to work.
This patch ALSO changes the definition of RecordType::isSized to be
true only for complete types. This prevents a number of other places
from attempting to add attributes, check the size of the type, etc.,
none of which matter for the purposes of vtable emission.
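A minimal sketch of the `isSized` behavior described above, with a hypothetical `RecordTypeSketch` and `trySizeOf` standing in for CIR's `RecordType` and its size-dependent callers.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record type: an incomplete record (e.g. a forward-declared
// struct used in a vtable slot's signature) has no member list.
struct RecordTypeSketch {
  std::string Name;
  bool Complete = false;
  std::vector<unsigned> MemberSizes;  // only meaningful when Complete

  // Only complete records report themselves as sized.
  bool isSized() const { return Complete; }
};

// A size-dependent caller: it simply skips unsized (incomplete) records.
std::optional<unsigned> trySizeOf(const RecordTypeSketch &R) {
  if (!R.isSized())
    return std::nullopt;
  unsigned Total = 0;
  for (unsigned S : R.MemberSizes)
    Total += S;  // padding ignored for brevity
  return Total;
}
```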
[CIR] Implement top level 'ExportDecl' emission (#190286)
This is a pretty simple one: it's just a type of decl-context. The actual
exporty-ness is handled on a per-declaration basis.
This patch just makes sure we emit them; I suspect this will reveal
quite a few more issues in module code.
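A minimal sketch of treating an export declaration as a plain decl context during top-level emission. `Decl` and `emitTopLevelDecl` are hypothetical, not the CIR code generator's API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical AST node: an Export node is only a container whose children
// are ordinary declarations.
struct Decl {
  enum Kind { Function, Variable, Export } K;
  std::string Name;
  std::vector<Decl> Children;  // populated only for Export
};

// An ExportDecl emits nothing itself; it recurses into its children, and
// any export-ness is handled on each child declaration.
void emitTopLevelDecl(const Decl &D, std::vector<std::string> &Emitted) {
  if (D.K == Decl::Export) {
    for (const Decl &Child : D.Children)
      emitTopLevelDecl(Child, Emitted);
    return;
  }
  Emitted.push_back(D.Name);
}
```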
[clang] Make -dump-tokens option align tokens (#164894)
When using `-Xclang -dump-tokens`, the lexer dump output is currently
difficult to read because the data are misaligned. The existing
implementation simply separates the token name, spelling, flags, and
location using `'\t'`, which results in inconsistent spacing.
For example, with the example input provided in this patch, the current
output looks like this **(BEFORE THIS PR)**:
[Screenshot of the current output: https://github.com/user-attachments/assets/ad893958-6d57-4a76-8838-7fc56e37e6a7]
# Changes
This small PR improves the readability of the token dump by:
+ Adding padding after the token name and after the spelling (the
[9 lines not shown]
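A minimal sketch of column padding with the standard `std::left` and `std::setw` manipulators instead of `'\t'` separators. `formatTokenLine`, the column widths, and the exact field layout are assumptions for illustration, not clang's actual `-dump-tokens` format.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical formatter: left-justify the token kind and quoted spelling in
// fixed-width columns so later fields start at the same column.
std::string formatTokenLine(const std::string &Kind, const std::string &Spelling,
                            const std::string &Flags, const std::string &Loc,
                            int KindWidth = 16, int SpellingWidth = 12) {
  std::ostringstream OS;
  OS << std::left << std::setw(KindWidth) << Kind
     << std::setw(SpellingWidth) << ("'" + Spelling + "'")
     << Flags << (Flags.empty() ? "" : " ") << "Loc=<" << Loc << ">";
  return OS.str();
}
```

Fields longer than their column are not truncated; they just push the rest of that line to the right.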
[AMDGPU][CodeGen] Implement SimplifyDemandedBitsForTargetNode for readfirstlane. (#190009)
Propagate demanded bits through the readfirstlane intrinsic by
implementing SimplifyDemandedBitsForTargetNode in
AMDGPUISelLowering.
This allows upstream zero/sign extensions to be eliminated when only a
subset of bits is used after the intrinsic.
Partially addresses #128390.
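Because readfirstlane returns one lane's value unchanged, each result bit depends only on the same operand bit, so the demanded mask can be passed straight through to the operand. A minimal sketch with hypothetical helpers on 32-bit masks, not the AMDGPUISelLowering code:

```cpp
#include <cassert>
#include <cstdint>

// Bitwise pass-through: the operand bits demanded are exactly the result
// bits demanded.
uint32_t demandedBitsOfReadFirstLaneOperand(uint32_t DemandedResultBits) {
  return DemandedResultBits;
}

// An extension from SrcBits to 32 bits only matters if a bit above SrcBits
// is demanded; otherwise the extended bits are dead.
bool extensionIsRedundant(uint32_t DemandedBits, unsigned SrcBits) {
  assert(SrcBits >= 1 && SrcBits <= 32);
  uint32_t LowMask = SrcBits == 32 ? ~0u : ((1u << SrcBits) - 1);
  return (DemandedBits & ~LowMask) == 0;
}
```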
[DAG] isKnownToBeAPowerOfTwo - add missing DemandedElts handling to ISD::TRUNCATE and hidden m_Neg pattern (#190190)
Use MaskedVectorIsZero to match the X & -X pattern when only the
DemandedElts match the negation pattern.
Fixes #181654 (properly)
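The identity behind the pattern is that `X & -X` keeps only the lowest set bit of `X`, so it is a power of two whenever `X` is non-zero. For vectors, `-X` may appear as `C - X` where the constant `C` is zero only in the demanded lanes, which is the case `MaskedVectorIsZero` covers. A minimal sketch on a 4-lane vector modelled with `std::array`; `isZeroInDemandedLanes` and `andWithSub` are hypothetical helpers that only mirror the idea:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;
using Vec = std::array<uint32_t, kLanes>;

// True if C is zero in every lane whose bit is set in DemandedMask.
bool isZeroInDemandedLanes(const Vec &C, unsigned DemandedMask) {
  for (std::size_t I = 0; I < kLanes; ++I)
    if (((DemandedMask >> I) & 1u) != 0 && C[I] != 0)
      return false;
  return true;
}

bool isPowerOfTwo(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Lane-wise X & (C - X); where C is zero this is X & -X (unsigned wraparound).
Vec andWithSub(const Vec &X, const Vec &C) {
  Vec R{};
  for (std::size_t I = 0; I < kLanes; ++I)
    R[I] = X[I] & (C[I] - X[I]);
  return R;
}
```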
[lldb] Fix DIL error diagnostics output (#187680)
* Correctly return the result when used from the console, so that
`DiagnosticsRendering` can use it to output the error.
* Add a location pointer to the internal formatting of `DILDiagnosticError`
so that diagnostics are shown when it is called from the API.
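A minimal sketch of an inline location pointer, assuming a layout of the expression, a caret line, and the message. `formatWithLocation` and the layout are hypothetical, not lldb's `DILDiagnosticError` output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Echo the expression, place a caret under the offending column, then the
// message.
std::string formatWithLocation(const std::string &Expr, std::size_t Column,
                               const std::string &Message) {
  std::string Out = Expr + "\n";
  Out += std::string(Column, ' ') + "^\n";
  Out += "error: " + Message;
  return Out;
}
```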
[mlir][linalg] Fix crash in tile_reduction when output map has constant exprs (#189166)
`generateInitialTensorForPartialReduction` and the `getInitSliceInfo*`
helpers unconditionally cast every result expression of the partial
result AffineMap to `AffineDimExpr`. When the original output indexing
map contains a constant (e.g. `affine_map<(d0,d1,d2)->(d0,0,d2)>`), the
constant expression propagates into the partial map and the cast
triggers an assertion.
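The commit text does not say how constant results are handled after the fix, so this sketch only shows the general remedy of checking the expression kind before casting. `std::variant` stands in for MLIR's `AffineExpr` (real code would use LLVM-style `isa`/`dyn_cast`), and `DimExpr`, `ConstExpr`, and both helpers are hypothetical. The map `(d0,d1,d2)->(d0,0,d2)` from the description becomes `{Dim 0, Const 0, Dim 2}`.

```cpp
#include <cassert>
#include <optional>
#include <variant>
#include <vector>

struct DimExpr { unsigned Position; };
struct ConstExpr { long Value; };
using ResultExpr = std::variant<DimExpr, ConstExpr>;

// Unchecked pattern: assumes every result is a dim. On a constant,
// std::get throws std::bad_variant_access, the analogue of the assertion.
unsigned dimPositionUnchecked(const ResultExpr &E) {
  return std::get<DimExpr>(E).Position;
}

// Checked pattern: inspect the kind first and report non-dim results.
std::optional<unsigned> dimPositionIfDim(const ResultExpr &E) {
  if (const auto *D = std::get_if<DimExpr>(&E))
    return D->Position;
  return std::nullopt;
}
```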
Fixes #173025
Assisted-by: Claude Code
[mlir][Affine] Fix LICM incorrectly hoisting stores from zero-trip-count loops (#189165)
The affine-loop-invariant-code-motion pass was hoisting side-effectful
operations (e.g. affine.store) out of loops whose trip count is
statically known to be zero. This caused stores to execute
unconditionally even though the loop body should never run, producing
incorrect results.
The fix skips hoisting of non-memory-effect-free ops when
getConstantTripCount returns 0. Pure/side-effect-free ops are still
eligible for hoisting because they cannot change observable program
state.
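A minimal sketch of the zero-trip-count guard described above. `mayHoist` is hypothetical and models only this rule; the pass's usual invariance and legality checks are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Side-effect-free ops stay eligible; ops with memory effects must not leave
// a loop whose constant trip count is zero, because its body never runs.
bool mayHoist(bool IsMemoryEffectFree, std::optional<uint64_t> ConstantTripCount) {
  if (IsMemoryEffectFree)
    return true;
  return !(ConstantTripCount && *ConstantTripCount == 0);
}
```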
Fixes #128273
Assisted-by: Claude Code
[MachinePipeliner] Remove isLoopCarriedDep and use DDG (#174394)
This patch completely removes `isLoopCarriedDep`, which was used
previously to identify loop-carried dependencies in the DAG. Now that we
have the DDG representation, this special handling is no longer
necessary. Simply replacing its usage with the DDG causes several tests
to fail, since cycle detection takes some of the validation-only edges
in the DDG into account. To address this, this patch introduces extra
edges in the DDG, which are used only for cycle detection and not for
other parts of the pass (e.g., scheduling). The extra edges are
determined to preserve the existing behavior of the pass as closely as
possible, which makes the predicates for adding them somewhat complex.
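A minimal sketch of cycle detection that follows only edges marked for it, so that extra edges can influence cycle detection without affecting scheduling. `Edge`, its flags, and `hasCycle` are hypothetical; in the patch, the predicates that decide which edges to add are the complex part, and here the flags are simply given.

```cpp
#include <cassert>
#include <vector>

struct Edge {
  unsigned To;
  bool UsedForScheduling;      // a scheduler (not shown) would filter on this
  bool UsedForCycleDetection;  // cycle detection filters on this
};
using Graph = std::vector<std::vector<Edge>>;  // adjacency list by node index

// Recursive three-color DFS: 0 = unvisited, 1 = on stack, 2 = finished.
bool hasCycleFrom(const Graph &G, unsigned N, std::vector<int> &Color) {
  Color[N] = 1;
  for (const Edge &E : G[N]) {
    if (!E.UsedForCycleDetection)
      continue;
    if (Color[E.To] == 1)
      return true;  // back edge closes a cycle
    if (Color[E.To] == 0 && hasCycleFrom(G, E.To, Color))
      return true;
  }
  Color[N] = 2;
  return false;
}

bool hasCycle(const Graph &G) {
  std::vector<int> Color(G.size(), 0);
  for (unsigned N = 0; N < G.size(); ++N)
    if (Color[N] == 0 && hasCycleFrom(G, N, Color))
      return true;
  return false;
}
```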
Split off from #135148; this is the final patch in the series for #135148.