[FileCheck] Use default colors in input dumps
This patch makes two improvements to colors used in FileCheck input
dumps:
1. Without this patch, input line numbers and ellipses have a
foreground color of black, which is hard to see in a terminal with
a dark color theme. This patch changes that to the terminal's
default color.
2. Without this patch, the input text is accidentally set to bold when
neither `-v` nor `-vv` is specified. Perhaps I never noticed
because I tend to always use `-vv`. This patch changes that to use
the terminal's default color.
Case 2 exposes a problem with LLVM's color implementation. Without
this patch, the call to `WithColor`'s constructor actually specifies
bold as `false`, but `WithColor` ignores that when the color is
`SAVEDCOLOR`. While it seems like that should be fixed, I am
concerned about the impact of such a fix on other tools that might
[12 lines not shown]
[AMDGPU] Use SchedModel latencies for Fence barrier edges (#204657)
For memory->fence dependencies, this PR sets the latency of the edge to
the instr latency of the predecessor memory instruction.
During lowering of these fences, we insert the necessary waitcnts, and
we end up waiting for any outstanding memory op at these fences. Thus,
the latency of the edges should be based on latency of the associated
load/stores.
[LoopCacheAnalysis] Generate tests by update_analyze_test_checks.py (#204807)
Since loop interchange has been enabled in the default pipeline,
development on LoopCacheAnalysis, which is used by LoopInterchange, is
becoming more active. So I think it's a good time to support automatic
test generation for LoopCacheAnalysis.
This patch does two things. First, it changes LoopCachePrinterPass from
a loop pass to a function pass to make it possible to use
update_analyze_test_checks.py. Second, it rewrites all the CHECK
directives in the existing LoopCacheAnalysis tests using the script.
[Verifier] Only accept noundef metadata on loads and update metadata tests (#204922)
noundef metadata has been accepted everywhere so far, which seems to be
an oversight. This patch rejects it everywhere except for load
instructions, which seem to be the only ones where it's supposed to be
supported. The other metadata tests are also updated so they are
somewhat similar to each other.
[VectorCombine] Add subvector reduction support to foldShuffleChainsToReduce (#199872)
Extends foldShuffleChainsToReduce to recognise subvector reductions
where the chain narrows through shuffles before extracting lane 0.
The matcher tracks per output lane attribution as the chain is walked.
Each lane carries a per source bitmask of contributing source lanes plus
a poison flag. Shuffles permute these records. Binops union them. At the
extract, lane 0's bitmasks rebuild the reduction as one or more partial
reduce intrinsics. The walk is capped at 32 chain nodes.
Also added new test file with 11 tests:
| Test | Reason |
| --- | --- |
| `_add_v4i32`, `_add_v8i16`, `_add_v16i8`, `_add_v64i8` | basic subvector reductions across types/sizes |
| `_mul_v16i8` | non-add reduction |
[20 lines not shown]
[clangd] Look for resource-dir relative to detected compiler path as a fallback (#203332)
If the standard resource directory (which is searched for relative to the clangd
executable) does not exist, look for one relative to the detected compiler as a
fallback. This handles some packaging schemes where clangd and clang are
installed in different prefixes and the resource directory is only located in the
latter.
Also print an error message to the log if the fallback didn't find an existing
directory either.
[IR] handle oversized constant alloca counts in getAllocationSize (#204540)
AllocaInst::getAllocationSize() unconditionally calls getZExtValue() for
array allocas, which asserts when the constant element count is wider
than 64 bits.
Use tryZExtValue() when reading the constant array size instead. If the
count cannot be represented in uint64_t, return std::nullopt rather than
asserting, matching the existing contract.
Fixes #203519
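A minimal standalone sketch of the "try, then return nullopt" pattern, without LLVM headers. `WideCount`, `tryZExt`, and `allocationSizeBytes` are hypothetical stand-ins, and the multiplication overflow check is an extra assumption of this sketch rather than something the commit message states.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a constant element count wider than 64 bits.
struct WideCount {
  uint64_t Hi; // upper 64 bits
  uint64_t Lo; // lower 64 bits
};

// "Try" style conversion: yield a value only if it fits in 64 bits.
std::optional<uint64_t> tryZExt(const WideCount &C) {
  if (C.Hi != 0)
    return std::nullopt;
  return C.Lo;
}

// Total size in bytes, or nullopt when the count (or the product) cannot
// be represented in uint64_t. No assertion fires on oversized input.
std::optional<uint64_t> allocationSizeBytes(const WideCount &Count,
                                            uint64_t ElemBytes) {
  std::optional<uint64_t> N = tryZExt(Count);
  if (!N)
    return std::nullopt;
  if (ElemBytes != 0 && *N > UINT64_MAX / ElemBytes)
    return std::nullopt; // assumption: treat overflow as "unknown size"
  return *N * ElemBytes;
}
```

Callers that already handle an empty optional need no changes in this model; the oversized count simply joins the other "size unknown" cases.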
[X86] combineX86ShufflesRecursively - delay widening shuffle inputs. NFC. (#204931)
Perform resolveTargetShuffleInputsAndMask earlier as widening shouldn't
merge any inputs (we canonicalize small shuffle inputs earlier).
We should be able to move the widenSubVector calls inside
combineX86ShuffleChain in a future commit, but this patch should be NFC.
[InstCombine] Fold trunc scmp/ucmp -> scmp/ucmp with the target type being what we truncate (#196847)
I don't think I need an alive2 for this, since this is basically a
tautology/self-definition.
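The tautology can be illustrated with a small model in plain C++. It assumes the three-way comparison yields only -1, 0, or 1, so truncating a wider result gives the same value as computing it directly in the narrower type; `scmpModel` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a signed three-way compare producing ResultT.
// The result is always -1, 0, or 1, which fits in any signed type of
// at least two bits.
template <typename ResultT>
ResultT scmpModel(int64_t A, int64_t B) {
  return static_cast<ResultT>((A > B) - (A < B));
}

// trunc(scmp.i32(A, B)) to i8, modeled as a narrowing cast.
int8_t truncOfWide(int64_t A, int64_t B) {
  return static_cast<int8_t>(scmpModel<int32_t>(A, B));
}

// scmp.i8(A, B) computed directly in the narrow type.
int8_t directNarrow(int64_t A, int64_t B) {
  return scmpModel<int8_t>(A, B);
}
```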
[clang] Implement `__builtin_elementwise_pext` and `__builtin_elementwise_pdep` (#204296)
Closes #204126
This PR adds `__builtin_elementwise_pext` to emit `@llvm.pext` and `__builtin_elementwise_pdep` to emit `@llvm.pdep`.
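For readers unfamiliar with the operations, here is a scalar reference sketch of bit extract and bit deposit. It assumes the intrinsics follow the usual parallel bit extract/deposit semantics (as in x86 BMI2 `PEXT`/`PDEP`); `pextRef` and `pdepRef` are hypothetical names and do not show how the builtins are lowered.

```cpp
#include <cassert>
#include <cstdint>

// Gather the bits of Src selected by Mask into the low bits of the result.
uint64_t pextRef(uint64_t Src, uint64_t Mask) {
  uint64_t Result = 0;
  unsigned Out = 0; // next free low bit in the result
  for (unsigned Bit = 0; Bit < 64; ++Bit) {
    if ((Mask >> Bit) & 1) {
      Result |= ((Src >> Bit) & 1) << Out;
      ++Out;
    }
  }
  return Result;
}

// Scatter the low bits of Src to the positions selected by Mask.
uint64_t pdepRef(uint64_t Src, uint64_t Mask) {
  uint64_t Result = 0;
  unsigned In = 0; // next low bit of Src to consume
  for (unsigned Bit = 0; Bit < 64; ++Bit) {
    if ((Mask >> Bit) & 1) {
      Result |= ((Src >> In) & 1) << Bit;
      ++In;
    }
  }
  return Result;
}
```

Under these semantics the two operations are inverses on the masked bits: depositing a value through a mask and extracting it again with the same mask returns the original low bits.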
[Reassociate] Distribute multiply over add to enable factorization (#178201)
### This patch improves ReassociatePass to handle patterns like:
(x*C1) - ((y+x)*C2) → x*(C1-C2) - (y*C2)
The optimization consists of two changes:
1. Distribution pre-processing: Transform (A+B)*C → A*C + B*C when:
- The add has exactly one use (avoids code bloat)
- Both add operands are non-constant (avoids unprofitable cases)
This exposes common factors that would otherwise be hidden inside
the addition, enabling subsequent factorization.
2. Factorization heuristic: Prefer extracting non-constant factors
(Instructions/Arguments) over constant factors when occurrence
counts are equal. This enables better constant folding opportunities.
Note: undef is excluded from this preference to maintain existing
[31 lines not shown]
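The algebra behind the headline pattern can be checked with a small sketch using wrapping 32-bit arithmetic. This only demonstrates that the two forms compute the same value; it is not the pass's implementation, and `beforeForm`, `afterForm`, and `distribute` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (x*C1) - ((y+x)*C2)
uint32_t beforeForm(uint32_t X, uint32_t Y, uint32_t C1, uint32_t C2) {
  return X * C1 - (Y + X) * C2;
}

// Distribution step on its own: (A+B)*C -> A*C + B*C
uint32_t distribute(uint32_t A, uint32_t B, uint32_t C) {
  return A * C + B * C;
}

// Form after distribution and factoring out x: x*(C1-C2) - (y*C2)
uint32_t afterForm(uint32_t X, uint32_t Y, uint32_t C1, uint32_t C2) {
  return X * (C1 - C2) - Y * C2;
}
```

Because unsigned arithmetic wraps modulo 2^32, the identity holds for all inputs, including ones where intermediate products overflow.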
[flang][OpenMP] Centralize pushing/popping directive context
Move the calls to PushContextAndClauseSets into the Enter functions for
OpenMPConstruct and OpenMPDeclarativeConstruct, and pop the context in
the corresponding Leave functions. This moves most of
the context handling to the top-level AST entries. This will
allow more centralized verification of common clause properties
in the future.
[LoopCacheAnalysis] Drop isLoopSimplifyForm check (NFCI) (#204822)
This patch removes the isLoopSimplifyForm() check from
LoopCacheAnalysis. This check was problematic when I tried migrating
LoopCachePrinterPass from a loop pass to a function pass (i.e.,
#204807), because the former applies the loop-simplify pass via
FunctionToLoopPassAdapter, whereas the latter does not. I believe this
check is meaningless because the analysis doesn't pay attention to the
details of the actual loop structure. So this change should not affect
the behavior of the pass.
[clang-format] Reset `Line->IsModuleOrImportDecl` in `addUnwrappedLine` (#204565)
The `IsModuleOrImportDecl` flag was not reset in `addUnwrappedLine`.
Since the parser recycles the `Line` object, this flag remained `true`
for all subsequent lines in the file, which disabled wrapping
(`CanBreakBefore` in `TokenAnnotator.cpp`) for expression-level
constructs after any C++20 module or import statement, causing some
formatting rules to not be applied in places. This patch fixes the issue
by resetting the flag to `false`.
---------
Co-authored-by: Owen Pan <owenpiano at gmail.com>
AMDGPU: Refactor AMDGPUTargetID to not store MCSubtargetInfo
Store the triple string and GPUKind instead. The dependence
on checking AMDHSA seems like an anti-feature, but maintain the
behavior of not printing the modifiers for other OSes. Start
parsing the target ID instead of performing a direct string
comparison. Also improve test coverage for the treatment of the
environment component of the triple. The main behavioral change
is that this will now produce normalized triples in the output and
diagnostics. Practically, this means all of the places that
currently emit "--" will be expanded into "-unknown-".
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
[libc++][test] Rewrite tests for `std::byte` (#204116)
Previously, test files for `std::byte` were less than ideal. There were
many issues.
- `byte.pass.cpp` tested many properties which hold for enumeration
types, but failed to verify that `std::byte` is a scoped enumeration
type. Also, it was not a `.compile.pass.cpp`.
- `enum_direct_init.pass.cpp` seemed to be completely redundant.
- It was not tested that compound assignment operators return references
to their left operands.
- Return types of operators were rarely tested.
- Constraints of functions were not tested using SFINAE techniques.
- Test cases were not made to run both during constant evaluation and at
run time in the conventional way.
This patch
- rewrites tests for `std::byte` to address these issues,
- expands test coverage for integer types listed in `type_algorithms.h`,
and
- updates lit comments to new-style `// REQUIRES: std-at-least-c++17`.
[MLIR][WASM] Introduce the RaiseWasmMLIRPass to convert WasmSSA MLIR to core dialects (#164562)
This is following https://github.com/llvm/llvm-project/pull/154674 and
still related to
https://discourse.llvm.org/t/rfc-mlir-dialect-for-webassembly/86758.
This PR introduces the RaiseWasmMLIRPass. This pass lowers WasmSSA MLIR
to other dialects of the LLVM ecosystem (namely: arith, math, cf and
memref).
This is the first PR in a series of 2 or 3 that introduce the lowering. As
an introduction, it brings support for function calls, local and global
variables, and arithmetic operations. As explained in the
RFC, most WasmSSA operations have been made to stay close to other
dialects' semantics so that conversion is trivialized.
---------
Signed-off-by: Ferdinand Lemaire <flemairen6 at gmail.com>
Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire at woven-planet.global>
Co-authored-by: Ferdinand Lemaire <flemairen6 at gmail.com>