[llvm-objcopy][MachO] Use alignToPowerOf2 instead of alignTo (#204033)
During the review of #203680 I noticed that the Mach-O objcopy files seem
to use `alignTo` and import `Alignment.h` to align some offsets to page
boundaries and similar requirements. However, the `alignTo` in
`Alignment.h`, while being intended for powers of 2, requires using an
alignment of type `llvm::Align`, and needs explicit conversion from
`uint64_t` and similar. Since `Alignment.h` includes `MathExtras.h`,
the `alignTo` being invoked ends up being a generic `alignTo` that does
not require powers of 2, and performs divisions and multiplications.
While some of those might be optimized by the compiler into efficient
power of 2 operations, there's an explicit `alignToPowerOf2` version
that is optimized and asserts the alignment is a power of 2 (with
asserts enabled). Since all the alignments should be power of 2 for the
Mach-O binary format, change from `alignTo` to `alignToPowerOf2` to make
the fact more visible (and get the extra safety net of the assertions).
As expected, the test suite of objcopy doesn't show any regressions. I
have not run a performance benchmark for this change.
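To illustrate the difference described above, here is a minimal standalone sketch. The names `alignToGeneric` and `alignToPow2` are hypothetical stand-ins written for this example, not the LLVM functions themselves. It only shows the general idea: the generic form divides and multiplies, while the power-of-2 form masks and asserts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical generic round-up: valid for any non-zero alignment, but it
// needs a division and a multiplication.
constexpr uint64_t alignToGeneric(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// True when X has exactly one bit set.
constexpr bool isPowerOf2(uint64_t X) { return X != 0 && (X & (X - 1)) == 0; }

// Hypothetical power-of-2 round-up: a single add and mask. The assertion
// catches a non-power-of-2 alignment when asserts are enabled.
inline uint64_t alignToPow2(uint64_t Value, uint64_t Align) {
  assert(isPowerOf2(Align) && "alignment must be a power of 2");
  return (Value + Align - 1) & ~(Align - 1);
}
```

For power-of-2 alignments such as a 16 KiB page size both helpers agree; only the generic one accepts an alignment like 12.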
[llvm-objcopy][MachO] Align __LINKEDIT entries to pointer size (#203680)
Align Mach-O __LINKEDIT entries to the target pointer size when building
the tail layout. This matches the behavior of ld64 and lld-macho.
dyld on macOS 27 rejects loading dylibs with misaligned __LINKEDIT
entries.
See #203678 for details and the motivation of this fix.
AI Tool Use Disclosure:
Regarding the PR and the linked issue, I personally wrote every
single part of the PR myself, and ran and verified every single part
of the issue report as well, without any AI tool usage.
I have used LLM-based coding agents only for debugging purposes, e.g. to
figure out why the dylib was not loading (from the original bug report),
and figuring out how to build, run, and test my local `llvm-objcopy`.
[VPlan] Skip shl->mul SCEV rewrite for out-of-range shift amounts. (#204921)
getSCEVExprForVPValue rewrites `shl x, c` as `x * (1 << c)` using
ScalarEvolution::getPowerOfTwo, which asserts that the power is less
than the type's bit width.
Only perform the rewrite when the shift amount is less than the
operand's bit width, to avoid the assertion.
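The guard can be sketched in isolation as follows. `shlAsMultiplier` is a hypothetical helper for this example, limited to widths of at most 64 bits. It is not the VPlan or ScalarEvolution code.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: returns the multiplier (1 << ShiftAmt) that makes
// `x * multiplier` equivalent to `shl x, ShiftAmt` on a BitWidth-bit value.
// Returns std::nullopt when the shift amount is out of range, so the caller
// can skip the rewrite instead of tripping an assertion.
std::optional<uint64_t> shlAsMultiplier(uint64_t ShiftAmt, unsigned BitWidth) {
  if (BitWidth == 0 || BitWidth > 64 || ShiftAmt >= BitWidth)
    return std::nullopt;
  return uint64_t{1} << ShiftAmt;
}
```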
[FileCheck] Use default colors in input dumps
This patch makes two improvements to colors used in FileCheck input
dumps:
1. Without this patch, input line numbers and ellipses have a
foreground color of black, which is hard to see in a terminal with
a dark color theme. This patch changes that to the terminal's
default color.
2. Without this patch, the input text is accidentally set to bold when
neither `-v` nor `-vv` is specified. Perhaps I never noticed
because I tend to always use `-vv`. This patch changes that to use
the terminal's default color.
Case 2 exposes a problem with LLVM's color implementation. Without
this patch, the call to `WithColor`'s constructor actually specifies
bold as `false`, but `WithColor` ignores that when the color is
`SAVEDCOLOR`. While it seems like that should be fixed, I am
concerned about the impact of such a fix on other tools that might
[12 lines not shown]
[AMDGPU] Use SchedModel latencies for Fence barrier edges (#204657)
For memory->fence dependencies, this PR sets the latency of the edge to
the instr latency of the predecessor memory instruction.
During lowering of these fences, we insert the necessary waitcnts, and
we end up waiting for any outstanding memory op at these fences. Thus,
the latency of the edges should be based on latency of the associated
load/stores.
[LoopCacheAnalysis] Generate tests by update_analyze_test_checks.py (#204807)
Since loop interchange has been enabled in the default pipeline,
development on LoopCacheAnalysis, which is used by LoopInterchange, is
becoming more active. So I think it's a good time to support automatic
test generation for LoopCacheAnalysis.
This patch does two things. First, it changes LoopCachePrinterPass from
a loop pass to a function pass to make it possible to use
update_analyze_test_checks.py. Second, it rewrites all the CHECK
directives in the existing LoopCacheAnalysis tests using the script.
[Verifier] Only accept noundef metadata on loads and update metadata tests (#204922)
noundef metadata has been accepted everywhere so far, which seems to be
an oversight. This patch rejects it everywhere except for load
instructions, which seem to be the only ones where it's supposed to be
supported. The other metadata tests are also updated so they are
somewhat similar to each other.
[VectorCombine] Add subvector reduction support to foldShuffleChainsToReduce (#199872)
Extends foldShuffleChainsToReduce to recognise subvector reductions
where the chain narrows through shuffles before extracting lane 0.
The matcher tracks per output lane attribution as the chain is walked.
Each lane carries a per source bitmask of contributing source lanes plus
a poison flag. Shuffles permute these records. Binops union them. At the
extract, lane 0's bitmasks rebuild the reduction as one or more partial
reduce intrinsics. The walk is capped at 32 chain nodes.
Also adds a new test file with 11 tests:
| Test | Reason |
| --- | --- |
| `_add_v4i32`, `_add_v8i16`, `_add_v16i8`, `_add_v64i8` | basic subvector reductions across types/sizes |
| `_mul_v16i8` | non-add reduction |
[20 lines not shown]
[clangd] Look for resource-dir relative to detected compiler path as a fallback (#203332)
If the standard resource directory (which is searched for relative to the clangd
executable) does not exist, look for one relative to the detected compiler as a
fallback. This handles some packaging schemes where clangd and clang are
installed in different prefixes and the resource directory is only located in the
latter.
Also print an error message to the log if the fallback didn't find an existing
directory either.
[IR] handle oversized constant alloca counts in getAllocationSize (#204540)
AllocaInst::getAllocationSize() unconditionally calls getZExtValue() for
array allocas, which asserts when the constant element count is wider
than 64 bits.
Use tryZExtValue() when reading the constant array size instead. If the
count cannot be represented in uint64_t, return std::nullopt rather than
asserting, matching the existing contract.
Fixes #203519
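A minimal sketch of the "try, then bail out" pattern described above. `WideCount`, `tryZExt`, and `allocationSize` are hypothetical names for this example, with a two-word struct standing in for a constant wider than 64 bits. Overflow of the final product is deliberately not handled here.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a 128-bit constant, stored as two 64-bit halves.
struct WideCount {
  uint64_t Hi;
  uint64_t Lo;
};

// Returns the value if it fits in 64 bits, std::nullopt otherwise.
std::optional<uint64_t> tryZExt(const WideCount &C) {
  if (C.Hi != 0)
    return std::nullopt;
  return C.Lo;
}

// Size in bytes of an array allocation, or std::nullopt when the element
// count cannot be represented in uint64_t (instead of asserting).
std::optional<uint64_t> allocationSize(uint64_t ElemSize,
                                       const WideCount &Count) {
  std::optional<uint64_t> N = tryZExt(Count);
  if (!N)
    return std::nullopt;
  return ElemSize * *N; // product overflow is out of scope for this sketch
}
```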
[X86] combineX86ShufflesRecursively - delay widening shuffle inputs. NFC. (#204931)
Perform resolveTargetShuffleInputsAndMask earlier as widening shouldn't
merge any inputs (we canonicalize small shuffle inputs earlier).
We should be able to move the widenSubVector calls inside
combineX86ShuffleChain in a future commit, but this patch should be NFC.
[InstCombine] Fold trunc scmp/ucmp -> scmp/ucmp with the target type being what we truncate (#196847)
I don't think I need an alive2 for this, since this is basically a
tautology/self-definition.
[clang] Implement `__builtin_elementwise_pext` and `__builtin_elementwise_pdep` (#204296)
Closes #204126
This PR adds `__builtin_elementwise_pext` to emit `@llvm.pext` and `__builtin_elementwise_pdep` to emit `@llvm.pdep`.
[Reassociate] Distribute multiply over add to enable factorization (#178201)
### This patch improves ReassociatePass to handle patterns like:
(x*C1) - ((y+x)*C2) → x*(C1-C2) - (y*C2)
The optimization consists of two changes:
1. Distribution pre-processing: Transform (A+B)*C → A*C + B*C when:
- The add has exactly one use (avoids code bloat)
- Both add operands are non-constant (avoids unprofitable cases)
This exposes common factors that would otherwise be hidden inside
the addition, enabling subsequent factorization.
2. Factorization heuristic: Prefer extracting non-constant factors
(Instructions/Arguments) over constant factors when occurrence
counts are equal. This enables better constant folding opportunities.
Note: undef is excluded from this preference to maintain existing
[31 lines not shown]
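The algebraic identity behind the pattern at the top of this message can be checked with a small standalone sketch. The function names are hypothetical, and `uint32_t` is used so that the arithmetic wraps the way fixed-width integer arithmetic does.

```cpp
#include <cstdint>

// Original form: (x*C1) - ((y+x)*C2)
constexpr uint32_t originalForm(uint32_t X, uint32_t Y, uint32_t C1,
                                uint32_t C2) {
  return X * C1 - (Y + X) * C2;
}

// After distributing (y+x)*C2 into y*C2 + x*C2 and factoring out x:
// x*(C1-C2) - (y*C2)
constexpr uint32_t factoredForm(uint32_t X, uint32_t Y, uint32_t C1,
                                uint32_t C2) {
  return X * (C1 - C2) - Y * C2;
}
```

Because `C1 - C2` is a constant, the factored form needs one fewer multiply by a variable operand than the distributed intermediate form.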
[flang][OpenMP] Centralize pushing/popping directive context
Move the calls to PushContextAndClauseSets into the Enter functions for
OpenMPConstruct and OpenMPDeclarativeConstruct, and the popping of the
context into the corresponding Leave functions. This moves most of
the context handling to the top-level AST entries. This will
allow more centralized verification of common clause properties
in the future.
[LoopCacheAnalysis] Drop isLoopSimplifyForm check (NFCI) (#204822)
This patch removes the isLoopSimplifyForm() check from
LoopCacheAnalysis. This check was problematic when I tried migrating
LoopCachePrinterPass from a loop pass to a function pass (i.e.,
#204807), because the former applies the loop-simplify pass via
FunctionToLoopPassAdapter, whereas the latter does not. I believe this
check is meaningless because the analysis doesn't pay attention to the
details of the actual loop structure. So this change should not affect
the behavior of the pass.
[clang-format] Reset `Line->IsModuleOrImportDecl` in `addUnwrappedLine` (#204565)
The `IsModuleOrImportDecl` flag was not reset in `addUnwrappedLine`.
Since the parser recycles the `Line` object, this flag remained `true`
for all subsequent lines in the file, which disabled wrapping
(`CanBreakBefore` in `TokenAnnotator.cpp`) for expression-level
constructs after any C++20 module or import statement, causing some
formatting rules to not be applied in places. This patch fixes the issue
by resetting the flag to `false`.
---------
Co-authored-by: Owen Pan <owenpiano at gmail.com>
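The stale-flag problem from the clang-format fix above can be reproduced with a toy model. `ToyParser` and `LineState` are hypothetical types invented for this sketch and only mimic the idea of a recycled line object. They are not clang-format's classes.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-line state that the toy parser reuses for every line.
struct LineState {
  bool IsModuleOrImportDecl = false;
  std::vector<std::string> Tokens;
};

class ToyParser {
public:
  // ResetFlag == false models the old behavior, true models the fix.
  explicit ToyParser(bool ResetFlag) : ResetFlag(ResetFlag) {}

  void addToken(std::string Tok) {
    // Mark the line when it starts with a module or import keyword.
    if (Current.Tokens.empty() && (Tok == "import" || Tok == "module"))
      Current.IsModuleOrImportDecl = true;
    Current.Tokens.push_back(std::move(Tok));
  }

  // Finish the current line and recycle the state object for the next one.
  void addLine() {
    Lines.push_back(Current);
    Current.Tokens.clear();
    if (ResetFlag)
      Current.IsModuleOrImportDecl = false; // the reset this patch adds
  }

  std::vector<LineState> Lines;

private:
  LineState Current;
  bool ResetFlag;
};
```

Without the reset, every line after the first `import` keeps the flag set, which is the symptom the commit message describes.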
AMDGPU: Refactor AMDGPUTargetID to not store MCSubtargetInfo
Store the triple string and GPUKind instead. The dependence
on checking AMDHSA seems like an anti-feature, but maintain the
behavior of not printing the modifiers for other OSes. Start
parsing the target ID instead of performing a direct string
comparison. Also improve test coverage for the treatment of the
environment component of the triple. The main behavioral change
is this will now produce normalized triples in the output and
diagnostics. Practically, this means all of the places that
currently emit "--" will be expanded into "-unknown-".
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
[libc++][test] Rewrite tests for `std::byte` (#204116)
Previously, test files for `std::byte` were less than ideal. There were
many issues.
- `byte.pass.cpp` tested many properties which hold for enumeration
types, but failed to verify that `std::byte` is a scoped enumeration
type. Also, it was not a `.compile.pass.cpp`.
- `enum_direct_init.pass.cpp` seemed to be completely redundant.
- It was not tested that compound assignment operators return references
to their left operands.
- Return types of operators were rarely tested.
- Constraints of functions were not tested using SFINAE techniques.
- Test cases were not made to run both during constant evaluation and at
run time in the conventional way.
This patch
- rewrites tests for `std::byte` to address these issues,
- expands test coverage for integer types listed in `type_algorithms.h`,
and
- updates lit comments to new-style `// REQUIRES: std-at-least-c++17`.