[CostModel][X86] Reduce cost of pre-SSE41 select shuffle (#207400)
The all-logic instructions have better throughput/latency than shuffles.
Confirmed with uops.info, llvm-mca and Agner Fog's instruction tables.
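As a rough illustration of the all-logic form: a select shuffle takes each lane from one of two inputs, so it can be written as AND/ANDN/OR with a lane mask instead of a shuffle. The sketch below emulates four 32-bit lanes in portable C++. `Vec4`, `selectLanes` and the mask encoding are illustrative names only, not LLVM or intrinsic APIs, and the sketch says nothing about the cost values the patch assigns.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector register with four 32-bit lanes.
using Vec4 = std::array<std::uint32_t, 4>;

// Per lane: an all-ones mask lane picks `a`, an all-zeros mask lane picks `b`.
// This mirrors the AND / ANDN / OR pattern that plain SSE2 can express.
Vec4 selectLanes(const Vec4 &mask, const Vec4 &a, const Vec4 &b) {
  Vec4 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
  return out;
}
```

With a mask of `{~0u, 0, ~0u, 0}`, lanes 0 and 2 come from `a` and lanes 1 and 3 from `b`.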
Revert "[IR] Explicitly specify target feature for module asm" (#207399)
Reverts llvm/llvm-project#204548
This is causing the runtimes build to fail with e.g.:
```
<inline asm>:11:5: error: 32 bit reloc applied to a field with a different size
11 | jmp __interceptor_strlen@plt
| ^
```
See comments on the PR.
[DebugInfo] Truncate implicit value constants to source type width (#206671)
This is a follow-up to #204353.
mikaelholmen and bevin-hansson reported that the previous change could
assert downstream when emitting `DW_OP_implicit_value` for a source
integer type wider than the target generic DWARF stack type, if the
debug-value carrier integer contains bits outside the declared source
type width.
The fix is to construct the source-width `APInt` with explicit
truncation enabled before emitting the implicit value bytes. This
preserves the intended wrap/truncate behavior and avoids asserting on
otherwise recoverable debug-value input.
A regression test is added for an `unsigned _BitInt(48)` debug value on
i386, covering both an out-of-range positive carrier value and an
all-ones negative carrier value.
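The wrap/truncate behavior described above can be sketched without LLVM types. This is a minimal illustration that assumes a 64-bit carrier; `truncateToWidth` is a hypothetical helper, not the code in the patch, which works on `APInt`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low `width` bits of a 64-bit carrier,
// which is the effect of truncating the carrier to the source type width.
std::uint64_t truncateToWidth(std::uint64_t carrier, unsigned width) {
  assert(width >= 1 && width <= 64 && "width must fit the 64-bit carrier");
  if (width == 64)
    return carrier; // shifting by 64 would be undefined, so handle it here
  return carrier & ((std::uint64_t{1} << width) - 1);
}
```

For a 48-bit source type, a carrier of `2^48 + 5` truncates to `5`, and an all-ones carrier truncates to `0xFFFFFFFFFFFF`, matching the two cases the regression test is described as covering.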
Forward declare TextEncodingConverter in TextEncoding.h, move config.h into TextEncoding.cpp (#207382)
This patch forward declares TextEncodingConverter in
clang/include/clang/Lex/TextEncoding.h, and moves config.h into
llvm/lib/Support/TextEncoding.cpp instead of the header.
[Clang] Fix crash on subscripting a complete matrix subscript expression (#207317)
Subscripting a complete MatrixSubscriptExpr (which has scalar type)
caused an assertion failure in ActOnArraySubscriptExpr because the code
unconditionally asserted isIncomplete() on any MatrixSubscriptExpr base.
Fix by guarding the matrix subscript path with an isIncomplete() check,
allowing complete matrix subscript expressions to fall through to the
standard subscript handling, which emits an appropriate diagnostic.
Fixes #203163
[AArch64] Fix ReconstructShuffle for known vscale>1 (#205099)
The code at AArch64TargetLowering::ReconstructShuffle expects
NEON-compatible types. But for e.g. vscale_range = {2}, we can get legal
fixed-length vectors that are wider than 128 bits.
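To make the size mismatch concrete: an SVE register is vscale * 128 bits wide, while NEON registers are 64 or 128 bits. The sketch below only does that arithmetic; both function names are hypothetical and it does not model the actual legality checks in AArch64TargetLowering.

```cpp
// An SVE register holds vscale * 128 bits.
constexpr unsigned sveRegisterBits(unsigned vscale) { return 128u * vscale; }

// NEON registers are either 64-bit (D) or 128-bit (Q).
constexpr bool isNeonSizedVector(unsigned bits) {
  return bits == 64 || bits == 128;
}

// With vscale == 1 the full register matches a NEON Q register.
static_assert(isNeonSizedVector(sveRegisterBits(1)), "128 bits is NEON-sized");
// With a known vscale of 2, a 256-bit fixed-length vector fits one register
// but is wider than anything NEON can hold.
static_assert(!isNeonSizedVector(sveRegisterBits(2)), "256 bits is not");
```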
[Clang] Remove unused TokenKey::KEYNOZOS (#207132)
KEYNOZOS was defined as a TokenKey flag to mark keywords not supported
on z/OS, but no keyword in TokenKinds.def actually uses it. This patch
removes the unused enum value and its associated handling code.
Build: `ninja clang` succeeded (2923/2923 targets).
Tests: `ninja check-clang` passed — 51180 passed, 0 failed.
AI assistance was used for code review analysis and CI failure
debugging.
Fixes #206877
Co-authored-by: Chenguang Ding <dingchenguang at kylinos.cn>
[analyzer][docs] Fix invalid MyST toctree 'numbered' option after Markdown migration (#207217)
The RST-to-Markdown migration (#206181) converted the RST flag
`:numbered:` into `:numbered: true`.
MyST parses the toctree `numbered` option as `int_or_nothing`, so the
string `true` fails with:
```
'toctree': Invalid option value for 'numbered': true:
invalid literal for int() with base 10: 'true'
```
This breaks the `-W` (warnings-as-errors) `docs-clang-html` build.
Make `numbered` a valueless flag, which MyST accepts (equivalent to the
original RST behavior of numbering all levels).
Assisted-By: claude
[Clang][SVE ACLE] Remove +bf16 requirement from neon-sve bridge builtins. (#205332)
These builtins only care about the size of the element type and do not
require bfloat specific instructions.
[AMDGPU] Accept sext addresses when folding image ops to a16 (#203189)
canSafelyConvertTo16Bit() only accepts a zext when narrowing image
address coordinates to 16 bits. Add an opt-in AllowI16SExt flag so a
sext from i16 is accepted too, and enable it for sampler-less image
instructions.
Coordinates of sampler-less loads/stores are unsigned, so sext and zext
only disagree for a negative i16 (>= 0x8000), which is already out of
bounds since the maximum image dimension is <= 0x8000. Accepting the
sext therefore lets such coordinates fold to the a16 form, reducing VGPR
pressure.
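The claim that sext and zext only disagree for i16 values >= 0x8000 can be checked with a small standalone sketch. `zext16`, `sext16` and `extensionsAgree` are illustrative helpers, not part of the AMDGPU code.

```cpp
#include <cstdint>

// Zero-extend a 16-bit value to 32 bits.
std::uint32_t zext16(std::uint16_t v) { return v; }

// Sign-extend a 16-bit value to 32 bits using the xor/subtract idiom,
// which avoids relying on narrowing conversions to a signed type.
std::uint32_t sext16(std::uint16_t v) {
  std::int32_t s = (static_cast<std::int32_t>(v) ^ 0x8000) - 0x8000;
  return static_cast<std::uint32_t>(s);
}

// True when both extensions produce the same 32-bit coordinate.
bool extensionsAgree(std::uint16_t v) { return zext16(v) == sext16(v); }
```

`extensionsAgree` holds for every value up to 0x7FFF and fails from 0x8000 upward, where the sign bit is set.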
Co-authored-by: Barbara Mitic <Barbara.Mitic at amd.com>
[VPlan] Optimize pre-increment IV latch users with tail folding (#206499)
This was noticed after #204089 caused IndVarsSimplify to convert some
live-out IV users to use the pre-incremented IV rather than the
post-incremented one.
Tail folded live-outs don't have the `(extract-last-lane
(extract-last-part foo))` form, but instead have the form `(extract-lane
(last-active-lane header-mask), foo)`.
For post-incremented IVs in tail folding, these are converted to
VPInstruction::ExitingIVValue which are handled separately. But
ExitingIVValue can't be used for the pre-incremented IV. So this teaches
optimizeLatchExitInductionUser to detect the last-active-lane of the
header mask form.
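A scalar model of the `(extract-lane (last-active-lane header-mask), foo)` form may help. It assumes a fixed vectorization factor of 4; `VF`, `lastActiveLane` and `extractLastActive` are hypothetical names for illustration and do not correspond to VPlan classes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VF = 4; // hypothetical vectorization factor

// Index of the last set lane in the header mask, or -1 if no lane is active.
int lastActiveLane(const std::array<bool, VF> &headerMask) {
  int last = -1;
  for (std::size_t i = 0; i < VF; ++i)
    if (headerMask[i])
      last = static_cast<int>(i);
  return last;
}

// Models extract-lane(last-active-lane(header-mask), values).
long extractLastActive(const std::array<bool, VF> &headerMask,
                       const std::array<long, VF> &values) {
  int lane = lastActiveLane(headerMask);
  assert(lane >= 0 && "at least one lane must be active");
  return values[static_cast<std::size_t>(lane)];
}
```

For a loop with trip count 7, the final vector iteration holds pre-incremented IV values `{4, 5, 6, 7}` under the mask `{1, 1, 1, 0}`, so the extracted live-out is 6, the pre-incremented IV of the last scalar iteration.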
[ADT][NFC] Remove unused includes in DenseMap/DenseSet headers (#207282)
Remove unused includes in DenseMap/DenseSet headers.
`llvm/Support/JSON.h` relied on `llvm/Support/AlignOf.h` being included
transitively.
[mlir][OpenMP] Change device declare target functions to hidden visibility (#207234)
During OpenMP lowering, globally visible device functions are emitted.
These functions might not be kernels themselves, but are designed to
only be called in a kernel context. However, if they are unused, not
inlined, and reference LDS, AMDGPU ISel emits many misleading
warnings about "local memory global used by non-kernel function".
Fix by changing visibility from external+default to external+hidden,
which allows DCE to just remove the functions.
Claude assisted with this patch.
[M68k] Fix build after removal of RegisterClasses pointer array (#207364)
Commit 4d8ec1968023 ("[CodeGen][NFC] Remove RegisterClasses pointer
array (#207204)") removed regclass_begin()/regclass_end() from
TargetRegisterInfo, so those names now resolve to the MCRegisterInfo
versions whose iterator dereferences to a MCRegisterClass rather than a
const TargetRegisterClass *, breaking getMaximalPhysRegClass():
    error: cannot convert 'const llvm::MCRegisterClass' to
    'const llvm::TargetRegisterClass*' in initialization
M68k was not updated in that commit. Switch to the range-based
regclasses() idiom used elsewhere in the same change.
Regressor: 4d8ec1968023 ("[CodeGen][NFC] Remove RegisterClasses pointer
array") (#207204)
[AArch64] Minor simplification in aarch64-ldst-opt with an early return (#207182)
Remove the local `MBBIWithRenameReg` by moving an early return to an
earlier point.
When `MBBIWithRenameReg` is set we always return early. By moving the
early return to the point where `MBBIWithRenameReg` is updated, we get
rid of a local variable whose scope spans 200+ lines. This also fixes a
misleading debug print between the `MBBIWithRenameReg` update and the
early return:
```
LLVM_DEBUG(dbgs() << "Unable to combine these instructions due to "
<< "interference in between, keep looking.\n");
```
This line shouldn't be printed when we set `MBBIWithRenameReg`, which is
fixed with this change.
[X86] haddsub-undef.ll - sync more testnames with their phaseordering equivalents (#207370)
Ensure we have equivalent hadd/sub middle-end test coverage, with similar names for easier lookup.
[flang][Driver] Add option for real sum reassociation
Compiler driver option for #207371: -freal-sum-reassociation. This is in
the hidden help for now. Disabled by default.
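For background on why sum reassociation is typically opt-in: floating-point addition is not associative, so regrouping a sum can change the result. The sketch below shows this in C++ with doubles. It assumes the option permits this kind of regrouping of real additions, which the message above does not spell out; the function names are illustrative.

```cpp
// Sum in source order: (a + b) + c.
double sumLeftToRight(double a, double b, double c) { return (a + b) + c; }

// Same operands, regrouped: a + (b + c).
double sumReassociated(double a, double b, double c) { return a + (b + c); }
```

With `a = 1e30`, `b = -1e30` and `c = 1.0`, the source-order sum cancels the large terms first and yields `1.0`, while the regrouped sum absorbs `1.0` into `-1e30` and yields `0.0`, assuming IEEE 754 arithmetic with no fast-math flags.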
Assisted-by: Codex
[libc++][ranges] Enable CPO compile tests (#207123)
`adjacent_transform_view` and `stride_view` were implemented but the
test cases were omitted.
Co-authored-by: Hristo Hristov <zingam at outlook.com>