[AMDGPU] Unmark wave reduce intrinsics for constant folding
The `add`, `sub`, and `xor` wave reduction intrinsics cannot
be constant folded: for a uniform operand, `add` and `sub` results must be
multiplied by the number of active lanes, and `xor` depends on the parity
of the number of active lanes.
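A small model makes the folding problem concrete. It is an illustration only, not the AMDGPU lowering: the helper names are hypothetical, the `sub` reduction is *assumed* to mean "0 minus each active lane's value", and 32-bit wraparound is ignored. Reducing a constant `x` gives `x * n` for `add`, `-(x * n)` for `sub`, and `x` or `0` for `xor` depending on whether `n` is odd, where `n` is the active-lane count, which is unknown at compile time. Idempotent reductions such as `min` stay foldable.

```python
from functools import reduce
import operator


def wave_reduce(op, identity, lane_values, exec_mask):
    """Fold `op` over the values held by the active lanes (bits set in exec_mask)."""
    active = [v for lane, v in enumerate(lane_values) if (exec_mask >> lane) & 1]
    return reduce(op, active, identity)


def uniform_reduce(op, identity, x, exec_mask, wave_size=32):
    """Reduce an operand that is the same constant `x` in every lane."""
    return wave_reduce(op, identity, [x] * wave_size, exec_mask)


# add: x summed once per active lane, so folding reduce.add(5) to 5 is wrong.
print(uniform_reduce(operator.add, 0, 5, 0b111))             # 15
# sub (assumed semantics): 0 - x - x - x.
print(uniform_reduce(lambda acc, v: acc - v, 0, 5, 0b111))   # -15
# xor: x for an odd number of active lanes, 0 for an even number.
print(uniform_reduce(operator.xor, 0, 5, 0b111),
      uniform_reduce(operator.xor, 0, 5, 0b11))              # 5 0
# min: idempotent, so folding to x is sound when some lane is active.
print(uniform_reduce(min, 0xFFFFFFFF, 5, 0b111))             # 5
```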
[ELF] Parallelize input file loading (#191690)
During `createFiles`, `addFile()` records a `LoadJob` for each
non-script input (archive, relocatable, DSO, bitcode, binary) with a
state-machine snapshot (`inWholeArchive`, `inLib`, `asNeeded`,
`withLOption`, `groupId`) and expands them on worker threads in
`loadFiles()`. Linker scripts are still processed inline since their
`INPUT()` and `GROUP()` commands recursively call `addFile()`.
Outside `createFiles()`, `loadFiles()` is called with a single job and
drained immediately (`deferLoad` is false). Two cases:
- `addDependentLibrary()`: `.deplibs` sections trigger `addFile()`
during the serial `doParseFiles()` loop.
- `--just-symbols`: pushes files directly, bypassing
`addFile`/`LoadJob`.
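The record-then-expand scheme can be sketched as follows. This is a simplified model, not lld's code: the snapshot fields mirror `inWholeArchive`, `inLib`, `asNeeded`, `withLOption`, and `groupId` in snake_case, while `Driver`, `expand`, and the use of `ThreadPoolExecutor` are hypothetical stand-ins. The key point is that each job captures the option state at the moment the input is seen, so worker threads never read the driver's mutable state, and results are collected in submission order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadJob:
    """One deferred input plus a snapshot of the option state machine."""
    path: str
    in_whole_archive: bool
    in_lib: bool
    as_needed: bool
    with_l_option: bool
    group_id: int


def expand(job):
    """Hypothetical stand-in for reading/parsing; it only sees the snapshot."""
    return (job.path, job.in_whole_archive, job.group_id)


class Driver:
    def __init__(self, defer_load):
        self.defer_load = defer_load
        self.pending = []
        self.files = []
        # Live state, mutated while the command line is walked serially.
        self.in_whole_archive = False
        self.in_lib = False
        self.as_needed = False
        self.with_l_option = False
        self.group_id = 0

    def add_file(self, path):
        self.pending.append(LoadJob(path, self.in_whole_archive, self.in_lib,
                                    self.as_needed, self.with_l_option,
                                    self.group_id))
        if not self.defer_load:
            self.load_files()  # single job, drained immediately

    def load_files(self):
        jobs, self.pending = self.pending, []
        with ThreadPoolExecutor() as pool:
            # map() yields results in submission order, keeping output deterministic.
            self.files.extend(pool.map(expand, jobs))


d = Driver(defer_load=True)
d.in_whole_archive = True   # e.g. --whole-archive seen
d.add_file("a.a")
d.in_whole_archive = False  # e.g. --no-whole-archive seen
d.add_file("b.o")
d.load_files()
print(d.files)  # [('a.a', True, 0), ('b.o', False, 0)]
```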
Thread-safety:
- A mutex serializes `BitcodeFile` / fatLTO constructors that call
`ctx.saver` / `ctx.uniqueSaver`. Zero contention on pure ELF links.
[23 lines not shown]
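The mutex rule from the thread-safety list can be sketched like this. `StringSaver`, `construct_input`, and the input kinds are hypothetical stand-ins for `ctx.saver` and the `BitcodeFile` / fatLTO constructors. Python's GIL already makes `list.append` atomic, so the lock here mirrors the C++ design rather than fixing a race in this sketch; what it shows is that only bitcode-like inputs ever take the lock, so plain ELF inputs never contend.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StringSaver:
    """Hypothetical stand-in for ctx.saver: an append-only arena with no locking."""

    def __init__(self):
        self.strings = []

    def save(self, s):
        self.strings.append(s)
        return s


saver = StringSaver()
saver_lock = threading.Lock()


def construct_input(kind, name):
    if kind in ("bitcode", "fatlto"):
        # Only these constructors touch the shared saver, so only they serialize.
        with saver_lock:
            return saver.save(name)
    # Plain ELF objects never take the lock.
    return name


inputs = [("elf", "a.o"), ("bitcode", "b.bc"), ("elf", "c.o"), ("fatlto", "d.o")]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda kv: construct_input(*kv), inputs))
print(results)  # ['a.o', 'b.bc', 'c.o', 'd.o']
```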
Revert "[AMDGPU] Fixed verifier crash because of multiple live range components." (#193135)
Reverts llvm/llvm-project#190719
The Buildbot has detected a new failure on builder
sanitizer-aarch64-linux-bootstrap-hwasan while building llvm.
[MLIR][XeGPU] Recover temporary layout from Anchor Layout (#191947)
This PR refactors the recoverTemporaryLayout() method so that the
temporary layout is recovered from the anchor layout rather than from
any user-specified temporary layout.
[NFC] [clangd] [C++20] [Modules] Introduce ProjectModules::getModuleNameState interface (#193133)
A hole in the current design is that we assume there are no duplicate
module names across different module interface units in the same project.
This is not technically true. ISO C++ disallows duplicate module names in
a linked program, but a project can contain multiple programs, which is
fine as long as they are not linked together. In practice, it is also
fine if the symbols are masked and these module interface units never
appear in the same context of a single translation unit.
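The situation can be modeled as a lookup from module name to the interface units that declare it. This is a hypothetical sketch: the commit does not show the signature or return values of `ProjectModules::getModuleNameState`, so `ModuleNameState`, `index_module_names`, and `get_module_name_state` below are illustrative names, not clangd's API. The point is that a name can map to several interface units (for example, one per program), which a design assuming uniqueness cannot represent.

```python
from collections import defaultdict
from enum import Enum


class ModuleNameState(Enum):
    # Hypothetical states, not taken from clangd.
    UNKNOWN = "unknown"    # no interface unit in the project declares the name
    UNIQUE = "unique"      # exactly one interface unit declares it
    MULTIPLE = "multiple"  # several interface units (e.g. in different programs) do


def index_module_names(interface_units):
    """Map each module name to the set of interface-unit files that declare it."""
    owners = defaultdict(set)
    for path, module_name in interface_units:
        owners[module_name].add(path)
    return owners


def get_module_name_state(owners, module_name):
    paths = owners.get(module_name, set())
    if not paths:
        return ModuleNameState.UNKNOWN
    return ModuleNameState.UNIQUE if len(paths) == 1 else ModuleNameState.MULTIPLE


owners = index_module_names([
    ("app1/util.cppm", "util"),  # program 1
    ("app2/util.cppm", "util"),  # program 2, never linked with program 1
    ("app1/core.cppm", "core"),
])
print(get_module_name_state(owners, "util").name)  # MULTIPLE
```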
I am trying to improve this. This patch adds some NFC groundwork to
reduce the size of follow-up patches.
AI assisted.
[NVPTX] Add commutativity to SETP instructions to enable MachineCSE of inverted predicates
Inverted predicates can be used freely in PTX. If we can invert a
predicate and CSE the instruction that generates it, we avoid computing
the inverse separately.
Teach the NVPTX commuteInstructionImpl that SETP instructions can be
inverted, so they can be CSE'd with a previous SETP that matches the
inverted form. This also inverts the branch users of the predicate to
preserve correctness.
For now, SETP inversion is only allowed if all users are branches.
Future work can extend this to sel and not instructions.
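The net effect of the transformation can be modeled on a toy IR. This sketch does not reproduce the MachineCSE / `commuteInstructionImpl` plumbing; `Setp`, `Branch`, and `cse_inverted_setp` are hypothetical. It covers integer comparisons only, since floating-point inverses involve the unordered forms (the inverse of `lt` is `geu`, not `ge`). When a later SETP computes the inverse of an earlier one on the same operands and every user is a branch, the branches are redirected to the earlier predicate with their polarity flipped (`@%p` becomes `@!%p`).

```python
from dataclasses import dataclass

# Integer PTX comparison operators and their logical inverses (signed and unsigned).
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "le": "gt", "gt": "le",
           "lo": "hs", "hs": "lo", "ls": "hi", "hi": "ls"}


@dataclass
class Setp:
    dst: str
    cmp: str
    ty: str
    a: str
    b: str


@dataclass
class Branch:
    pred: str
    negated: bool  # True for "@!%p bra", False for "@%p bra"
    target: str


def cse_inverted_setp(earlier, later, users):
    """If `later` computes the inverse of `earlier`, point its users at the earlier
    predicate with flipped polarity. Returns True if `later` is now dead."""
    if (later.ty, later.a, later.b) != (earlier.ty, earlier.a, earlier.b):
        return False
    if INVERSE.get(later.cmp) != earlier.cmp:
        return False
    # Mirror the restriction in the patch: only rewrite when every user is a branch.
    if not all(isinstance(u, Branch) for u in users):
        return False
    for br in users:
        br.pred = earlier.dst
        br.negated = not br.negated
    return True


p1 = Setp("%p1", "lt", "s32", "%r1", "%r2")  # setp.lt.s32 %p1, %r1, %r2;
p2 = Setp("%p2", "ge", "s32", "%r1", "%r2")  # setp.ge.s32 %p2, %r1, %r2;
br = Branch("%p2", negated=False, target="$L__BB0_2")  # @%p2 bra $L__BB0_2;
if cse_inverted_setp(p1, p2, [br]):
    print(br)  # Branch(pred='%p1', negated=True, target='$L__BB0_2')
```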
Made-with: Cursor
[clang-tidy][NFC] Fix list.rst and improve alias detection of `add_new_check.py` (#192228)
Follow up of https://github.com/llvm/llvm-project/pull/192224.
This commit does two things:
- Replace the original alias detection, which relied on `:http-equiv`
(these may be removed entirely in the future), with detection that
directly matches the documentation section.
- Update list.rst.
---------
Co-authored-by: Victor Chernyakin <chernyakin.victor.j at outlook.com>
[NFC] [clangd] [C++20] [Modules] Rename and move scanningProjectModules (#193128)
I am going to add more functionality to ProjectModules, and the current
structure and the file name scanningProjectModules may become confusing.
This NFC patch renames and moves them.
[AMDGPU] Fixed verifier crash because of multiple live range components. (#190719)
In the Rewrite AGPR-Copy-MFMA pass, after spill instructions are replaced,
the replacement register may have multiple live range components if the
spill slot was stored to more than once, and the machine verifier then
fails with a bad machine code error. This patch fixes the problem by
splitting the live range in this scenario while still assigning the same
physical register. A new test verifies the absence of this verifier error.
Assisted-by: Claude Opus
[BOLT] Fix stream position before appendPadding in writeEHFrameHeader
When writeEHFrameHeader needs to allocate new space for .eh_frame_hdr
(because the old section is too small), it calls appendPadding to align
NextAvailableAddress. appendPadding writes zero bytes at the current
stream position, but after the section write loop in rewriteFile the
stream is positioned at the end of the last section written in
BinarySection::operator< order — not at the file offset corresponding
to NextAvailableAddress.
In the common case (single loadObject call) the write order matches file
offset order, so the stream happens to be in the right place. But when
a runtime library adds sections via additional loadObject calls, the
operator< iteration order (code-before-data) can diverge from file
offset order: a runtime library code section may have a higher file
offset than a runtime library data section that comes after it in the
write loop. The stream then ends at a lower offset than expected, and
appendPadding's zeros overwrite the beginning of the code section.
Fix by seeking to the correct file offset before calling appendPadding.
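The failure mode can be reproduced with an in-memory stream. This is a simplified model, not BOLT's code: `write_sections`, `append_padding`, and `emit` are hypothetical, and the offset matching `NextAvailableAddress` is assumed to be the end of the highest-addressed section. Writing sections in a non-offset order leaves the stream at `0x20`, so padding without a seek zeroes the start of the runtime-library code section; seeking first appends the padding at the end instead.

```python
import io


def write_sections(stream, sections):
    """Write each (file_offset, payload) pair in list order (the operator< order)."""
    for offset, payload in sections:
        stream.seek(offset)
        stream.write(payload)


def append_padding(stream, size):
    """Models appendPadding as described: zeros at the *current* stream position."""
    stream.write(b"\0" * size)


def emit(sections, pad, seek_first):
    buf = io.BytesIO()
    write_sections(buf, sections)
    # Simplification: offset of NextAvailableAddress = end of the last section in the file.
    next_offset = max(off + len(data) for off, data in sections)
    if seek_first:
        buf.seek(next_offset)  # the fix: position the stream before padding
    append_padding(buf, pad)
    return buf.getvalue()


sections = [
    (0x00, b"A" * 16),  # main code
    (0x20, b"C" * 16),  # runtime-library code: written earlier, higher offset
    (0x10, b"D" * 16),  # runtime-library data: written last, lower offset
]
print(emit(sections, 4, seek_first=False)[0x20:0x24])  # b'\x00\x00\x00\x00'
print(emit(sections, 4, seek_first=True)[0x20:0x24])   # b'CCCC'
```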
[test][LowerTypeTests] Re-generate jump table tests with --check-globals (#192734)
Debug information will be updated in
https://github.com/llvm/llvm-project/pull/192736,
so we want to track the difference.