[clang][deps] Add module map describing compiled module to file dependencies. (#160226)
When we add the module map describing the compiled module to the command
line, add it to the file dependencies as well.
Discovered while working on reproducers where a command line input was
missing in the captured files as it wasn't considered a dependency.
[RISCV] Only reduce VLs of instructions with demanded VLs (#168693)
In RISCVVLOptimizer we first compute all the demanded VLs, then we walk
backwards through the function and try to reduce any VLs.
We don't actually need to walk backwards anymore since after #124530 the
order in which we modify the instructions doesn't matter.
This patch changes it to just iterate over the instructions with a
demanded VL computed, which means we don't iterate over scalar
instructions etc.
This also fixes #168665, where we triggered an assert on instructions
with a dead $vxsat implicit-def:
dead %x:vr = PseudoVSADDU_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */,
0 /* tu, mu */, implicit-def dead $vxsat
Because $vxsat is a reserved register, DeadMachineInstructionElim won't
[6 lines not shown]
VectorCombine: Improve the insert/extract fold in the narrowing case
Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:
1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
compatible, which allows foldLengthChangingShuffles to successfully
recognize a chain that can be folded.
There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2`
at all.
commit-id:c151bb04
AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles
These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"
The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.
commit-id:8b76e888
VectorCombine/AMDGPU: Cleanup a test and add a new one
The existing, recently added test contains a whole lot of noise in the
form of dead instructions. Also, prefer named values.
The new test isolates a separate issue with concatenating i8 vectors.
commit-id:0b9965b7
[libc++][memory] Applied `[[nodiscard]]` to smart pointers (#168483)
Applied `[[nodiscard]]` where relevant to smart pointers and related
functions.
- [x] - `std::unique_ptr`
- [x] - `std::shared_ptr`
- [x] - `std::weak_ptr`
See guidelines:
-
https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
- `[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue. For example a locking
constructor in unique_lock.
---------
Co-authored-by: Hristo Hristov <zingam at outlook.com>
[llvm-dwp] Give more information when incompatible version found (#168511)
Provide more information when detecting a DWARF version mismatch in .dwo
files to help locate the issue and align with other similar errors.
[clang-doc] Remove unused headers
Removes unused headers or replaces them with headers that directly
provide the symbol instead. For example, `Serialize.h` included `AST.h`,
but it was actually `Serialize.cpp` that needed concept expressions, so
now it includes just `ExprConcepts.h`.
[AMDGPU] Allow hazard checks for WMMA co-exec
Now we are just inserting V_NOP instrtuctions, try to schedule
something into the shadow.
It is still somewhat imprecise, for example AdvanceCycle() will
use TII.getNumWaitStates() anyway, but in a scheduling mode
we are not required to be precise. We must be finally precise
in the hazard recognizer mode. Then EmittedInstrs buffer is also
limited to MaxLookAhead even though VALU only hazards may actually
never expire and require an endless buffer. But that's OK, we can
at least mitigate what the buffer can hold. The buffer is also
currently much bigger than any of VALU hazards may need.
That said the rest of the 'fix*' functions here can be changed
the same way, these which are using V_NOPs. This one is just the
worst because it may require up to 9 nops.
[ClangLinkerWrapper] Refactor target ID sanitization for Windows file… (#168744)
… names
Fix non-RDC mode HIP compilation for the new driver on Windows due to
invalid temporary file names when offload arch is a target ID containing
':', which is invalid in file names on Windows.
Refactor the existing handling of ':' in file names on Windows from
clang driver into a shared function sanitizeTargetIDInFileName in
clang/Basic/TargetID.h. This function replaces ':' with '@' on Windows
only, preserving the original behavior.
Update both clang/lib/Driver/Driver.cpp and
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp to use this
shared function, ensuring consistent handling across both tools.
[tosa] : Relax dynamic dimension checks for batch for conv decompositions (#168764)
This PR relaxes the validation checks to allow input/output data to have
dynamic batch dimensions.
[SLP]Fix insertion point for setting for the nodes
The problem with the many def-use chain problems in SLP vectorizer are
related to the fact that some nodes reuse the same instruction as
insertion point. Insertion point is not the instruction, but the place
between instructions. To set it correctly, better to generate pseudo
instruction immediately after the last instruction, and use it as
insertion point. It resolves the issues in most cases.
Fixes #168512 #168576
[AMDGPU] Refactor hazard recognizer for VALU-pipeline hazards. NFCI.
This is in preparation of handling these in scheduler. I do not expect
any changes to the produce code here, it is just an infrastructure.
Our current problem with the VALU pipeline hazards is that we only
insert V_NOP instructions in the hazard recognizer mode, but ignore
it during scheduling. This patch is meant to create a mechanism to
actually account for that during scheduling.
[Arm64EC][clang] Implement varargs support in clang. (#152411)
The clang side of the calling convention code for arm64 vs. arm64ec is
close enough that this isn't really noticeable in most cases, but the
rule for choosing whether to pass a struct directly or indirectly is
significantly different.
(Adapted from my old patch https://reviews.llvm.org/D125419 .)
Fixes #89615.