[DWARFLinker] Patch DW_AT_LLVM_stmt_sequence in the parallel linker (#195388)
Mirror dsymutil's stmt-sequence rewriting in the parallel linker so each
attribute ends up pointing at the DW_LNE_set_address that opens its
containing output sequence, with the correct offset in the combined
.debug_line.
At DIE cloning time we resolve each attribute's input offset to the
address of its first row and record the pair (DIEValue, address) on the
CompileUnit, alongside a DebugOffsetPatch on the .debug_info section so
combination adds the CU's .debug_line start offset. The line-table
emitter then fills a map from row address to the byte offset of the
sequence-opening DW_LNE_set_address.
After emission, each recorded attribute is rewritten by relocating its
input address through the CU's function ranges and looking the result up
in the map. When resolution fails the DWARF max-offset sentinel is
written instead, and the patch applier preserves it unchanged.
First-row lookups share a lazy per-CU cache to keep resolution O(1) per
attribute.
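The post-emission rewrite described above can be sketched roughly as follows. This is a simplified model, not the linker's code: `FunctionRange`, `RangeMap`, `SequenceOffsetMap`, `relocate`, `resolveStmtSequence` and `applyLineTableStart` are hypothetical names, and the sentinel value assumes 32-bit DWARF offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Assumption: 32-bit DWARF, so the max-offset sentinel is 0xFFFFFFFF.
constexpr uint64_t kMaxOffsetSentinel = 0xFFFFFFFFu;

// Hypothetical model of a relocated function range: input addresses in
// [inputLow, inputHigh) are shifted by `delta` in the linked output.
struct FunctionRange {
  uint64_t inputLow;
  uint64_t inputHigh;
  int64_t delta;
};

// Ranges keyed by inputLow, and output row address -> byte offset of the
// DW_LNE_set_address that opens the sequence containing that row.
using RangeMap = std::map<uint64_t, FunctionRange>;
using SequenceOffsetMap = std::map<uint64_t, uint64_t>;

// Relocates an input address; returns false if no range contains it.
bool relocate(const RangeMap &ranges, uint64_t inputAddr, uint64_t &outAddr) {
  RangeMap::const_iterator it = ranges.upper_bound(inputAddr);
  if (it == ranges.begin())
    return false;
  --it;
  assert(it->second.inputLow <= inputAddr);
  if (inputAddr >= it->second.inputHigh)
    return false;
  outAddr = inputAddr + static_cast<uint64_t>(it->second.delta);
  return true;
}

// Relocate the recorded first-row address, then look it up in the map
// filled by the line-table emitter. Any failure yields the sentinel.
uint64_t resolveStmtSequence(const RangeMap &ranges,
                             const SequenceOffsetMap &sequenceOffsets,
                             uint64_t inputFirstRowAddr) {
  uint64_t outAddr = 0;
  if (!relocate(ranges, inputFirstRowAddr, outAddr))
    return kMaxOffsetSentinel;
  SequenceOffsetMap::const_iterator it = sequenceOffsets.find(outAddr);
  return it == sequenceOffsets.end() ? kMaxOffsetSentinel : it->second;
}

// Offset patch: add the CU's .debug_line start, but keep the sentinel as is.
uint64_t applyLineTableStart(uint64_t value, uint64_t cuLineTableStart) {
  return value == kMaxOffsetSentinel ? value : value + cuLineTableStart;
}
```

Keeping the sentinel check in the patch step means an unresolved attribute cannot be turned into a plausible-looking but wrong offset.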
Reland [Inliner] Use store-to-load forwarding to resolve call arguments (#195526)
Adds store-to-load forwarding once the inliner has successfully inlined
at least one call. This can simplify further inlining attempts and give
them a more precise cost analysis.
It allows optimizing away empty `std::set` and `std::map` in both
`libc++` and `libstdc++`, as well as many other real-world cases.
Reland of #190607, which was reverted because it caused the crashes
reported in #195135. These were crashes in `FindAvailableLoadedValue` on
mixed address space pointers and should be fixed by #195256.
[HLSL] For builtins aliases, apply implicit conversions before running custom type checking (#195365)
Fixes https://github.com/llvm/llvm-project/issues/195329 by making HLSL
builtin aliases apply implicit conversions before running custom type
checking.
After this PR:
- Size-1 vectors are no longer passed to or returned from aliased Clang
builtins: they get truncated to scalars because the HLSL alias builtins
have no explicit size-1 vector overloads.
- HLSL alias builtins no longer accept matrices unless they have
explicit matrix overloads. Matrices get implicitly truncated to scalars
and resolve to the scalar Clang builtin being aliased.
- Many calls with mismatched vector sizes no longer error with
`arguments are of different types` and instead follow Clang's overload
resolution rules with respect to HLSL's implicit conversion sequences.
(e.g., `dot(float3, float2)` -> `dot(float2, float2)` with warning)
- Calls with implicitly-convertible types no longer error. They are now
implicitly converted, with a warning in some cases. (e.g.,
[3 lines not shown]
[VPlan] Scalarize to first-lane-only directly on VPlan (#184267)
This is needed to enable the subsequent
https://github.com/llvm/llvm-project/pull/182595.
I don't think we can fully port all scalarization logic from the legacy
path to the VPlan-based one right now, because at that point in the
pipeline interleave groups aren't yet lowered into any VPlan-based
representation, so this pass operates on incomplete information.
Currently, the pass can transform when "all uses are scalar" (that won't
change later) but not when "uses are a mix of vector and scalar" (that
might change after lowering interleave groups).
As such, I decided to implement something much simpler that is enough
for #182595. However, we perform this transformation before delegating
to the old CM-based decision, so it **is** effective immediately and
takes precedence even for consecutive loads/stores right away.
[2 lines not shown]
[DWARFLinker] Add assembly-label range handling to parallel linker (#195366)
Assembly CUs typically have DW_TAG_label entries instead of subprograms,
so the parallel linker's line-table filter saw no function ranges and
dropped every row. Mirror the classic linker: for labels in
Mips_Assembler or Assembly CUs, look up an assembly range via
getAssemblyRangeForAddress and call addFunctionRange before falling back
to addLabelLowPc.
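A rough sketch of that decision, for illustration only. The types and helpers here (`UnitLanguage`, `UnitAddressInfo`, `findAssemblyRange`, `recordLabel`) are hypothetical stand-ins, not the linker's API, and the sketch reads "falling back" as "only when no assembly range is found", which is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class UnitLanguage { Other, Assembly, MipsAssembler };

// Half-open address range [low, high).
struct AddressRange {
  uint64_t low;
  uint64_t high;
};

// Hypothetical per-CU bookkeeping used by the line-table filter.
struct UnitAddressInfo {
  std::vector<AddressRange> functionRanges;
  std::vector<uint64_t> labelLowPcs;
};

// Hypothetical stand-in for getAssemblyRangeForAddress: linear search for
// the range that contains `addr`.
bool findAssemblyRange(const std::vector<AddressRange> &assemblyRanges,
                       uint64_t addr, AddressRange &out) {
  for (const AddressRange &r : assemblyRanges) {
    assert(r.low <= r.high);
    if (addr >= r.low && addr < r.high) {
      out = r;
      return true;
    }
  }
  return false;
}

// For a DW_TAG_label: in assembly CUs, prefer registering a function range
// so line rows in that range survive; otherwise record only the low pc.
void recordLabel(UnitAddressInfo &unit, UnitLanguage lang,
                 const std::vector<AddressRange> &assemblyRanges,
                 uint64_t labelAddr) {
  const bool isAssemblyUnit = lang == UnitLanguage::Assembly ||
                              lang == UnitLanguage::MipsAssembler;
  AddressRange range = {0, 0};
  if (isAssemblyUnit && findAssemblyRange(assemblyRanges, labelAddr, range)) {
    unit.functionRanges.push_back(range);
    return;
  }
  unit.labelLowPcs.push_back(labelAddr);
}
```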
[Flang][OpenMP] Fix assert trigger in MapInfoFinalization pass for implicit record member maps (#193851)
The current iteration of the implicit record member mapping segment of
the MapInfoFinalization pass assumes that child maps of parents are
already bound to the target's block arguments, but that is not the case
upon initial lowering from PFT to MLIR. This binding currently happens
at the end of the MapInfoFinalization pass, where we "canonicalize" that
all maps are inserted as block arguments to their respective targets.
This assumption unfortunately leads to a few cases where we trigger the
assertion. To address this, we can impose this canonicalization of map
<-> block arguments as soon as we enter the pass, and then once again at
the end of the pass for any new members generated by the
MapInfoFinalization pass. This allows the implicit record member mapping
process to continue unhindered whilst changing very little elsewhere
other than the ordering of block arguments (hence some lit test
tweaks). The main downside is the extra processing required for running
the "canonicalization" twice.
[4 lines not shown]
[AArch64][GlobalISel] Lower unmerge to extract_subvector (#195046)
This follows and reuses the existing lowering for unmerge -> extract
vector element, extending it to also lower unmerge -> subvector extract
for half-sized vector extracts. This allows certain tablegen patterns to
match.
An extra extract_subvector(dup) combine is needed to optimize away
unnecessary instructions. The ext vs mov/dup brings us in line with
SDAG, but we may change both to use mov/dup.
[NFCI] clarify that asan-*linux.cpp files affect *nix OS'es (#195565)
**Prior Work:** Aims to supersede (#132263), which seems inactive,
specifically by applying my own comment:
https://github.com/llvm/llvm-project/pull/132263#issuecomment-3051238734
**Context:** It aims to minimally document that the
`asan_(malloc_)?linux.cpp` files may impact non-Linux OSes (despite the
name) such as Solaris, BSD, and other *nix OSes. This is worth
documenting as otherwise we risk breakage due to confusion, as occurred
[here](https://github.com/llvm/llvm-project/pull/131975#issuecomment-2741097471).
This is done simply by augmenting the file header comment to say
precisely this.
Unlike the prior PR, this does not rename any files, which should reduce
the 'git noise' impact of this change.
_Thanks!_
[InstCombine] Remove redundant assume fold (#195852)
The fold is fully redundant with the fold using `computeKnownBits`, so
we can let that do the work instead.
[lldb][windows] fix cross DLL file descriptor lookup crash (#195855)
On Windows, file descriptors are only valid in the same DLL: they are
really just handles mapped to an index in a table in the CRT. Calling a
liblldb method with a file descriptor from lldb-dap will cause the
program to crash. See
https://github.com/llvm/llvm-project/issues/193971.
This patch fixes the issue by refactoring the `NativeFile` constructors
so that they no longer try to convert `FILE` types to handles through
the CRT lookup table.
PeepholeOpt: Clear kill flags in foldImmediate (#195680)
When foldImmediate replaces a COPY destination with its source,
this extends the live range of the source, but it does not update the
kill flags.
Clear kill flags on the source register after replacement.
This was found while working on REG_SEQUENCE optimizations motivated by
AMDGPU demands. Both an AMDGPU and an X86 test case are added to show that
the issue is not AMDGPU specific.
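To illustrate why stale kill flags matter, here is a toy model of the replacement. `Operand`, `Instr` and `replaceRegAndClearKills` are hypothetical and far simpler than LLVM's machine IR. The sketch conservatively clears every kill flag on the source register; whether the patch does exactly this is an assumption.

```cpp
#include <cassert>
#include <vector>

// Toy operand: a virtual register number plus def/kill markers.
struct Operand {
  unsigned reg;
  bool isDef;
  bool isKill;
};

struct Instr {
  std::vector<Operand> operands;
};

// Rewrite every use of `from` to `to`. Because `to` now stays live up to
// the former uses of `from`, any earlier "kill" marker on `to` would be
// wrong, so all kill flags on uses of `to` are cleared.
void replaceRegAndClearKills(std::vector<Instr> &block, unsigned from,
                             unsigned to) {
  assert(from != to);
  for (Instr &mi : block) {
    for (Operand &op : mi.operands) {
      if (op.isDef)
        continue;
      if (op.reg == from)
        op.reg = to;
      if (op.reg == to)
        op.isKill = false;
    }
  }
}
```

Clearing flags is always safe here: a missing kill flag only loses precision, while a stale one tells later passes that a register is dead when it is still read.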
[mlir][vector] Account for subview offset in gather lowering. (#195359)
Strided vector.gather on a column subview was reading the wrong column
because the rewrite to a collapsed gather dropped the subview's static
offset.
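The indexing involved can be sketched as follows. This models a 1-D strided view over a flat buffer in plain C++; `StridedView`, `linearIndex` and `gather` are hypothetical names, not MLIR APIs. For a row-major 3x4 buffer, the subview for column 2 has offset 2 and stride 4; dropping the offset would read column 0 instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 1-D strided view: element i lives at offset + i * stride
// in the underlying flat buffer.
struct StridedView {
  int64_t offset;
  int64_t stride;
};

// Flat index of element i. The offset term is what must not be dropped
// when the strided access is rewritten to a gather on the flat buffer.
int64_t linearIndex(const StridedView &view, int64_t i) {
  return view.offset + i * view.stride;
}

// Gather the view's elements at the given indices from the flat buffer.
std::vector<int> gather(const std::vector<int> &base, const StridedView &view,
                        const std::vector<int64_t> &indices) {
  std::vector<int> result;
  result.reserve(indices.size());
  for (int64_t i : indices) {
    const int64_t flat = linearIndex(view, i);
    assert(flat >= 0 && static_cast<std::size_t>(flat) < base.size());
    result.push_back(base[static_cast<std::size_t>(flat)]);
  }
  return result;
}
```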
---------
Signed-off-by: hanhanW <hanhan0912 at gmail.com>
[AArch64][llvm] Tighten SYSP; don't disassemble invalid encodings
Tighten SYSP aliases, so that invalid encodings are disassembled
to `<unknown>`. This is because:
```
Cn is a 4-bit unsigned immediate, in the range 8 to 9
Cm is a 4-bit unsigned immediate, in the range 0 to 7
op1 is a 3-bit unsigned immediate, in the range 0 to 6
op2 is a 3-bit unsigned immediate, in the range 0 to 7
```
Ensure we check this when disassembling, and also constrain
tablegen so that invalid encodings are compile-time errors.
Also adjust the testcases in `armv9-sysp-diagnostics.s` and
`llvm/test/MC/AArch64/armv9a-sysp.s` as they were invalid,
and add a few invalid (out-of-range) SYSP-alikes to
test that `<unknown>` is printed.
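As an illustration, the range constraints quoted above amount to a check like the one below. This is a standalone sketch, not the decoder or tablegen code: `isValidSyspEncoding` and `mnemonicFor` are hypothetical helpers, and the field values are assumed to be already extracted from the instruction word.

```cpp
#include <cassert>
#include <string>

// Returns true when the already-extracted fields satisfy the ranges quoted
// in the commit message: Cn in 8..9, Cm in 0..7, op1 in 0..6, op2 in 0..7.
bool isValidSyspEncoding(unsigned op1, unsigned cn, unsigned cm,
                         unsigned op2) {
  // Raw field widths: op1 and op2 are 3 bits, Cn and Cm are 4 bits.
  assert(op1 < 8 && cn < 16 && cm < 16 && op2 < 8);
  return cn >= 8 && cn <= 9 && cm <= 7 && op1 <= 6 && op2 <= 7;
}

// What a disassembler following this rule would print for the mnemonic.
std::string mnemonicFor(unsigned op1, unsigned cn, unsigned cm,
                        unsigned op2) {
  return isValidSyspEncoding(op1, cn, cm, op2) ? "sysp" : "<unknown>";
}
```

For example, `mnemonicFor(0, 10, 0, 0)` yields `<unknown>` because Cn is outside 8..9.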