[AsmPrinter][MTE] Support memtag-globals for all AArch64 targets (#187065)
This change ensures that all AArch64 targets can use memtag globals, not
only Android.
[AArch64] Fix register scavenger crash when merging MTE stack tags (#186934)
When `-fsanitize=memtag-stack` is enabled, `TagStoreEdit::emitLoop`
optimizes contiguous ST2Gi instructions into an STGloop. Because this
runs during PEI (post-register allocation), it spawns two new virtual
registers: BaseReg and SizeReg.
Under high register pressure (e.g., Swift async continuation thunks
where almost all registers are kept live), the Register Scavenger must
rely on emergency spill slots to assign physical registers to BaseReg
and SizeReg.
Previously, the compiler assumed at most one emergency spill slot was
needed. If PEI found an unused Callee-Saved Register (`ExtraCSSpill`),
it bypassed allocating an emergency slot entirely. If no CSRs were free,
it allocated exactly one slot. Because STGloop requires TWO scratch
locations, the scavenger would crash trying to fulfill the second
allocation.
[11 lines not shown]
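The shortfall described in this message can be illustrated with a toy model. This is a minimal sketch, not LLVM code: the helper names and the idea of counting "scratch locations" are assumptions made for illustration, and the elided part of the message (including the actual fix) is not reproduced.

```python
# Toy model only: these helpers are hypothetical and do not exist in LLVM.

# STGloop introduces two virtual registers: BaseReg and SizeReg.
STGLOOP_SCRATCH_NEEDED = 2


def scratch_locations_before_fix(has_free_csr: bool) -> int:
    """Scratch locations available under the old behaviour.

    Either one unused callee-saved register is found and no emergency
    slot is allocated, or no CSR is free and exactly one slot is
    allocated. Both paths leave the scavenger a single location.
    """
    free_csrs = 1 if has_free_csr else 0
    emergency_slots = 0 if has_free_csr else 1
    return free_csrs + emergency_slots


def shortfall(has_free_csr: bool) -> int:
    """Scratch locations still missing when every other register is live."""
    available = scratch_locations_before_fix(has_free_csr)
    return max(0, STGLOOP_SCRATCH_NEEDED - available)
```

In this model `shortfall` is 1 on both paths, which corresponds to the second allocation that the scavenger could not satisfy.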
libclc: Really implement denormal config checks
These should be implementable by checking the behavior of
the canonicalize intrinsic. Hack around spirv still failing
on canonicalize by overriding and assuming DAZ for float.
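The idea of probing denormal handling through canonicalize can be sketched with a toy model. Everything here is an assumption for illustration: `toy_canonicalize` is a hypothetical stand-in for the intrinsic, and the `flush` parameter stands in for whatever mode the target is actually running in.

```python
import math
import struct


def f32_from_bits(bits: int) -> float:
    """Build a float from a 32-bit IEEE-754 single-precision bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


SMALLEST_SUBNORMAL = f32_from_bits(0x00000001)
SMALLEST_NORMAL = f32_from_bits(0x00800000)


def toy_canonicalize(x: float, flush: bool) -> float:
    """Hypothetical model: flush subnormal inputs to signed zero if `flush`."""
    if flush and x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


def denormals_supported(flush: bool) -> bool:
    """A subnormal that survives canonicalization means no flushing."""
    return toy_canonicalize(SMALLEST_SUBNORMAL, flush) != 0.0
```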
[HLSL][DXIL][SPIRV] QuadReadAcrossX intrinsic support (#184360)
This PR adds QuadReadAcrossX intrinsic support in HLSL with codegen for
both DirectX and SPIRV backends. Resolves
https://github.com/llvm/llvm-project/issues/99175.
- [x] Implement QuadReadAcrossX clang builtin
- [x] Link QuadReadAcrossX clang builtin with hlsl_intrinsics.h
- [x] Add sema checks for QuadReadAcrossX to
CheckHLSLBuiltinFunctionCall in SemaChecking.cpp
- [x] Add codegen for QuadReadAcrossX to EmitHLSLBuiltinExpr in
CGBuiltin.cpp
- [x] Add codegen tests to
clang/test/CodeGenHLSL/builtins/QuadReadAcrossX.hlsl
- [x] Add sema tests to
clang/test/SemaHLSL/BuiltIns/QuadReadAcrossX-errors.hlsl
- [x] Create the int_dx_QuadReadAcrossX intrinsic in
IntrinsicsDirectX.td
- [x] Create the DXILOpMapping of int_dx_QuadReadAcrossX to 123 in
[8 lines not shown]
[InstCombine] Fix comment in SimplifyDemandedUseBits (NFC) (#187126)
Fix the values in the truth table comment for the combine
add iN (sext i1 X), (sext i1 Y) --> sext (X | Y) to iN
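The truth table for this combine can be enumerated directly. The sketch below is an illustration, not the comment from the source file. The two forms differ in bit 0 when X and Y are both true, so it assumes the combine is only applied when that bit is not demanded; that condition is an assumption here, since the message does not state it.

```python
def sext_i1(bit: int, width: int) -> int:
    """Sign-extend a 1-bit value to `width` bits (0 -> 0, 1 -> all ones)."""
    return (1 << width) - 1 if bit else 0


def fold_rows(width: int = 8):
    """Return (X, Y, add, or) rows as unsigned `width`-bit integers."""
    mask = (1 << width) - 1
    rows = []
    for x in (0, 1):
        for y in (0, 1):
            add = (sext_i1(x, width) + sext_i1(y, width)) & mask
            orv = sext_i1(x | y, width)
            rows.append((x, y, add, orv))
    return rows


def agree_above_low_bit(width: int = 8) -> bool:
    """True if both forms match on every bit except bit 0."""
    return all((add >> 1) == (orv >> 1) for _, _, add, orv in fold_rows(width))
```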
[lldb-dap] Improve support for variables with anonymous fields and types (#186482)
While looking at the '[raw]' value of a std::vector I noticed we didn't
handle the anonymous inner struct very well. The 'evaluateName' was
incorrect (e.g. the evaluateName would return `<var>.` for the anonymous
struct).
This improves support for variables with anonymous fields and anonymous
types.
* Changed the name of anonymous fields from `<null>` to `(anonymous)`,
which matches other tooling like clangd's representation and how types
are presented if the field is not defined.
* Adjusts variables to not return an 'evaluateName' for anonymous
fields.
* Adjusted '[raw]' values to be marked as 'internal' which deemphasizes
them in the UI.
While working in this area, I also consolidated some helpers that are
[10 lines not shown]
[clang] fix crash related to missing source locations for converted template arguments
This adds a way to attach source locations to trivially created template
arguments such as packs, or converted expressions when there is no
expression anymore.
This also avoids crashes due to missing source locations.
In a few places where this matters, we already create expressions
from the converted arguments, but this requires access to Sema,
where currently creating trivial typelocs only requires access to
the ASTContext.
So this creates a new storage kind for TemplateArgumentLocs, where
a single SourceLocation is stored, embedded in the pointer where
possible.
As a drive-by, strengthen asserts by enforcing the TemplateArgumentLocs
are created with the right kinds of locations.
[2 lines not shown]
[VPlan] Account for early-exit dispatch blocks when updating LI. (#185618)
Now that we can vectorize loops with multiple early exits, we emit
dispatch blocks after the middle block to go to a specific exit or
continue in the dispatch chain.
With that, we need to be a bit more careful when it comes to picking the
loop the dispatch block belongs to. The dispatch block will belong to
the innermost loop of all exit blocks reachable from the current block.
Fixes https://github.com/llvm/llvm-project/issues/185362
PR: https://github.com/llvm/llvm-project/pull/185618
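One reading of the rule in this message is that the dispatch block goes into the most deeply nested loop among the loops of its reachable exit blocks. The sketch below models that reading with plain dictionaries; the function names and data layout are hypothetical and are not the VPlan or LoopInfo API.

```python
def loop_depth(loop, parent_of):
    """Nesting depth of `loop`; None (not in any loop) has depth 0."""
    depth = 0
    while loop is not None:
        depth += 1
        loop = parent_of[loop]
    return depth


def loop_for_dispatch_block(reachable_exits, loop_of_block, parent_of):
    """Pick the most deeply nested loop among the reachable exit blocks."""
    loops = [loop_of_block.get(block) for block in reachable_exits]
    return max(loops, key=lambda l: loop_depth(l, parent_of), default=None)


# Hypothetical nest: "inner" is nested in "outer"; "exit.c" is in no loop.
PARENT_OF = {"outer": None, "inner": "outer"}
LOOP_OF_BLOCK = {"exit.a": "inner", "exit.b": "outer", "exit.c": None}
```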
[CIR] Upstream ThreeWayCmpOp (#169963)
This PR upstreams the three-way compare op from the incubator repo.
---------
Co-authored-by: Hendrik Hübner <hhuebner at Hendriks-MacBook-Pro.local>
[mlir][GPU] Bump static bound on cluster IDs (#187106)
Hardware (like AMD's gfx1250) allows 16 workgroups per cluster, but the
static bound of 8 from many years ago hasn't been updated. This commit
adds such an update and adds a test for that bound.
[lldb][PrefixMap] follow up fixes to #187145 (#187337)
Fix and improve #187145 for the following issues:
* Fix unhandled error.
* Align the log type with the file that contains it.
* The added test doesn't work on a Windows host for remote debugging; add
a decorator to skip it when host and target do not match.
[NVPTX] Split Param address space into EntryParam and DeviceParam (NFC) (#186636)
This change begins clarifying and cleaning up some oddities around the
param address-space in NVPTX. PTX supports ".param" loads and stores
referring to both entry (kernel) and device parameters, however these
spaces are actually quite different. Entry param space supports
pointers and addrspace-casting to generic, while device parameter space
can only be referenced by a parameter plus an immediate offset. This
change accounts for this fact with the following refactors:
- Rename `ADDRESS_SPACE_PARAM` -> `ADDRESS_SPACE_ENTRY_PARAM`. This
reflects the fact that only entry parameter space can be meaningfully
modeled in LLVM IR and that pointers with this AS in LLVM IR are always
referring to entry parameters.
- Add `NVPTX::AddressSpace::DeviceParam` for NVPTX MIR instructions.
This is used in NVPTX MIR instructions to signify that they load/store
device parameters. It has a distinct value from
`NVPTX::AddressSpace::EntryParam` so that in the future we can print
these differently on supported PTX versions.
Move the call frame edges log messages to the verbose channel. (#187324)
The messages about searching for call edges can be really verbose and
they are only useful if you are explicitly debugging the call edges
feature. Most of the time they are irrelevant and just make the step log
output hard to read.
[flang] Fix fir.call setCalleeFromCallable (#187124)
The CallOpInterface setCalleeFromCallable allows either a value or a
SymbolRef to be passed in. However, the implementation had a bug:
although it was able to set the attribute, it would fall through and
also try to set the value.
This PR improves the implementation to handle updating the callee even
when switching modes (direct vs indirect) and adds testing for these
APIs.
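The fall-through and the mode switch can be modeled with a toy call object. This is a sketch under stated assumptions: `ToyCall` and `ToyValue` are hypothetical, a Python string stands in for a SymbolRef, and none of this is the fir.call implementation.

```python
class ToyValue:
    """Hypothetical stand-in for an SSA value used as an indirect callee."""


class ToyCall:
    """Toy call op holding either a symbol callee or a value callee."""

    def __init__(self):
        self.callee_symbol = None  # direct call: symbol name
        self.callee_value = None   # indirect call: value operand

    def set_callee_buggy(self, callee):
        # Models the reported fall-through: the symbol is stored, then
        # execution continues and the callee is also stored as a value.
        if isinstance(callee, str):
            self.callee_symbol = callee
        self.callee_value = callee

    def set_callee(self, callee):
        # Store exactly one form and clear the other, so switching
        # between direct and indirect calls leaves no stale callee.
        if isinstance(callee, str):
            self.callee_symbol, self.callee_value = callee, None
        else:
            self.callee_symbol, self.callee_value = None, callee

    def is_direct(self):
        return self.callee_symbol is not None and self.callee_value is None
```

Calling `set_callee` with a string and then with a `ToyValue` flips the toy op from direct to indirect, which is the mode switch the message says is now handled.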
[clang-doc] Enclose documented entities in a card (#185121)
This patch adds a card that encompasses the whole documented entity
instead of just the description. This helps to visually separate the
documentation which was previously more difficult to distinguish. The
description card is also changed to only show a left border to create
less visual noise within the card.
The light theme colors are also changed slightly to not be completely
white.
[llvm-remarkutil] filter: Add --exclude flag (#187163)
Add --exclude to invert filter behavior, keeping all remarks excluding
those matching the filter.
Pull Request: https://github.com/llvm/llvm-project/pull/187163
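The inverted behavior amounts to flipping the keep/drop decision for each remark. The sketch below is illustrative only: `filter_remarks`, the dictionary layout, and matching on a `name` key are assumptions and do not reflect how llvm-remarkutil represents or matches remarks.

```python
import re


def filter_remarks(remarks, pattern, exclude=False):
    """Keep remarks whose name matches `pattern`; with exclude, keep the rest."""
    rx = re.compile(pattern)
    return [r for r in remarks if bool(rx.search(r["name"])) != exclude]


# Hypothetical sample data.
SAMPLE = [
    {"name": "Inlined"},
    {"name": "NotInlined"},
    {"name": "Vectorized"},
]
```

With this sample, the pattern `Inlined` keeps the first two remarks by default and only `Vectorized` when `exclude=True`.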
libclc: Invert subnormal checks
The base case is correct denormal handling, not flushing. This
also matches the spec controls, which start at IEEE behavior, with
flushing enabled by -cl-denorms-are-zero.
Also fix wrong defaults for half and double. Denormal support is
not optional for these.
[flang][acc] Handle deduplicated use_device (part 2) (#187305)
After https://github.com/llvm/llvm-project/pull/186855 there was still
one additional part of the pass that assumed it was able to erase
acc.use_device. Thus extend the same solution and add a test.