[lldb] Upstream arm64e support in ValueObject (#186906)
In #186001, I said the last large chunk of downstream PtrAuth code in
LLDB was the expression evaluator support. However, that wasn't
accurate, as we also have changes to thread this through ValueObject.
[flang][OpenMP][CUDA] Place privatized device allocatable descriptors in managed memory (#187114)
When an OpenMP private clause privatizes a CUDA Fortran allocatable
device array, the Fortran descriptor for the private copy must be
accessible from both the host and the GPU. Without this change, the
descriptor lives on the host stack (via the OpenMP runtime's
CreateAlloca), which a CUF kernel running on the GPU cannot
dereference—resulting in cudaErrorIllegalAddress.
This patch modifies the omp.private init/dealloc region generation in
PrivateReductionUtils.cpp with three changes:
1. Allocate the descriptor in managed memory
2. Set allocator_idx = 2 on the null fir.embox
3. Free the managed descriptor
Source example:
```
real(8), device, allocatable :: adev(:)
[47 lines not shown]
[NFC] Update `LoopVectorize/predicator.ll` test
Align it with the style of `LoopVectorize/VPlan/predicator.ll`:
* Move ASCII graphs next to the IR to avoid scrolling through CHECKs when
  comparing the picture with the actual IR
* Rename `%cN` to ensure that `bbN` branches on `%cN`
[AMDGPU][GlobalISel] Switch tests to new reg-bank-select and refresh checks (#186506)
Update AMDGPU GlobalISel tests to use -new-reg-bank-select. These tests
can be switched over now because legalization rules for G_TRUNC are
already implemented.
[CIR] Fix missing RegionBranchTerminatorOpInterface declarations (#187112)
After https://github.com/llvm/llvm-project/pull/186832, operations with
RegionBranchTerminatorOpInterface need to declare
`getMutableSuccessorOperands`.
[AMDGPU] fold a call to implicitarg.ptr to a poison with no-implicitarg-ptr (#186925)
When a caller function with `amdgpu-no-implicitarg-ptr` calls
`llvm.amdgcn.implicitarg.ptr`, the call is folded to a poison value.
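A minimal sketch of the fold, using a hypothetical, heavily simplified call-site model (`Function`, `Call`, `Folded`, and `tryFoldImplicitArgPtr` are illustrative names, not LLVM APIs). It only checks the two conditions stated above: the callee is the intrinsic, and the caller carries the attribute.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified model of a function and a call site (not LLVM's API).
struct Function {
  std::set<std::string> attributes;  // e.g. "amdgpu-no-implicitarg-ptr"
};

struct Call {
  const Function *caller;
  std::string callee;  // name of the called intrinsic
};

// Outcome of the fold attempt.
enum class Folded { None, Poison };

// Sketch: if the caller is marked as not needing the implicit argument
// pointer, a call to llvm.amdgcn.implicitarg.ptr is replaced by poison.
// Any other call is left alone.
Folded tryFoldImplicitArgPtr(const Call &call) {
  if (call.callee != "llvm.amdgcn.implicitarg.ptr")
    return Folded::None;
  if (call.caller->attributes.count("amdgpu-no-implicitarg-ptr") == 0)
    return Folded::None;
  return Folded::Poison;
}
```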
[clang][Driver][SPIRV] Fix assertion when using -emit-llvm (#186824)
In the failing case we are in the link phase with `-emit-llvm` passed,
which means we are going to call `llvm-link` so all inputs are expected
to be `.bc` files, and linker options aren't supported as we aren't
calling a real linker.
I can't imagine anyone wants to pass arguments to `llvm-link`. Just drop
them and warn instead of asserting.
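A rough sketch of the "drop and warn" behavior, assuming a hypothetical driver helper that receives a flat argument list. `filterLlvmLinkArgs` and `LlvmLinkArgs` are illustrative names, the warning text is made up, and treating every `-`-prefixed argument as a linker option is a simplification, not the real clang Driver logic.

```cpp
#include <string>
#include <vector>

// Hypothetical result of filtering the link-step arguments.
struct LlvmLinkArgs {
  std::vector<std::string> inputs;    // forwarded to llvm-link (.bc files)
  std::vector<std::string> warnings;  // one diagnostic per dropped option
};

// Sketch: when the "linker" is llvm-link, keep the inputs and drop
// anything that looks like a linker option, recording a warning for each
// dropped option instead of asserting.
LlvmLinkArgs filterLlvmLinkArgs(const std::vector<std::string> &args) {
  LlvmLinkArgs result;
  for (const std::string &arg : args) {
    bool looksLikeLinkerOption = !arg.empty() && arg[0] == '-';
    if (looksLikeLinkerOption) {
      result.warnings.push_back("ignoring linker option '" + arg +
                                "' when linking with llvm-link");
      continue;
    }
    result.inputs.push_back(arg);
  }
  return result;
}
```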
Closes: https://github.com/llvm/llvm-project/issues/186598
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
[mlir][llvmir] Fix crash when a CallSiteLoc has a UnknownLoc callee (#186860)
Avoids reading a null StringAttr when no file name is present by
manufacturing a default instead.
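A minimal sketch of the fallback, assuming a hypothetical location type whose file name is optional, as for an UnknownLoc callee. `CalleeLoc` and `calleeFileName` are illustrative names, and the `<unknown>` placeholder is an assumed spelling, not necessarily the default the patch manufactures.

```cpp
#include <optional>
#include <string>

// Hypothetical model of a callee location: an UnknownLoc has no file name.
struct CalleeLoc {
  std::optional<std::string> fileName;
  unsigned line = 0;
};

// Sketch: never read a missing file name; substitute a placeholder instead.
std::string calleeFileName(const CalleeLoc &loc) {
  return loc.fileName.value_or("<unknown>");
}
```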
[libc]: implement 'iswpunct' entrypoint (#186968)
Added entrypoints:
- baremetal/arm
- baremetal/aarch64
- baremetal/riscv
- darwin/aarch64
- linux/aarch64
- linux/arm
- linux/riscv
- linux/x86_64
- windows
Also added the unit test for iswpunct.
Part of the issue: #185136
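For reference, here is a sketch of the classification in the "C" locale, where punctuation is every printable ASCII character that is neither a space nor alphanumeric. `iswpunct_sketch` is a hypothetical name, and returning 0 for all non-ASCII input is a simplifying assumption rather than a description of the LLVM libc implementation.

```cpp
#include <cwchar>  // std::wint_t, WEOF

// Sketch assuming the "C" locale: punctuation is any printable ASCII
// character (0x21..0x7E) that is not a digit or a letter. Everything else,
// including WEOF and non-ASCII wide characters, is reported as non-punctuation.
int iswpunct_sketch(std::wint_t wc) {
  if (wc < 0x21 || wc > 0x7E)  // controls, space, DEL, non-ASCII, WEOF
    return 0;
  bool isDigit = wc >= '0' && wc <= '9';
  bool isUpper = wc >= 'A' && wc <= 'Z';
  bool isLower = wc >= 'a' && wc <= 'z';
  return !(isDigit || isUpper || isLower);
}
```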
[flang][OpenMP] Remove unused function declaration, NFC (#187101)
The function `GetNumGeneratedNestsFrom` has been removed, but repeated
local rebases stubbornly inserted the declaration back in.
[clang-format] Fix Macros configuration not working with try/catch expansions (#184891)
This is a superseding followup to my previous PR,
https://github.com/llvm/llvm-project/pull/183352.
In my previous PR, I proposed adding TryMacros and CatchMacros
configuration options, similar in spirit to IfMacros and ForEachMacros.
I did so because I noticed that configuration like
`Macros=["TRY_MACRO=try", "CATCH_MACRO(e)=catch(e)"]` did not format
configured macro(s) as try/catch blocks. @owenca confirmed in my
previous PR that this observed behavior is undesired, and we should
prefer to fix it rather than introduce new features.
This PR proposes a fix, described in detail in the commit message below
the break. In general terms, it deletes a heuristic from the lexing
phase, where it interacted poorly with the Macros option, and moves its
functionality to the parsing phase instead.
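To illustrate the case being fixed: with macros defined to match the Macros configuration described above, code like the following is meant to be formatted exactly as a plain try/catch block. The macro names come from the message, `describeFailure` is a hypothetical function, and the snippet is ordinary compilable C++.

```cpp
#include <stdexcept>
#include <string>

// Macros mirroring the assumed configuration:
//   Macros: ["TRY_MACRO=try", "CATCH_MACRO(e)=catch(e)"]
#define TRY_MACRO try
#define CATCH_MACRO(e) catch (e)

// With the fix, clang-format should lay this out like a try/catch block.
std::string describeFailure(int value) {
  TRY_MACRO {
    if (value < 0)
      throw std::invalid_argument("negative value");
    return "ok";
  }
  CATCH_MACRO(const std::invalid_argument &err) {
    return std::string("caught: ") + err.what();
  }
}
```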
I describe a possibly cleaner fix in [a comment
[34 lines not shown]
[AMDGPU] Add structural stall heuristic to scheduling strategies
Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.
- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
- Resource conflicts on unbuffered resources (from the SchedModel)
- Sequence-dependent hazards (from GCNHazardRecognizer)
- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.
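A simplified sketch of the idea, with hypothetical types and names (`Candidate`, `structuralStallCycles`, `pickCandidate`) rather than the GCNSchedStrategy interfaces. It assumes resource and hazard waits overlap, so the stall is the larger of the two. It also ranks pending and available instructions in a single pool by that cost instead of treating "pending" as a hard "not ready" flag.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified scheduling candidate.
struct Candidate {
  int id;
  unsigned resourceStallCycles;  // wait for unbuffered resources to free up
  unsigned hazardWaitStates;     // wait states until hazards are resolved
};

// Assumption: both waits elapse concurrently, so the instruction can issue
// once the longer one has passed.
unsigned structuralStallCycles(const Candidate &c) {
  return std::max(c.resourceStallCycles, c.hazardWaitStates);
}

// Pick the candidate with the smallest stall cost from a unified pool.
// Ties keep the earlier candidate (a hypothetical tie-break rule).
// Returns -1 for an empty pool.
int pickCandidate(const std::vector<Candidate> &pool) {
  const Candidate *best = nullptr;
  for (const Candidate &c : pool)
    if (!best || structuralStallCycles(c) < structuralStallCycles(*best))
      best = &c;
  return best ? best->id : -1;
}
```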
[SLP][NFC] Refactor BinOpSameOpcodeHelper BIT enum (#187067)
Use a more readable syntax, and widen the underlying type to avoid silent
errors once we reach 17 members.
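An illustration of the kind of silent error a wider type avoids, using a hypothetical flag enum rather than the actual BinOpSameOpcodeHelper code. Storing a 17th flag into a 16-bit mask compiles without a diagnostic under default warnings, but it drops the bit.

```cpp
#include <cstdint>

// Hypothetical bitmask enum written with shifts for readability.
enum Bit : std::uint32_t {
  FirstBit = 1u << 0,
  SecondBit = 1u << 1,
  // ... more flags ...
  SeventeenthBit = 1u << 16,
};

// Implicit narrowing: compiles, but bits 16 and above are silently lost.
std::uint16_t storeIn16Bits(std::uint32_t flags) {
  std::uint16_t mask = flags;
  return mask;
}

// A 32-bit mask keeps all flags up to the 32nd.
std::uint32_t storeIn32Bits(std::uint32_t flags) {
  std::uint32_t mask = flags;
  return mask;
}
```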
[mlir][GPU] Set nsw/nuw when expanding out subgroup ID (#187099)
There's no world where the subgroup ID (or the intermediate values
needed to compute it) will be negative or will have signed overflow.
This commit adds flags accordingly, which is helpful as this is a rather
low-level rewrite that might run after the analyses that would
ordinarily add these flags.
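A sketch of why the flags are sound. It assumes the common expansion `linear = tid.x + tid.y * bdim.x + tid.z * bdim.x * bdim.y` followed by division by the subgroup size; the exact MLIR rewrite may differ, and `Dim3` and `subgroupId` are hypothetical names. Every intermediate value is non-negative and smaller than the number of threads in the block, so no step can wrap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 3-D index/extent type.
struct Dim3 {
  std::int32_t x, y, z;
};

// Each term is bounded by the block size, which is what makes
// nsw/nuw valid on the adds and multiplies.
std::int32_t subgroupId(Dim3 tid, Dim3 bdim, std::int32_t subgroupSize) {
  assert(tid.x >= 0 && tid.y >= 0 && tid.z >= 0);
  assert(tid.x < bdim.x && tid.y < bdim.y && tid.z < bdim.z);
  assert(subgroupSize > 0);
  std::int32_t linear = tid.x + tid.y * bdim.x + tid.z * bdim.x * bdim.y;
  return linear / subgroupSize;
}
```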