[AMDGPU][GFX12.5] Add support for emitting memory operations with nv bit set (#179413)
- Add `MONonVolatile` MachineMemOperand flag.
- Set nv=1 on memory operations on GFX12.5 if the operation accesses a
  constant address space, is an invariant load, or has the `MONonVolatile`
  flag set.
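The rule above reads as a three-way "or". A minimal sketch of that predicate, assuming a toy operation type (`ToyMemOp` and `shouldSetNV` are hypothetical names for illustration, not the backend's code):

```cpp
// Toy model of a memory operation. Every name here is hypothetical and is
// not part of the AMDGPU backend.
struct ToyMemOp {
  bool accessesConstantAddressSpace;
  bool isInvariantLoad;
  bool hasNonVolatileFlag; // stands in for the MONonVolatile flag
};

// Returns true when, per the rule described in the commit message, nv=1
// would be set: any one of the three conditions is sufficient.
inline bool shouldSetNV(const ToyMemOp &op) {
  return op.accessesConstantAddressSpace || op.isInvariantLoad ||
         op.hasNonVolatileFlag;
}
```

An operation that meets none of the three conditions keeps nv=0 in this model.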
InstCombine: Only propagate callsite attributes in sqrt->sqrtf
This was propagating the callee's attributes instead of just the
callsite. It's illegal to set denormal_fpenv on a callsite. This
was also losing callsite attributes which may have been more useful;
there's no point in setting the callee's attributes on the callsite.
Adding support for G_STRICT_FMA in new reg bank select (#170330)
This patch adds legalization rules for the G_STRICT_FMA opcode.
---------
Co-authored-by: Abhinav Garg <abhigarg at amd.com>
[AArch64][GlobalISel] Put result of fp16 -> s16 convert intrinsic on fpr
Previously, RegBankSelect would place the result of an fp16 -> s16 conversion intrinsic on a gpr. This would cause Instruction Selection to fail, as there are no 16-bit gprs.
Example floating point convert intrinsics:
fcvtnu / fcvtns
fcvtau / fcvtas
fcvtzu / fcvtzs
AMDGPU/GlobalISel: Regbanklegalize rules for G_FREEZE (#179796)
Move G_FREEZE handling to AMDGPURegBankLegalizeRules.cpp.
Added support for uniform S1.
lldb-dap: Stop using replicated variable ids (#124232)
Closes #119784
Probably closes #147105 as well, but I couldn't test due to #156473.
This PR fixes two bugs:
1. It generates unique variable reference IDs per suspended debuggee
state.
2. It stores all created variables in a stopped state instead of
dropping variables in unselected scopes, so it can properly handle all
scope/variable requests.
It does this by storing all variables in their respective scopes and
using that mapping in request handlers that relied on the old mapping.
It dynamically creates new variable/scope IDs instead of resetting IDs
whenever a new scope is created.
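A minimal sketch of the idea, not lldb-dap's actual code: `ToyVariableStore` and its members are hypothetical names. It assumes that IDs come from a counter that is never reset and that every scope created during a stop stays reachable. Clearing the map on resume is an assumption of this sketch.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only.
struct ToyVariable {
  std::string name;
  std::string value;
};

class ToyVariableStore {
public:
  // Stores a scope's variables under a fresh reference ID. The counter only
  // grows, so an ID is never reused for a different scope.
  std::int64_t addScope(std::vector<ToyVariable> vars) {
    std::int64_t id = next_id_++;
    scopes_.emplace(id, std::move(vars));
    return id;
  }

  // Looks up any scope created during the current stop, selected or not.
  // Returns nullptr for unknown or stale IDs.
  const std::vector<ToyVariable> *find(std::int64_t ref) const {
    auto it = scopes_.find(ref);
    return it == scopes_.end() ? nullptr : &it->second;
  }

  // Assumption: stored scopes are dropped when the debuggee resumes, while
  // the counter keeps advancing so stale IDs cannot alias new ones.
  void onResume() { scopes_.clear(); }

private:
  std::int64_t next_id_ = 1;
  std::unordered_map<std::int64_t, std::vector<ToyVariable>> scopes_;
};
```

With this shape, a request that carries an ID from an earlier stop gets no result rather than another scope's variables.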
I also removed some unused code.
[7 lines not shown]
[AArch64][SME] Add missing ZT0 transition (#179193)
This transition was missing from the switch, but is already supported (see
the test for the expected behavior).
(cherry picked from commit c7dd96e6f29b032a4879a7fe2fb0ff2ee1406aa5)
[VPlan] Create edge mask for single-destination switch (#179107)
When converting phis to blends, the `VPPredicator` expects to have edge
masks to the phi node if the phi node has different incoming blocks.
This was not the case if the predecessor of the phi was a switch where a
conditional destination was the same as the default destination.
This was because when creating edge masks in `createSwitchEdgeMasks`,
edge masks are set in a loop through the *non-default* destinations. But
when there are no non-default destinations (but at least one condition,
otherwise an earlier condition would trigger and just forward the source
mask), this loop is never executed, so the masks are never set.
To resolve this, we explicitly forward the source mask for these cases
as well, which is correct because it is an unconditional branch, just a
very convoluted one.
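The case analysis above can be modelled with scalar booleans. This is a toy sketch, not the VPlan implementation: `ToySwitch` and `switchEdgeMasks` are hypothetical names, blocks are plain integers, and masks are single `bool` values instead of vector masks.

```cpp
#include <map>
#include <utility>
#include <vector>

// Toy model of a switch terminator (hypothetical, for illustration only).
struct ToySwitch {
  int defaultDest;
  std::vector<std::pair<int, int>> cases; // (case value, destination block)
};

// Computes one edge mask per distinct destination block.
inline std::map<int, bool> switchEdgeMasks(const ToySwitch &sw, int cond,
                                           bool srcMask) {
  std::map<int, bool> masks;
  bool anyNonDefault = false;   // is there a destination other than default?
  bool nonDefaultTaken = false; // did any non-default edge fire?
  for (const auto &[value, dest] : sw.cases) {
    if (dest == sw.defaultDest)
      continue; // cases that target the default block fold into its edge
    anyNonDefault = true;
    bool hit = srcMask && cond == value;
    masks[dest] = masks[dest] || hit;
    nonDefaultTaken = nonDefaultTaken || hit;
  }
  // With no non-default destination the switch is an unconditional branch,
  // so the source mask is forwarded unchanged.
  masks[sw.defaultDest] =
      anyNonDefault ? (srcMask && !nonDefaultTaken) : srcMask;
  return masks;
}
```

In this model a switch whose cases all target the default block always yields exactly one edge mask, equal to the source mask.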
Fixes #179074
(cherry picked from commit 3bbf748a63a3cb38271a478b520789be57d5e2c8)
[lldb] Remove --debug/-d option from lldb (#179978)
The functionality was removed in
d3173f4ab61c17337908eb7df3f1c515ddcd428c after being broken for a long
time.
There's a small risk someone is still passing the option, but I think
it's time to remove it and they can fix their scripts.
[flang] Use alias analysis in lowering record assignments (#180010)
Without alias analysis, Flang assumes no aliasing when lowering record
assignments, which can result in miscompilation of programs using
SEQUENCE types and EQUIVALENCE.
Use alias analysis to guard the fast path in `genRecordAssignment`;
otherwise fall back to element-wise expansion.
Update FIR FileCheck expectations.
Add `FIRAnalysis` to `flang/unittests/Optimizer/CMakeLists.txt` to fix
the Windows x64 build failure (linker error).
Add `SEQUENCE` handling and update tests accordingly.
Fixes #175246 (and includes the fix to
flang/lib/Optimizer/Builder/CMakeLists.txt in PR #176483).
Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski at hpe.com>
[libc++] Short-cut constraints of single-argument `any` constructor (#177082)
When a default template argument of a function template uses
`std::is_copy_constructible<T>::value` and `T` is convertible from and
to `any`, the changes in 21dc73f6a46cd786394f10f5aef46ec4a2d26175 would
introduce constraint meta-recursion when compiling with Clang.
This patch short-cuts constraints of the related constructor to avoid
computing `is_copy_constructible<T>` when `decay_t<T>` is `any`, which
gets rid of constraint meta-recursion in the overload resolution of copy
construction of `T`.
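A minimal sketch of the short-cut technique, assuming C++17 and a hypothetical `toy_any` class that is not libc++'s implementation. `std::conjunction` does not instantiate later traits once an earlier one is false, so `is_copy_constructible` is never computed when the decayed argument type is `toy_any` itself.

```cpp
#include <type_traits>

// Hypothetical stand-in for std::any, for illustration only.
class toy_any {
public:
  toy_any() = default;
  toy_any(const toy_any &) = default;
  toy_any(toy_any &&) = default;

  // The is_same check comes first. std::conjunction short-circuits, so
  // is_copy_constructible<D> is not instantiated when D is toy_any.
  template <class T, class D = std::decay_t<T>,
            class = std::enable_if_t<std::conjunction<
                std::negation<std::is_same<D, toy_any>>,
                std::is_copy_constructible<D>>::value>>
  toy_any(T &&) : has_value_(true) {}

  bool has_value() const { return has_value_; }

private:
  bool has_value_ = false;
};

// A type that can be moved but not copied, used to exercise the constraint.
struct MoveOnly {
  MoveOnly() = default;
  MoveOnly(MoveOnly &&) = default;
  MoveOnly(const MoveOnly &) = delete;
};
```

In this sketch, copying a `toy_any` selects the defaulted copy constructor, because the template is removed from overload resolution before the copy-constructibility trait is consulted.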
Fixes #176877.
(cherry picked from commit aa5428864e86f8e38806fc92d14cadc68b3d0667)
[DAGCombiner] Fix exact power-of-two signed division for large integers (#177340)
Previously, the DAG combiner did not optimize exact signed division by a
power-of-two constant divisor for integer types exceeding the size of
division supported by the target architecture (e.g., i128 on x86-64).
However, such an optimization was expected by the division expansion
logic, leading to unsupported division operations making it to
instruction selection.
This commit addresses this issue by making an exception to the existing
exclusion of signed division with the exact flag for the aforementioned
operations. That is, the DAG combiner will now optimize exact signed
division if the divisor is a power-of-two constant and the integer type
exceeds the size of division supported by the target architecture.
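For reference, the arithmetic identity behind this kind of rewrite can be sketched in a few lines. This is not the DAG combiner's code: `exactSDivPow2` is a hypothetical helper, and it uses `int64_t` rather than a wide type such as i128 to stay portable. It assumes `>>` on a negative signed value is an arithmetic shift, which is guaranteed from C++20 and is what mainstream compilers do before that.

```cpp
#include <cassert>
#include <cstdint>

// Exact signed division by 2^k. The caller promises that x is a multiple of
// 2^k (the promise the `exact` flag makes), so no rounding fix-up is needed
// and an arithmetic right shift gives the quotient.
inline std::int64_t exactSDivPow2(std::int64_t x, unsigned k) {
  assert(k < 63 && "divisor 2^k must fit in a positive int64_t");
  assert(x % (std::int64_t{1} << k) == 0 && "division must be exact");
  return x >> k;
}
```

Without the exactness promise, a negative dividend would need an adjustment before the shift, because signed division rounds toward zero while an arithmetic shift rounds toward negative infinity.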
---------
Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen at amd.com>