[AMDGPU] Update f16 builtin definitions to use _Float16 instead of __fp16 (#182331)
Change the type signature of 16-bit-insts half-precision builtins from
`__fp16` to `_Float16` in the tablegen builtin definitions.
[mlir][Linalg][Tensor] Preserve attrs on `tensor.pad` when lowering to dst-style (#182064)
When canonicalizing to generic ops within `EliminateEmptyTensors`, we
should take care to preserve the attributes. For example, this attribute
mechanism is employed within IREE's SPIRV pipeline to pass on tiling
configurations together with the ops.
---------
Signed-off-by: Artem Gindinson <gindinson at roofline.ai>
FunctionAttrs: Basic propagation of nofpclass
Perform caller->callee propagation of nofpclass on callsites. As
far as I can tell the only prior callsite to callee propagation here
was for norecurse. This doesn't handle transitive callers.
I was hoping to avoid doing this, and instead get attributor/attributor-light
enabled in the default pass pipeline. nofpclass propagation enabled by
default is the main blocker for eliminating the finite_only_opt global
check in device-libs, but this single level of propagation is most likely
sufficient for that use. Implementing this here is probably the most expedient
path to removing the control library.
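As a rough illustration of the single-level caller->callee propagation described above, the sketch below models each call site as a list of per-argument nofpclass masks and keeps only the bits that hold at every call site. This is a simplified model and not the FunctionAttrs implementation: the bit flags `NAN` and `INF`, the function name `propagate_nofpclass`, and the assumption that every call site of the callee is visible are all hypothetical.

```python
from functools import reduce

# Hypothetical bit flags standing in for two FP classes; not LLVM's encoding.
NAN = 0b01
INF = 0b10


def propagate_nofpclass(call_site_masks):
    """Return, per parameter, the nofpclass bits present at every call site.

    call_site_masks is a list of call sites; each call site is a list with
    one mask per argument. Assumes all call sites of the callee are known.
    """
    if not call_site_masks:
        # With no known call sites there is nothing to propagate.
        return []
    # A class can only be excluded on the parameter if every caller excludes it.
    return [reduce(lambda a, b: a & b, column) for column in zip(*call_site_masks)]


# Two call sites of the same two-parameter callee.
sites = [
    [NAN | INF, NAN],
    [NAN, NAN | INF],
]
params = propagate_nofpclass(sites)
```

Because only direct call sites are intersected, this mirrors the "doesn't handle transitive callers" limitation: a caller's own parameters are not revisited after its callees are updated.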
TargetLowering: Replace android triple check with libcall check (#148800)
Instead of directly checking if the target is android, check if
__safestack_pointer_address is available and configure android
to have the call. Maintain the -safestack-use-pointer-address cl::opt
in an unclean way by ignoring libcall availability.
Also add a RuntimeLibcallsInfo entry for __safestack_unsafe_stack_ptr,
similar to other special globals. Also add this unconditionally to most targets,
even though this seems contrary to reality. A few tests rely on unsupported OSes, so
leave that alone for now.
[RISCV] Add RISCVII::getTWidenOpNum. NFC (#182335)
Rewrite get*OpNum helpers in RISCVVSETVLIInfoAnalysis to return the
MachineOperand& which is what the callers really wanted.
Fix profile metadata propagation in InstCombine select folding
Propagate profile metadata when folding select instructions with logical AND/OR conditions and when canonicalizing SPF to intrinsics. This fixes profile verification failures in Transforms/InstCombine/select-and-or.ll.
[RISCV] Separate VMConstraint from RVVConstraint. NFC (#182089)
VMConstraint is true for most vector instructions by default. Almost
every time we set the Vs1/Vs2 bits we had to redundantly set the VM bit.
There were a few cases where the base class had already removed the
default VMConstraint with RVVConstraint=NoConstraint and an
instantiation had to make sure not to set it again when adding Vs1
and/or Vs2 constraints.
By separating them we can manage them more independently.
I will probably rename RVVConstraint in a followup.
[ARM] Treat strictfp vector rounding operations as legal (#180480)
Previously, the strictfp variants of rounding operations (FLOOR, ROUND,
etc) were handled in SelectionDAG via the default expansion, which
splits each vector operation into scalar ones. This results in less efficient
code.
This change declares the strictfp counterparts of the vector rounding
operations as legal and modifies existing rules in tablegen descriptions
accordingly.
[BPF] Relax BTF_TYPE_ID_REMOTE_RELOC for unnamed types (#182370)
Currently, BTF_TYPE_ID_REMOTE_RELOC requires a named type, e.g. a named
struct or union type.
But in [1], there are some use cases where unnamed types are needed, e.g.
'void *', 'void **', 'const char *', etc. All of these fail compilation
with the error:
Empty type name for BTF_TYPE_ID_REMOTE reloc
This patch relaxes this condition to allow unnamed types. The kernel
libbpf will decide what is allowed in each specific case.
[1]
https://lore.kernel.org/bpf/bb4bf5fe648ac71c969c6228ac6e72ea85cbc64b.camel@gmail.com/T/#m5a7abf799b75199f6678eddd9c1ea4e31563b4dc
Make mmap-munmap interceptor fail earlier (#171295)
If the address range is not covered by shadow memory, make interceptors
like mmap fail earlier.
---------
Signed-off-by: Abhishek Varma <abhvarma at amd.com>
Signed-off-by: Minsoo Choo <minsoochoo0122 at proton.me>
Signed-off-by: hanhanW <hanhan0912 at gmail.com>
Signed-off-by: Nikita B <n2h9z4 at gmail.com>
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
Signed-off-by: Ian Wood <ianwood at u.northwestern.edu>
Co-authored-by: Min-Yih Hsu <min.hsu at sifive.com>
Co-authored-by: Sersawy <65075626+Abdelrhmansersawy at users.noreply.github.com>
Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek at amd.com>
Co-authored-by: Walter Lee <49250218+googlewalt at users.noreply.github.com>
Co-authored-by: cpist (He / Him) <tinyfrog12 at gmail.com>
Co-authored-by: Jonathan Cohen <joncoh at apple.com>
[139 lines not shown]
[NFC][OpenCL] Fix test function-scope-local-return.cl (#182421)
Add `-triple spir64-unknown-unknown` to fix error on arm and aarch64:
unsupported OpenCL extension '__cl_clang_function_scope_local_variables'
[LV] Allow tail folding with IVs with outside users (#182322)
#149042 added last-active-lane and removed the restriction that we
couldn't tail fold loops that had outside users (in AllowedExit).
However we still have a restriction that IVs can't have outside users.
This was added separately to the AllowedExit restriction in #81609, but
it looks like #149042 didn't remove it.
AFAICT we currently extract the correct lane for IVs, so this PR relaxes
the restriction. This helps a good few loops get tail folded in
llvm-test-suite.
-force-tail-folding-style=none was added to pr5881-scev-expansion.ll to
preserve the original scev expansion, since otherwise we end up with a
cttz.elts(false, false, true, true) that blocks SCEV analysis. We should
probably teach ConstantFolding to fold it.
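To make the constant in question concrete, here is a small model of what folding it would compute, reading the four operands as the lanes of a constant mask. It assumes the intrinsic counts zero elements starting from lane 0 and returns the lane count for an all-zero mask; the helper name `cttz_elts` is hypothetical and this is not the ConstantFolding code.

```python
def cttz_elts(mask):
    """Count the leading run of false lanes, starting at lane 0.

    Models an element-wise count-trailing-zeros over a mask vector.
    Returns len(mask) when no lane is set (the non-poison case is assumed).
    """
    for index, lane in enumerate(mask):
        if lane:
            return index
    return len(mask)


# The mask from the message: lanes 0 and 1 are false, lanes 2 and 3 are true.
folded = cttz_elts([False, False, True, True])
```

Under these assumptions the call on a constant mask reduces to a plain integer, which is the kind of value SCEV could then analyze.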
[lldb] Fix batched breakpoint step-over test flakiness (#182415)
Fixes the failing test from
https://github.com/llvm/llvm-project/pull/180101.
Fix the integration test to be resilient to non-deterministic thread
timing. Instead of requiring exact z0/Z0 counts, verify that batching
reduced toggles compared to one-at-a-time stepping.
Also skip the test on `aarch64`, where thread scheduling makes batching
unreliable.
Ran the test 20 times, passed all 20.
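The relaxed check can be sketched roughly as follows: count z0 (remove) and Z0 (insert) breakpoint packets in two packet logs and require only that the batched run used fewer than the one-at-a-time run. The log line format and the helper names `count_toggles` and `batching_reduced_toggles` are assumptions for illustration, not the actual test code.

```python
import re

# Matches the packet kind at the start of a z0/Z0 packet such as "$Z0,400500,1#00".
PACKET = re.compile(r"\b([zZ]0),")


def count_toggles(log_lines):
    """Count breakpoint remove (z0) and insert (Z0) packets in log lines."""
    counts = {"z0": 0, "Z0": 0}
    for line in log_lines:
        for kind in PACKET.findall(line):
            counts[kind] += 1
    return counts


def batching_reduced_toggles(batched_log, baseline_log):
    """True if the batched run toggled breakpoints fewer times overall."""
    batched = count_toggles(batched_log)
    baseline = count_toggles(baseline_log)
    return sum(batched.values()) < sum(baseline.values())


# Hypothetical logs: three threads stepped one at a time vs. as one batch.
one_toggle = ["send packet: $z0,400500,1#00", "send packet: $Z0,400500,1#00"]
baseline_log = one_toggle * 3
batched_log = one_toggle
```

Comparing totals instead of asserting exact counts is what makes the check tolerant of thread timing: the batched count may vary from run to run as long as it stays below the baseline.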
Co-authored-by: Bar Soloveychik <barsolo at fb.com>