[mlir][IR] Rename `DenseIntOrFPElementsAttr` to `DenseTypedElementsAttr` (#185687)
`DenseIntOrFPElementsAttr` was recently generalized to accept any type
that implements the `DenseElementType` interface. The name
`DenseIntOrFPElementsAttr` does not make sense anymore. This commit
renames the attribute to `DenseTypedElementsAttr`. An alias is kept for
migration purposes. The alias will be removed after some time.
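A migration alias like the one described can be sketched as a plain C++ type alias: code that still spells the old name resolves to exactly the same class. The class body below is a hypothetical placeholder, and the actual alias in MLIR's headers may differ in form (for example, it might carry a deprecation attribute).

```cpp
#include <type_traits>

namespace sketch {
// Hypothetical placeholder for the renamed attribute class.
class DenseTypedElementsAttr {
public:
  explicit DenseTypedElementsAttr(bool isSplatValue = false)
      : splat(isSplatValue) {}
  bool isSplat() const { return splat; }

private:
  bool splat;
};

// Migration alias: the old spelling names the same type, so existing code
// keeps compiling until the alias is removed.
using DenseIntOrFPElementsAttr = DenseTypedElementsAttr;
} // namespace sketch

// Old-name and new-name code interoperate because they are one type.
static_assert(std::is_same_v<sketch::DenseIntOrFPElementsAttr,
                             sketch::DenseTypedElementsAttr>);
```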
[flang][OpenMP] Use cuf.alloc for privatization of CUDA Fortran device arrays (#185984)
When CUDA Fortran device arrays are listed in an OpenMP private clause,
the compiler previously allocated private copies on the host heap using
fir.allocmem. This caused device-side operations to receive host
pointers instead of device pointers, leading to cudaErrorIllegalAddress
(700).
Fix by detecting symbols with a CUDA data attribute (device, managed,
unified, etc.) during privatization and using cuf.alloc / cuf.free
instead of fir.allocmem / fir.freemem, so the private copies reside in
device memory.
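The selection described above can be sketched as a small decision function. Only the operation names (`cuf.alloc`/`cuf.free`, `fir.allocmem`/`fir.freemem`) come from the commit message; the enum, structs, and `choosePrivateCopyOps` are hypothetical and do not reflect flang's actual lowering code.

```cpp
#include <optional>
#include <string>

namespace sketch {
// Hypothetical subset of CUDA Fortran data attributes; the commit lists
// "device, managed, unified, etc.", so the real set is larger.
enum class CudaDataAttr { Device, Managed, Unified };

struct PrivatizedSymbol {
  std::string name;
  std::optional<CudaDataAttr> cudaAttr; // empty for an ordinary host variable
};

struct AllocFreeOps {
  std::string alloc;
  std::string dealloc;
};

// Pick the allocation/deallocation pair used for a private copy.
AllocFreeOps choosePrivateCopyOps(const PrivatizedSymbol &sym) {
  if (sym.cudaAttr)
    return {"cuf.alloc", "cuf.free"};     // private copy lives in device memory
  return {"fir.allocmem", "fir.freemem"}; // host heap, as before the fix
}
} // namespace sketch
```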
OS-8721 SmartOS builds in Jenkins should guarantee different buildstamps per stage
Reviewed by: Travis Paul <tpaul at edgecast.io>
Approved by: Travis Paul <tpaul at edgecast.io>
[LoopUnroll][NFC] Move unroll pragma helper functions to LoopUnroll.cpp (#185895)
Move loop unroll pragma query helpers (`getUnrollMetadataForLoop`,
`hasUnrollFullPragma`, `hasUnrollEnablePragma`,
`hasRuntimeUnrollDisablePragma`, `unrollCountPragmaValue`) from
`LoopUnrollPass.cpp` and `LoopUnrollAndJamPass.cpp` into
`LoopUnroll.cpp`, and declare them in `UnrollLoop.h`.
These functions were duplicated as `static` helpers in both
`LoopUnrollPass.cpp` and `LoopUnrollAndJamPass.cpp`. Making them
available in `UnrollLoop.h` eliminates the duplication and allows
target-specific code (e.g. TTI implementations) to query unroll
pragma metadata when setting unrolling preferences.
This is in preparation for an upcoming AMDGPU-specific change that
enables `AllowExpensiveTripCount` for pragma-unrolled loops in
AMDGPU's `getUnrollingPreferences()` (#181241), while
discussions on changing the default behavior for all targets continue in
#181267.
[3 lines not shown]
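As a rough, self-contained sketch of what these helpers query, the code below models a loop's `llvm.loop` metadata as a map from metadata name to an optional integer operand. The metadata names are LLVM's standard loop-unroll strings, but the map-based `LoopMetadata` type, `UnrollingPreferences`, and `setPreferencesFromPragmas` are hypothetical: the real helpers take an `llvm::Loop` and read `MDNode` operands, and the condition in the target hook is an assumption, not the logic of #181241.

```cpp
#include <map>
#include <optional>
#include <string>

namespace sketch {
// Hypothetical stand-in for a loop's llvm.loop metadata.
using LoopMetadata = std::map<std::string, std::optional<unsigned>>;

// Plays the role of getUnrollMetadataForLoop: find one named entry, or null.
const std::optional<unsigned> *getUnrollMetadataForLoop(const LoopMetadata &md,
                                                        const std::string &name) {
  auto it = md.find(name);
  return it == md.end() ? nullptr : &it->second;
}

bool hasUnrollFullPragma(const LoopMetadata &md) {
  return getUnrollMetadataForLoop(md, "llvm.loop.unroll.full") != nullptr;
}

bool hasUnrollEnablePragma(const LoopMetadata &md) {
  return getUnrollMetadataForLoop(md, "llvm.loop.unroll.enable") != nullptr;
}

bool hasRuntimeUnrollDisablePragma(const LoopMetadata &md) {
  return getUnrollMetadataForLoop(md, "llvm.loop.unroll.runtime.disable") !=
         nullptr;
}

// Returns the requested unroll count, or 0 when no count pragma is present.
unsigned unrollCountPragmaValue(const LoopMetadata &md) {
  const auto *entry = getUnrollMetadataForLoop(md, "llvm.loop.unroll.count");
  return (entry && *entry) ? **entry : 0;
}

// Hypothetical target hook using the shared helpers.
struct UnrollingPreferences {
  bool AllowExpensiveTripCount = false;
};

void setPreferencesFromPragmas(const LoopMetadata &md,
                               UnrollingPreferences &up) {
  if (hasUnrollFullPragma(md) || hasUnrollEnablePragma(md) ||
      unrollCountPragmaValue(md) > 0)
    up.AllowExpensiveTripCount = true;
}
} // namespace sketch
```

The point of the sketch is the design choice in the commit: once the helpers live in a shared header, a target hook can call them directly instead of duplicating the metadata lookups.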
[libc][math] Fixed Hypotbf16 build failure. (#186415)
Excerpt from the hypotbf16 build failure:
```CPP
project/libc/src/__support/math/hypotbf16.h:22:33: required from here
/home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-gcc-fullbuild-dbg/llvm-project/libc/src/__support/FPUtil/Hypot.h:221:44: error: conversion from ‘int’ to ‘StorageType’ {aka ‘short unsigned int’} may change value [-Werror=conversion]
221 | r = static_cast<StorageType>((r << 1)) +
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
222 | ((tail_bits & current_bit) ? 1 : 0);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
```
This PR fixes the error by adding a `static_cast`.
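The diagnostic comes from integer promotion: both operands of `+` are promoted to `int`, so casting only `(r << 1)` still leaves an `int` sum that is implicitly narrowed on assignment. A minimal sketch of a warning-free form casts the whole sum; `shiftInTailBit` is a hypothetical helper, and the exact placement of the cast in the actual patch may differ.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {
using StorageType = std::uint16_t; // 'short unsigned int' in the diagnostic

// StorageType + int is computed in int, which is what -Wconversion flags
// when the result is stored back into a StorageType.
static_assert(std::is_same_v<decltype(StorageType{} + 1), int>);

// Shift one tail bit into r. The explicit cast covers the whole sum, so the
// narrowing back to StorageType is intentional and no warning is emitted.
StorageType shiftInTailBit(StorageType r, StorageType tail_bits,
                           StorageType current_bit) {
  return static_cast<StorageType>((r << 1) +
                                  ((tail_bits & current_bit) ? 1 : 0));
}
} // namespace sketch
```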
[HLSL] Codegen column-major matrix initializer lists without a vector shuffle (#186228)
Fixes #185518
The SPIR-V backend does not handle the lowering of `shufflevector`
instructions on vectors with more than 4 elements.
This PR changes the codegen of matrix init lists to directly emit
vectors with elements in column-major order when the default matrix
memory layout is column-major, as opposed to in linear/row-major order
followed by a vector shuffle.
While an alternative fix could be to change the default depth of
[`canEvaluateShuffled`](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp#L1865-L1866)
to 16 in `InstCombineVectorOps.cpp` to eliminate the vector shuffle for
vectors of up to 16 elements in size (to handle 4x4 matrices), this
change would have broader impacts than just HLSL, which seems unnecessary
for the scope of this issue (which concerns only matrix initializer list
codegen).
[6 lines not shown]
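The reordering behind emitting elements directly in column-major order can be sketched as an index mapping: for a `rows x cols` matrix whose initializer list is written in row-major order, element (r, c) sits at index `r * cols + c` in the list and at index `c * rows + r` in column-major storage. `toColumnMajor` is a hypothetical illustration, not Clang's CodeGen code; a shuffle-based lowering applies the same permutation as a `shufflevector` mask, whereas choosing the element order up front avoids emitting the shuffle.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {
// Reorder a row-major initializer list for a rows x cols matrix into
// column-major element order.
template <typename T>
std::vector<T> toColumnMajor(const std::vector<T> &rowMajor, std::size_t rows,
                             std::size_t cols) {
  std::vector<T> out(rows * cols);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      out[c * rows + r] = rowMajor[r * cols + c];
  return out;
}
} // namespace sketch
```

For a 2x3 matrix initialized with `{0, 1, 2, 3, 4, 5}`, the column-major order is `{0, 3, 1, 4, 2, 5}`.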
[X86] Fix syntax directive for --output-asm-variant=1 (#186316)
`--output-asm-variant=1` in llc/clang correctly prints instructions in
Intel syntax but incorrectly emits `.att_syntax` as the first directive.
Fix this by consulting OutputAsmVariant (with fallback to
AssemblerDialect) when deciding which syntax directive to emit, matching
the same pattern in CodeGenTargetMachineImpl::createMCStreamer
(#109360).
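The fallback rule can be sketched as follows: an explicitly requested output variant wins, and the target's default `AssemblerDialect` is used only when none was requested. This is a hypothetical model: the real option types in LLVM differ, variant 0 is assumed to mean AT&T, and the Intel directive spelling is an assumption for illustration.

```cpp
#include <optional>
#include <string>

namespace sketch {
// Assumed numbering: 0 = AT&T, 1 = Intel (--output-asm-variant=1 prints Intel).
constexpr unsigned kATTDialect = 0;
constexpr unsigned kIntelDialect = 1;

// Prefer the requested output variant; otherwise fall back to the default.
unsigned effectiveDialect(std::optional<unsigned> outputAsmVariant,
                          unsigned assemblerDialect) {
  return outputAsmVariant.value_or(assemblerDialect);
}

// Choose the syntax directive from the effective dialect, not the default one.
std::string syntaxDirective(std::optional<unsigned> outputAsmVariant,
                            unsigned assemblerDialect) {
  return effectiveDialect(outputAsmVariant, assemblerDialect) == kIntelDialect
             ? ".intel_syntax noprefix"
             : ".att_syntax";
}
} // namespace sketch
```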
[VPlan] Don't narrow wide loads for scalable VFs when narrowing IGs. (#186181)
For scalable VFs, the narrowed plan processes vscale iterations at once,
so a shared wide load cannot be narrowed to a uniform scalar; bail out,
as there is currently no way to create a narrowed load that loads
vscale elements.
Fixes https://github.com/llvm/llvm-project/issues/185860.
PR: https://github.com/llvm/llvm-project/pull/186181
[ObjC] Support emission of selector stubs calls instead of objc_msgSend. (#186293)
This optimizes objc_msgSend calls by emitting "selector stubs" instead.
Usually, the linker redirects calls to external symbols to a symbol stub
it generates, which loads the target function's address from the GOT and
branches to it:
<symbol stub for _func:>
adrp x16, _func@GOTPAGE
ldr x16, [x16, _func@GOTPAGEOFF]
br x16
With msgSend selector stubs, we extend that to compute the selector as
well:
<selector stub for "foo":>
adrp x1, <selector ref for "foo">@PAGE
ldr x1, [x1, <selector ref for "foo">@PAGEOFF]
[41 lines not shown]
sysutils/podman: Allow setting ownership on auto-created socket
The podman_service daemon auto-creates a socket on startup, along with
parent directory, and is always run as root. It is often useful to have
another proxy like haproxy or nginx provide more sophisticated security,
and these daemons do not need root privileges.
Approved by: dfr
Reported by: pat at patmaddox.com
Tested by: arrowd
Differential Revision: https://reviews.freebsd.org/D55455
XFAIL clang/test/CodeGen/distributed-thin-lto/pass-plugin.ll (#186425)
Failing on AIX as it can't find the new symbol in the exported list.
XFAIL to get the bots back to green while we investigate.
Test introduced in: https://github.com/llvm/llvm-project/pull/183525
[mlir] Fix crash in dialect conversion for detached root ops (#185068)
When running dialect conversion with --no-implicit-module, the root op is
parsed without a wrapping module and then detached from its temporary
parsing block (block == nullptr). If a conversion pattern replaces this
detached root op, ReplaceOperationRewrite::commit() would crash with a null
pointer dereference when calling op->getBlock()->getOperations().remove(op).
Fix this with two complementary changes:
1. In ReplaceOperationRewrite::commit(), add a guard that calls
reportFatalInternalError when op->getBlock() is null. This turns the
opaque null-pointer crash into a clear diagnostic pointing at the API
misuse.
2. Make --convert-func-to-spirv explicitly reject detached top-level ops
[9 lines not shown]
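The guard from change 1 can be sketched in a self-contained model: check for a detached op before touching its parent block's operation list. The `Operation`/`Block` structs and `commitReplace` are hypothetical simplifications of MLIR's classes, and the stand-in `reportFatalInternalError` throws so the sketch is testable, whereas the real LLVM function does not return.

```cpp
#include <list>
#include <stdexcept>

namespace sketch {
struct Block;

// Hypothetical op that may or may not be attached to a block.
struct Operation {
  Block *parent = nullptr;
  Block *getBlock() const { return parent; }
};

struct Block {
  std::list<Operation *> operations;
};

// Stand-in for reportFatalInternalError; throws instead of terminating.
[[noreturn]] void reportFatalInternalError(const char *msg) {
  throw std::logic_error(msg);
}

// Guarded commit: a detached op produces a clear error instead of a null
// pointer dereference on op->getBlock().
void commitReplace(Operation *op) {
  if (!op->getBlock())
    reportFatalInternalError("cannot replace a detached root operation");
  op->getBlock()->operations.remove(op);
  op->parent = nullptr;
}
} // namespace sketch
```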
[flang][OpenMP] Account for GenericExprWrapper being null (#186416)
When getting a MaybeExpr from parser::Expr, take into account that the
GenericExprWrapper (which wraps MaybeExpr) may itself be null.
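The two levels of "missing" can be sketched in a simplified model: the wrapper pointer itself may be null, and a present wrapper may still hold an empty MaybeExpr. `ParserExpr`, `TypedExpr`, and `getMaybeExpr` are hypothetical stand-ins, not flang's actual types or helper.

```cpp
#include <memory>
#include <optional>
#include <string>

namespace sketch {
using TypedExpr = std::string; // hypothetical stand-in for a typed expression
using MaybeExpr = std::optional<TypedExpr>;

struct GenericExprWrapper {
  MaybeExpr v;
};

struct ParserExpr {
  std::unique_ptr<GenericExprWrapper> typedExpr; // may be null
};

// Check the wrapper pointer before reading the MaybeExpr it holds.
MaybeExpr getMaybeExpr(const ParserExpr &expr) {
  if (!expr.typedExpr)
    return std::nullopt;
  return expr.typedExpr->v;
}
} // namespace sketch
```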