[SPIRV] Add support for emitting DebugFunction debug info instructions
This commit adds support for emitting SPIRV DebugFunction and
DebugFunctionDefinition instructions for function definitions.
[mlir][tosa] Enhance TosaInferShapes pass for simple shape inference (#178418)
This commit enhances the TosaInferShapes pass with two new options:
- fold-shape-expressions
- convert-function-boundaries
The "fold-shape-expressions" option enables greedily folding the newly
added TOSA shape operations when possible. Folding these operations
directly within TosaInferShapes is useful since it allows shapes of
later operations to be inferred in a single pass.
The "convert-function-boundaries" option updates the return types of a
function to the newly inferred output shapes. This avoids the need for
additional tensor.cast operations at function boundaries. This option is
particularly useful for resolving a dynamic function to a fully static
one.
When both of these options are used in conjunction with the
"tosa-input-shapes" pass option, it's possible to resolve a dynamic
[6 lines not shown]
[NFC][VPlan] Add initial tests for future VPlan-based stride MV
I tried to include both the features that the current
LoopAccessAnalysis-based transformation supports (e.g., trunc/sext of
stride) and cases where the current implementation behaves poorly,
e.g., https://godbolt.org/z/h31c3zKxK, as well as some other potentially
interesting scenarios I could imagine.
There are two test files with the same content. One is for the VPlan dump change of
the future transformation alone (I'll update `-vplan-print-after` in the next
PR), the other is for the full vectorizer pipeline. The latter has two `RUN:`
lines:
* No multiversioning, so the next PR diff can show the transformation itself
* Stride multiversioning performed in LAA, so that we can compare the future
VPlan-based transformation vs the old behavior.
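For readers unfamiliar with the term, stride multiversioning can be sketched as follows. This is a conceptual illustration only, not the LAA or VPlan implementation; the function name `sum_strided` and its parameters are hypothetical. The idea is that a runtime check on the symbolic stride guards a specialized unit-stride version of the loop, with the generic loop kept as a fallback.

```python
def sum_strided(data, n, stride):
    """Sum n elements of data, starting at index 0 and stepping by stride."""
    if stride == 1:
        # Specialized version, guarded by the runtime check `stride == 1`:
        # accesses are contiguous, so a vectorizer could use wide loads here.
        return sum(data[:n])
    # Generic fallback version: the stride is only known at run time.
    return sum(data[i * stride] for i in range(n))


values = list(range(10))
unit = sum_strided(values, 4, 1)     # 0 + 1 + 2 + 3
strided = sum_strided(values, 4, 2)  # 0 + 2 + 4 + 6
```

Both versions compute the same result for `stride == 1`; the point of versioning is only that the guarded version can be optimized under the stronger assumption.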
[flang][NFC] Converted five tests from old lowering to new lowering (part 21) (#183224)
Tests converted from test/Lower: host-associated-globals.f90,
identical-block-merge-disable.f90, implicit-call-mismatch.f90,
implicit-interface.f90, integer-operations.f90
[SPIRV] Refactor NonSemantic debug info placement logic.
Refactor the logic for determining which NonSemantic.Shader.DebugInfo.100
instructions should be placed in the global section from a whitelist
to a blacklist approach.
security/(modsecurity3|modsecurity-nginx) : switch to PCRE2 and fix NGINX version
Change PCRE to PCRE2.
Update NGINX version to 1.28.2.
PR: 293279
Sponsored by: Netzkommune GmbH
[clang] [NFC] Improve move-assign and move-constructor for NestedNameSpecifierLocBuilder (#180484)
This avoids a deep copy of the manually managed underlying Buffer.
This is a follow-up to #180482.
[libc] Add atan2l implementation fallback to atan2f128. (#182587)
Add implementation for `atan2l` that falls back to `atan2f128` if
float128 support is available.
We do this for now in lieu of an 80-bit-specific implementation. Going
forward, we should choose a 64-bit, 80-bit, or 128-bit specific
implementation based on the specific "long double"
implementation. Also, once llvm-libc has its own software
implementation of float128, depending on the presence of the host type
will no longer be needed.
`atan2l` is one of the remaining dependencies needed for building libc++
against llvm-libc (it's used in `<complex>` header).
[InstCombine] Fold min/max of two subtracts with common RHS (#183240)
Fold: minmax(sub X, Z, sub Y, Z) -> sub minmax(X, Y), Z
When both sub instructions have no-wrap flags and share the same RHS
operand, we can fold:
smin (sub nsw X, Z), (sub nsw Y, Z) -> sub nsw (smin X, Y), Z
smax (sub nsw X, Z), (sub nsw Y, Z) -> sub nsw (smax X, Y), Z
umin (sub nuw X, Z), (sub nuw Y, Z) -> sub nuw (umin X, Y), Z
umax (sub nuw X, Z), (sub nuw Y, Z) -> sub nuw (umax X, Y), Z
This is valid because subtraction by a common value preserves relative
ordering when no signed/unsigned overflow occurs.
Proof: https://alive2.llvm.org/ce/z/n9gwj2
Closes https://github.com/llvm/llvm-project/issues/167059
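As an informal sanity check alongside the Alive2 proof, the fold can be brute-forced on a small bit width. The sketch below is not part of the patch; it models 4-bit integers in Python, treats an overflowing `sub nsw`/`sub nuw` as poison (represented by `None`), and all helper names are hypothetical.

```python
from itertools import product

BITS = 4  # small width so the search is exhaustive
SIGNED = range(-(1 << (BITS - 1)), 1 << (BITS - 1))
UNSIGNED = range(0, 1 << BITS)


def sub_nsw(a, b):
    """Signed subtract; None models poison on signed overflow."""
    r = a - b
    return r if SIGNED.start <= r < SIGNED.stop else None


def sub_nuw(a, b):
    """Unsigned subtract; None models poison on unsigned overflow."""
    r = a - b
    return r if r >= 0 else None


def sub_wrap(a, b):
    """Signed subtract with two's-complement wrapping (no nsw flag)."""
    r = (a - b) & ((1 << BITS) - 1)
    return r - (1 << BITS) if r >= 1 << (BITS - 1) else r


def fold_holds(values, sub, pick):
    """Check pick(sub(x, z), sub(y, z)) == sub(pick(x, y), z) for all inputs."""
    for x, y, z in product(values, repeat=3):
        a, b = sub(x, z), sub(y, z)
        if a is None or b is None:
            continue  # source is poison, so any result is acceptable
        if sub(pick(x, y), z) != pick(a, b):
            return False
    return True


signed_ok = fold_holds(SIGNED, sub_nsw, min) and fold_holds(SIGNED, sub_nsw, max)
unsigned_ok = fold_holds(UNSIGNED, sub_nuw, min) and fold_holds(UNSIGNED, sub_nuw, max)
wrapping_ok = fold_holds(SIGNED, sub_wrap, min)
```

With wrapping subtraction the fold breaks, for example at X = -8, Y = 7, Z = 1 in 4 bits, which is why the no-wrap flags are required.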
[NFC][VPlan] Split `makeMemOpWideningDecisions` into subpasses
The idea is to have handling of strided memory operations (either from
https://github.com/llvm/llvm-project/pull/147297 or for VPlan-based
multiversioning for unit-strided accesses) done after some mandatory
processing has been performed (e.g., some types **must** be scalarized)
but before legacy CM's decision to widen (gather/scatter) or scalarize
has been committed.
In the longer term, we can uplift all other memory widening decisions to
be done here directly at the VPlan level. I expect this structure would
also be beneficial for that.
mlx5: report IPSEC offload capabilities whenever IPSEC_OFFLOAD is configured
Do it always for bootverbose if offload was enabled in the kernel
config, not only if the device actually supports all required
capabilities to do the offload. Otherwise, having the code to print the
caps is pointless.
Reviewed by: slavash
Tested by: Wafa Hamzah <wafah at nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
netipsec/ipsec_offload.c: handle failures to install SA nicely
If the driver refused to install the SA, always record the rejected handle
for the SA on the interface, not only in the EOPNOTSUPP case. The
ipsec_accel_output() function did the right thing if there is no
rejection handle, but not having the handle allows further attempts to
install the SA on the interface.
If the driver installed the SA, but ipsec_accel_handle_sav() returned an
error, uninstall the SA from the interface. Hardware must not be set up to
process packets for which the kernel expects no processing to be done.
In both cases, free the drv_spi if a handle was not installed. But keep
drv_spi allocated if the deinstall returned an error from the driver.
Reviewed by: slavash
Tested by: Wafa Hamzah <wafah at nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
[mlir][OpenMP] Introduce 'omp.iterators' for OpenMP iterator modifiers (#182218)
`omp.iterator` provides information about the induction variables and
iterator ranges in an OpenMP iterator modifier.
Example:
```
%it = omp.iterator(%i0: index, %i1: index) =
(%lb0 to %ub0 step %st0,
%lb1 to %ub1 step %st1) {
omp.yield(%i0, %i1 : index, index)
} -> !omp.iterated<!llvm.struct<(!llvm.ptr, i64)>>
```
Here's how we can use omp.iterator to generate a multi-dimensional
loop in LLVM IR:
```
// Induction variables can be translated from the block arguments
// in omp.iterator.
[12 lines not shown]
[Hexagon] Handle trunc to i1 in matchRightShift (#174737)
Fix a test regression seen when working on
https://github.com/llvm/llvm-project/issues/172888.
Handle "trunc(x) to i1" as "icmp_ne(and(x,1),0)": update matchRightShift
to match this pattern and promoteTo to map the trunc to
"icmp_ne(and(x,1),0)".
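The equivalence this relies on can be checked with a small model. The sketch below is illustrative only and is not the Hexagon code; the helper names are hypothetical, and integers are modelled as unsigned 8-bit values.

```python
def trunc_to_i1(x: int) -> int:
    """Model `trunc x to i1`: keep the value modulo 2**1."""
    return x % (1 << 1)


def icmp_ne_and_1(x: int) -> int:
    """Model `icmp_ne(and(x, 1), 0)`, returning 1 for true and 0 for false."""
    return int((x & 1) != 0)


def patterns_agree(bits: int = 8) -> bool:
    """Compare both forms for every unsigned value of the given width."""
    return all(trunc_to_i1(x) == icmp_ne_and_1(x) for x in range(1 << bits))


agree = patterns_agree()
```

Both forms depend only on the lowest bit of x, which is why one can be matched and rewritten as the other.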
[RISCV] Make ElementsDependOn opt-in instead of opt-out. NFCI (#181601)
RISCVVectorPeephole and RISCVVLOptimizer use the ElementsDependOn field
to know if it's safe to change the VL of a vector instruction.
By default, instructions are EltDepsNone, i.e.
RISCVVectorPeephole::tryReduceVL will reduce their VL by default, but we
might forget to mark unsafe instructions in newer extensions. This patch
changes the default to EltDepsVLMask and instead explicitly marks any
instructions which want to have their VL reduced.
There is an assert in RISCVVLOptimizer::isCandidate that ensures that
all previously isSupported instructions are still marked correctly.
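The effect of flipping the default can be sketched with a toy model. This is not the TableGen or C++ code from the patch; the `Instr` class, the `may_reduce_vl` helper, and the instruction names are hypothetical, and only the two ElementsDependOn values named above are modelled.

```python
from dataclasses import dataclass
from enum import Enum


class EltDeps(Enum):
    NONE = "EltDepsNone"       # elements do not depend on VL or mask
    VL_MASK = "EltDepsVLMask"  # elements depend on both VL and mask


@dataclass
class Instr:
    name: str
    # Opt-in model: the conservative value is the default, so an
    # instruction that nobody annotated is never a VL-reduction candidate.
    elements_depend_on: EltDeps = EltDeps.VL_MASK


def may_reduce_vl(instr: Instr) -> bool:
    """Only explicitly marked instructions may have their VL reduced."""
    return instr.elements_depend_on is EltDeps.NONE


unmarked = Instr("hypothetical_new_extension_op")
marked = Instr("hypothetical_elementwise_op", EltDeps.NONE)
```

Under this model, forgetting to annotate a new instruction costs an optimization opportunity rather than correctness.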