InstCombine: Handle canonicalize in SimplifyDemandedFPClass
Does not try to handle the PositiveZero flushing mode, but I
don't believe the result is incorrect under it.
[LLVM][ADT] Make `scope_exit` CTAD-capable (#173131)
This enables using it like
```cpp
llvm::scope_exit Cleanup([] { ... });
```
instead of
```cpp
auto Cleanup = llvm::make_scope_exit([] { ... });
```
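A minimal sketch of why CTAD helps here, assuming C++17. `ScopeExitSketch` and `countCleanupRuns` are hypothetical names for illustration only; this is not the code in `llvm/ADT/ScopeExit.h`.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a scope-exit guard (not the LLVM source).
template <typename Callable> class ScopeExitSketch {
  Callable ExitFunction;
  bool Engaged = true;

public:
  // The implicit deduction guide from this constructor is what lets
  // callers omit the template argument.
  explicit ScopeExitSketch(Callable F) : ExitFunction(std::move(F)) {}
  ScopeExitSketch(const ScopeExitSketch &) = delete;
  ScopeExitSketch &operator=(const ScopeExitSketch &) = delete;

  // Run the callable on scope exit unless release() was called.
  ~ScopeExitSketch() {
    if (Engaged)
      ExitFunction();
  }

  void release() { Engaged = false; }
};

// Opens a scope with a guard and reports how often the cleanup ran.
inline int countCleanupRuns(bool Release) {
  int Runs = 0;
  {
    // CTAD deduces Callable from the lambda; no make_* helper is needed.
    ScopeExitSketch Cleanup([&Runs] { ++Runs; });
    if (Release)
      Cleanup.release();
  }
  return Runs;
}
```

With a constructor of this shape, the guard's type never has to be spelled out, which is what the factory function previously provided.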
[MLIR][NVPTX] Add intrinsics and Ops to read smem-sizes (#173089)
This patch adds three intrinsics and their corresponding Ops
representing the PTX special-register read instructions
that report various configurations of shared-memory sizes.
Signed-off-by: Durgadoss R <durgadossr at nvidia.com>
ValueTracking: Improve accuracy of 0 handling with PreserveSign (#173165)
If the source value is known not subnormal and not zero with the
same sign, we can infer the result is also not zero with the same
sign.
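The inference can be modelled on a small bitmask of "may be" classes. This is a sketch only: the enum and `flushPreserveSign` are hypothetical, cover just the zero and subnormal classes, and assume every subnormal is flushed to a zero of the same sign. It is not the ValueTracking code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class bits for illustration; LLVM's FPClassTest has more.
enum FPClassSketch : uint8_t {
  NegSubnormal = 1 << 0,
  NegZero = 1 << 1,
  PosZero = 1 << 2,
  PosSubnormal = 1 << 3,
};

// Given the classes the source may belong to, return the classes the
// result may belong to when subnormals flush to a same-signed zero.
inline uint8_t flushPreserveSign(uint8_t SrcMayBe) {
  // Zeros pass through unchanged.
  uint8_t Result = SrcMayBe & (NegZero | PosZero);
  // A subnormal can only produce a zero of its own sign.
  if (SrcMayBe & NegSubnormal)
    Result |= NegZero;
  if (SrcMayBe & PosSubnormal)
    Result |= PosZero;
  return Result;
}
```

If the source has neither the `NegSubnormal` nor the `NegZero` bit set, the result cannot have `NegZero` set, which is the sharper answer described above.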
[libc++] Implement `adjacent_transform` (#168208)
This patch implements `std::ranges::adjacent_transform_view`, which is
part of P2321R2, tracked by #105169.
[InstCombine] Bail out on type mismatch in foldICmpBinOpWithConstantViaTruthTable (#173179)
Fixes https://github.com/llvm/llvm-project/issues/173177
The previous implementation doesn't consider cases like `<2 x i1>
icmp(binop(sel <2 x i1>, sel i1))`.
[RISCV] Use legally typed splat during vmv_v_v splat(x) -> vmv_v_x (#173154)
Fixes https://github.com/llvm/llvm-project/issues/173141
Since #170539, `DAG.getSplatValue` may return an illegally typed splat
value when legal types are not requested. This patch requests a legally
typed splat.
Fix use-after-free bug in mergeTwoFunctions(). (#173126)
This was caught by Apple's Probabilistic Guard Malloc which detected
that OldF's memory is freed inside mergeTwoFunctions(), and then
back in insert() the now dangling pointer is dereferenced again.
rdar://163874208
[RISCV] Introduce new AND combine to expose additional load narrowing opportunities (#170483)
The standard codegen pipeline sometimes ends up with a shift followed by
a mask. If doing the mask first would have enabled load narrowing, then
it is preferable to do so. The motivating example was seen in povray
from SPEC where we had something like:
```
lh a0, 0(a0)
slli a0, a0, 56
srli a0, a0, 52
```
Which can be better implemented as:
```
lbu a0, 0(a0)
slli a0, a0, 4
```
[3 lines not shown]
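The equivalence of the two sequences above can be checked exhaustively in a few lines. This is a sketch that models a 64-bit register and a little-endian load (so `lbu` at the same address reads the low byte of the halfword); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models: lh a0, 0(a0); slli a0, a0, 56; srli a0, a0, 52
inline uint64_t shiftThenMask(int16_t Loaded) {
  // lh sign-extends the halfword into the 64-bit register.
  uint64_t Reg = static_cast<uint64_t>(static_cast<int64_t>(Loaded));
  Reg <<= 56; // slli: keeps only the low 8 bits, now in the top byte
  Reg >>= 52; // srli: logical shift, leaving those bits at positions 4..11
  return Reg;
}

// Models: lbu a0, 0(a0); slli a0, a0, 4
inline uint64_t narrowLoad(int16_t Loaded) {
  // lbu zero-extends the byte at offset 0 (the low byte, little-endian).
  uint64_t Reg = static_cast<uint8_t>(Loaded);
  return Reg << 4;
}

// Compares both sequences over every possible halfword value.
inline bool sequencesAgree() {
  for (uint32_t V = 0; V <= 0xFFFF; ++V) {
    int16_t Half = static_cast<int16_t>(static_cast<uint16_t>(V));
    if (shiftThenMask(Half) != narrowLoad(Half))
      return false;
  }
  return true;
}
```

Only the low eight bits of the loaded halfword survive the shift pair, which is why the narrower load is sufficient.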
[MLIR][Python] Update the scf.if interface to be consistent with affine.if (#173171)
This is a follow-up to #171957 that updates the argument names of the
`scf.if` Python binding to be consistent with `affine.if`. Both
operations should use `has_else` to determine whether the `else`
block is present.
cc @makslevental
[clang] Add FixItHint for designated init order (#173136)
Generate fix-it for C++20 designated initializers when the initializers
do not match the declaration order in the structure.
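An illustration of the kind of code this targets, assuming a C++20 compiler. `Point` and `makePoint` are hypothetical names, and the exact diagnostic and fix-it text are not reproduced here.

```cpp
#include <cassert>

struct Point {
  int X;
  int Y;
};

inline Point makePoint() {
  // Out of declaration order, which C++20 does not allow:
  //   Point P{.Y = 2, .X = 1};
  // Reordered to match the declaration order of the fields:
  Point P{.X = 1, .Y = 2};
  return P;
}
```

Reordering the designators to follow the field order is the mechanical change a fix-it can offer.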
[Clang][SYCL][NFC] Modify err_sycl_entry_point_invalid to use %enum_select. (#173122)
The `err_sycl_entry_point_invalid` diagnostic has a selection field for
which there are already many options with more expected to be added. Use
of `%enum_select` avoids the need for magic numbers with associated
comments at source locations where the diagnostic is issued.
[WIP][IR][Constants] Change the semantic of `ConstantPointerNull` to represent an actual `nullptr` instead of a zero-value pointer
The value of a `nullptr` is not always `0`. For example, on AMDGPU, the `nullptr` in address spaces 3 and 5 is `0xffffffff`. Currently, there is no target-independent way to get this information, making it difficult and error-prone to handle null pointers in target-agnostic code.
We do have `ConstantPointerNull`, but it might be a little confusing and misleading. It represents a pointer with an all-zero value rather than necessarily a real `nullptr`. Therefore, to represent a real `nullptr` in address space `N`, we need to use `addrspacecast ptr null to ptr addrspace(N)` and it can't be folded.
In this PR, we change the semantics of `ConstantPointerNull` to represent an actual `nullptr` instead of a zero-value pointer. Here are the detailed changes:
* `ptr addrspace(N) null` will represent the actual `nullptr` in address space `N`.
* `ptr addrspace(N) zeroinitializer` will represent a zero-value pointer in address space `N`.
* `Constant::getNullValue` will return a _null_ value. It is the same as the current semantics except for `PointerType`, for which it will return a real `nullptr` pointer.
* `Constant::getZeroValue` will return a zero-value constant. It is exactly the same as the current semantics. To represent a zero-value pointer, a `ConstantExpr` will be used (effectively `inttoptr i8 0 to ptr addrspace(N)`).
* Correspondingly, there will be both `Constant::isNullValue` and `Constant::isZeroValue`.
The RFC is https://discourse.llvm.org/t/rfc-introduce-sentinel-pointer-value-to-datalayout/85265. It is a little old and its title might look different, but everything eventually converges on this change. An early attempt can be found in https://github.com/llvm/llvm-project/pull/131557, which contains much valuable discussion as well.
This PR is still WIP, but any early feedback is welcome. I'll include as many of the necessary code changes as possible in this PR, but eventually it needs to be carefully split into multiple PRs, and I'll do that once the changes look good to everyone.
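The distinction between a null value and a zero value can be sketched as below. All three helpers are hypothetical and are not LLVM APIs. Only the `0xffffffff` pattern for address spaces 3 and 5 comes from the description above; treating every other address space as all-zero is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bit pattern of the null pointer in address space AS.
// Address spaces 3 and 5 follow the AMDGPU example in the description;
// every other address space is assumed to use all-zero bits.
inline uint64_t nullPointerBits(unsigned AS) {
  if (AS == 3 || AS == 5)
    return UINT64_C(0xffffffff);
  return 0;
}

// A pointer is null when it matches its address space's null pattern.
inline bool isNullValueSketch(unsigned AS, uint64_t Bits) {
  return Bits == nullPointerBits(AS);
}

// A pointer is a zero value when all of its bits are zero.
inline bool isZeroValueSketch(uint64_t Bits) { return Bits == 0; }
```

In address space 3 the two predicates disagree on both patterns: `0xffffffff` is null but not zero, and `0` is zero but not null.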
[RISCV] Support Xqcilo loads/stores in RISCVMakeCompressible (#172971)
This patch adds support for converting Xqcilo loads/stores with either
large offsets or uncompressible registers into loads/stores that can be
compressed. We do this transformation only when the Xqcilia extension is
enabled in addition to the Xqcilo extension so that we can use the
QC_E_ADDI instruction to form the new base.
In a few cases, compressing a 48-bit Xqcilo load/store to a 32-bit
load/store might also be beneficial; this patch does not address
those cases.
Revert "Make STLExtras's (all|any|none)_of() Utility Functions Constexpr-Friendly" (#173163)
Reverts llvm/llvm-project#172536. This is causing weird assertion
failures in clang, per
https://github.com/llvm/llvm-project/pull/172536#issuecomment-3677973154.
It might be a bug in GCC, but still makes sense to revert it in the
interest of bootstrapping.
---------
Signed-off-by: Michał Górny <mgorny at gentoo.org>
[CIR] Only emit FP math intrinsics when precision/errno settings allow it (#169424)
Depending on the compiler CLI options, attributes near the call site, and
pragmas, we might not be allowed to emit a call to an intrinsic (e.g. if
it does not set errno and we expect it to be set). This is checked by
`shouldGenerateFPMathIntrinsic` (shared with classic codegen).
This commit adds this check and additionally adds remaining cases in the
switch statement for math builtins.
[AMDGPU] Propagate alias information in AMDGPULowerKernelArguments. (#161375)
Emit `!noalias` and `!alias.scope` metadata for `noalias` kernel
arguments.
---------
Co-authored-by: Leon Clark <leoclark at amd.com>
Revert "[VPlan] Use predicate from VPValue VPWidenSelectR::computeCost." (#173170)
Reverts llvm/llvm-project#172915
Looks like this may be causing
https://lab.llvm.org/buildbot/#/builders/128/builds/9590 to fail.
Revert while I confirm.
[HLSL][Matrix] Support row and column indexing modes for MatrixSubscriptExpr (#171564)
Fixes #167617
In DXC, HLSL supports different indexing modes, selected through codegen
for its equivalent of MatrixSubscriptExpr when the /Zpr and /Zpc flags
are used. See https://godbolt.org/z/bz5Y5WG36.
This change modifies EmitMatrixSubscriptExpr to consider the
MatrixRowMajor/MatrixColMajor Layout flags before generating an index.
Similarly it introduces `createRowMajorIndex` and
`createColumnMajorIndex` in `MatrixBuilder.h` for use in
`VisitMatrixSubscriptExpr`.
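The two layouts differ only in how a (row, column) pair maps to a flat element index. The sketch below shows the standard arithmetic; the function names are hypothetical and this is not the code of `createRowMajorIndex` or `createColumnMajorIndex`.

```cpp
#include <cassert>
#include <cstddef>

// Row-major: the elements of a row are contiguous.
inline std::size_t rowMajorIndexSketch(std::size_t Row, std::size_t Col,
                                       std::size_t NumCols) {
  return Row * NumCols + Col;
}

// Column-major: the elements of a column are contiguous.
inline std::size_t columnMajorIndexSketch(std::size_t Row, std::size_t Col,
                                          std::size_t NumRows) {
  return Col * NumRows + Row;
}
```

For a 2x3 matrix, element (0, 1) sits at flat index 1 in row-major order and at flat index 2 in column-major order, so the layout flag has to be consulted before the index is generated.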