[libc][math][c++23] Add fmabf16 math function (#182836)
closes #180171
part of #177259
Beyond the usual changes, two extra fixes were needed:
1. `libc/src/__support/FPUtil/generic/add_sub.h` → fixes a +0/-0 error
2. `libc/src/__support/FPUtil/generic/FMA.h` → now handles
fmabf16(Normal, Normal, +/-INF)
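The infinite-addend case can be illustrated with a minimal sketch. It uses `float` and `std::fma` as stand-ins for bfloat16 and `fmabf16` (an assumption made for portability), and the helper names are hypothetical. A fused multiply-add forms the product exactly, so a finite product plus -inf is -inf, whereas rounding the product first overflows to +inf and yields NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Fused: the product is formed exactly (finite), so adding -inf gives -inf.
bool fused_keeps_infinity() {
  const float big = std::numeric_limits<float>::max();
  const float neg_inf = -std::numeric_limits<float>::infinity();
  const float fused = std::fma(big, big, neg_inf);
  return std::isinf(fused) && fused < 0.0f;
}

// Unfused: the product is rounded first and overflows to +inf,
// so the following addition computes +inf + -inf, which is NaN.
bool unfused_gives_nan() {
  volatile float big = std::numeric_limits<float>::max();
  const float neg_inf = -std::numeric_limits<float>::infinity();
  volatile float product = big * big; // volatile prevents contraction into an FMA
  const float sum = product + neg_inf;
  return std::isnan(sum);
}
```

Both helpers are expected to return `true` on an IEEE 754 platform without fast-math flags. This mirrors the failing test below, where the library returned NaN and MPFR returned -inf.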
```jsx
/home/runner/work/llvm-project/llvm-project/libc/test/src/math/fmabf16_test.cpp:62: FAILURE
Failed to match __llvm_libc_23_0_0_git::fmabf16(x, y, z) against LIBC_NAMESPACE::testing::mpfr::get_mpfr_matcher<mpfr::Operation::Fma>( input, __llvm_libc_23_0_0_git::fmabf16(x, y, z), 0.5, mpfr::RoundingMode::Nearest).
Input decimal: x: 338953138925153547590470800371487866880.00000000000000000000000000000000000000000000000000 y: 338953138925153547590470800371487866880.00000000000000000000000000000000000000000000000000 z: -inf
First input bits: 0x7F7F = (S: 0, E: 0x00FE, M: 0x007F)
Second input bits: 0x7F7F = (S: 0, E: 0x00FE, M: 0x007F)
Third input bits: (-Infinity)
Libc result: nan
MPFR result: -inf
```
[16 lines not shown]
[libc] Add posix_memalign as external entrypoint on Linux x86/ARM. (#185310)
`posix_memalign` is provided by the Scudo allocator and is part of the
POSIX standard, so we can safely declare it in the `<stdlib.h>` header on
Linux systems.
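For reference, a minimal usage sketch of `posix_memalign` as declared in `<stdlib.h>`. The helper name `aligned_alloc_roundtrip` is hypothetical and the code assumes a POSIX system. The call returns 0 on success and an error number otherwise; the alignment must be a power of two and a multiple of `sizeof(void *)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>

// Allocates `size` bytes at the requested alignment, checks the returned
// address, and frees the block. Returns false if the allocation fails.
bool aligned_alloc_roundtrip(std::size_t alignment, std::size_t size) {
  void *ptr = nullptr;
  const int rc = posix_memalign(&ptr, alignment, size);
  if (rc != 0)
    return false; // e.g. EINVAL for a bad alignment, ENOMEM when out of memory
  const bool aligned =
      reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
  free(ptr); // memory from posix_memalign is released with free()
  return aligned;
}
```

A call such as `aligned_alloc_roundtrip(64, 256)` should succeed, while an alignment that is not a power of two should be rejected.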
[CIR][AMDGPU] Add module flags for AMDGPU target (#186081)
Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/2100
This PR adds support to emit AMDGPU-specific module flags
`amdhsa_code_object_version` and `amdgpu_printf_kind` to match OGCG
behavior.
In `CIRGenModule`, the flags are stored as CIR module attributes:
`cir.amdhsa_code_object_version` (integer)
`cir.amdgpu_printf_kind` (string: "hostcall" or "buffered")
During lowering to LLVM IR (in LowerToLLVMIR.cpp), these attributes are
converted to LLVM module flags.
[XeVM] Add translation for XeVM cache-control attributes. (#181856)
Use `llvm.intr.ptr.annotation` to attach cache-control metadata to a
pointer. Each cache-control attribute produces its own annotation call;
multiple attributes are chained so every annotation sits on the same
pointer.
This approach protects the metadata across optimizations.
[offload] Remove LIBOMPTARGET_SHARED_MEMORY_SIZE envar (#186231)
This commit removes the `LIBOMPTARGET_SHARED_MEMORY_SIZE` envar and
outputs a runtime warning if it is defined. Access to dynamic shared memory
should be obtained through the `dyn_groupprivate` clause (OpenMP 6.1) or
the launch arguments of a liboffload kernel launch.
Exclude known failure case (#186305)
External resources do not produce the same result on big-endian targets.
Keeping this test scoped to encoding regressions keeps it simple, and the
failure doesn't affect the usage there, so just mark it as XFAIL.
[RISCV] Fix crash in getShuffleCost for P-extension without V extension (#186149)
RISCVTTIImpl::getShuffleCost() crashes when querying the cost of a
reverse shufflevector on a target with the P-extension but without V/Zve
extensions. The SK_Reverse case calls
getContainerForFixedLengthVector(), which asserts hasVInstructions().
The P-extension uses fixed-width packed SIMD in GPRs, not RVV registers,
so the V extension is typically not enabled.
Add an early return for P-extension fixed vectors in getShuffleCost,
consistent with the existing guards in getScalarizationOverhead,
getCastInstrCost, and getVectorInstrCost.
[RISCV] Fix crash in combinePExtTruncate for truncate(srl) without MUL/SUB (#186141)
combinePExtTruncate is called from performTRUNCATECombine when the
P-extension is enabled. It attempts to match patterns like
truncate(srl(mul/sub(...), shamt)) and combine them into P-extension
narrowing shift instructions (e.g. PNSRLI, PNSRAI).
However, after extracting the shift input operand `Op` from the SRL
node, the function unconditionally accessed Op.getOperand(0) and
Op.getOperand(1) without first verifying that Op has at least two
operands. For example, when combining:
```
truncate(v2i16
  srl(v2i32
    bitcast(v2i32 i64),    <-- Op = bitcast, a unary op with 1 operand
    BUILD_VECTOR <8, 8>))
```
[7 lines not shown]
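The general shape of such a guard can be sketched on a toy node type. This is not the actual SelectionDAG code or the fix from the commit: `Node`, `Opcode`, and `isBinaryMulOrSub` are hypothetical names used only to show the idea of checking the opcode and operand count before reading operands 0 and 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for DAG opcodes and nodes (hypothetical, for illustration).
enum class Opcode { Mul, Sub, Bitcast, Other };

struct Node {
  Opcode opcode;
  std::vector<const Node *> operands;
};

// Returns true only when it is safe to read operands[0] and operands[1]
// as the inputs of a mul/sub node.
bool isBinaryMulOrSub(const Node &op) {
  if (op.opcode != Opcode::Mul && op.opcode != Opcode::Sub)
    return false; // e.g. a unary bitcast: bail out before touching operands
  return op.operands.size() >= 2;
}
```

With this check in place, a unary node such as the bitcast in the example above is rejected up front instead of being indexed out of range.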
[MLIR][XeGPU] Enhance Layout Propagation for broadcasting both leading dimensions and inner unit dimensions (#185583)
This PR enhances the layout propagation rules for broadcast operations.
The source layout is derived from the result layout based on the
broadcast pattern:
1. Broadcast on leading dimensions
   The source layout is the slice layout of the result layout.
2. Broadcast on inner unit dimensions
   The source layout matches the result layout, with sg_data and lane_data
   set to 1.
3. Broadcast on both leading dimensions and inner unit dimensions
   The source layout is derived by combining the above two rules.