[HLSL][DirectX][SPIRV] Implement the `fma` API (#185304)
This PR adds the `fma` HLSL intrinsic (with support for matrices).
It follows all of the steps from #99117.
Closes #99117.
[msan] Disambiguate "Strict" vs. "Heuristic" when dumping instructions (#188873)
When -msan-dump-strict-instructions and
-msan-dump-heuristic-instructions are simultaneously enabled, it is
unclear from the output whether each instruction is strictly vs.
heuristically handled. [*] This patch fixes the issue by tagging the
output.
The actual instrumentation of the code is unaffected by this change.
[*] A workaround is to compile the code once with only
-msan-dump-strict-instructions, and a second time with
-msan-dump-heuristic-instructions, but this unnecessarily doubles the
compilation time.
[DA] Refactor signature of weakCrossingSIVtest and check inputs (NFCI) (#187117)
Pass SCEVAddRecExpr objects directly to weakCrossingSIVtest and
check the validity of the input operands.
[libc] Remove header templates from several C standard headers. (#188878)
Switches the following headers to hdrgen-produced ones by referencing,
in the corresponding YAML files, a macro from the C standard and the
file containing the declarations:
* limits.h (referenced _WIDTH / _MAX / _MIN families).
* locale.h (referenced LC_ family).
* time.h (referenced CLOCKS_PER_SEC).
* wchar.h (referenced WEOF).
[DTLTO] Improve performance of adding files to the link (#186366)
The in-process ThinLTO backend typically generates object files in
memory and adds them directly to the link, except when the ThinLTO cache
is in use. DTLTO is unusual in that it adds files to the link from disk
in all cases.
When the ThinLTO cache is not in use, ThinLTO adds files via an
`AddStreamFn` callback provided by the linker, which ultimately appends
to a `SmallVector` in LLD. When the cache is in use, the linker supplies
an `AddBufferFn` callback that adds files more efficiently (by moving
`MemoryBuffer` ownership).
This patch adds a mandatory `AddBufferFn` to the DTLTO ThinLTO backend.
The backend uses this to add files to the link more efficiently.
Additionally:
- Move AddStream from CGThinBackend to InProcessThinBackend, for reader
clarity.
- Modify linker comments that implied the AddBuffer path is
[12 lines not shown]
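As a rough illustration of why a buffer-moving callback can be cheaper
than a stream-style one, the sketch below contrasts copying bytes into
linker-owned storage with transferring ownership of an existing buffer.
`Buffer`, `Link`, `addViaStream`, and `addViaBuffer` are hypothetical
stand-ins invented for this example; they are not the LLVM
`AddStreamFn` / `AddBufferFn` / `MemoryBuffer` APIs.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an in-memory object file.
struct Buffer {
  std::string data;
};

// Hypothetical stand-in for the linker's list of input files.
struct Link {
  std::vector<std::vector<char>> streamed;     // filled by byte copies
  std::vector<std::unique_ptr<Buffer>> owned;  // filled by ownership moves

  // Stream-style path: the bytes are appended (copied) into storage
  // owned by the link.
  void addViaStream(const Buffer &b) {
    streamed.emplace_back(b.data.begin(), b.data.end());
  }

  // Buffer-style path: ownership of the existing allocation is moved,
  // so the payload itself is not copied.
  void addViaBuffer(std::unique_ptr<Buffer> b) {
    owned.push_back(std::move(b));
  }
};
```

After `addViaBuffer(std::move(buf))`, the caller's `buf` is empty and
the link holds the original allocation; `addViaStream` leaves the
caller's buffer intact and stores a second copy of the bytes.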
[RISCV][NFC] Use enum types to improve debuggability (#188418)
So that we can see the enum values instead of integral values when
dumping in debuggers.
[libc][docs] Document libc-shared-tests ninja target (#189062)
Added a brief description of the libc-shared-tests target to the
Building and Testing page.
This target allows running tests for shared standalone components like
math primitives without the full libc runtime.
[Clang] [Sema] Don't diagnose multidimensional subscript operators on dependent types (#188910)
I forgot to check for dependent types in #187828; we somehow didn’t have
tests for this so CI didn’t catch this...
[MLIR][SCF] Fix loopUnrollByFactor for unsigned loops with narrow integer types (#189001)
`loopUnrollByFactor` used `getConstantIntValue()` to read loop bounds,
which sign-extends the constant to `int64_t`. For unsigned `scf.for`
loops with narrow integer types (e.g. i1, i2, i3), this produces wrong
results: a bound such as `1 : i1` has `getSExtValue() == -1` but should
be treated as `1` (unsigned).
Two bugs were introduced by this:
1. **Wrong epilogue detection**: the comparison `upperBoundUnrolledCst <
ubCst` used signed int64, so e.g. `0 < -1` (where ubCst is the
sign-extended i1 value 1) evaluated to false, suppressing the epilogue
that should execute the remaining iterations.
2. **Zero step after overflow**: when `tripCountEvenMultiple == 0` (all
iterations go to the epilogue), `stepUnrolledCst = stepCst *
unrollFactor` can overflow the bound type's bitwidth and wrap to 0. A
zero step causes `constantTripCount` to return `nullopt`, preventing the
[11 lines not shown]
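The sign-extension problem described above can be reproduced without
MLIR. The sketch below uses two hypothetical helpers, `signExtend` and
`zeroExtend` (not MLIR or `APInt` functions), that widen the low
`width` bits of a value to 64 bits, assuming 1 <= width <= 64.

```cpp
#include <cstdint>

// Hypothetical helper: interpret the low `width` bits of `bits` as a
// signed integer and widen it to 64 bits. Assumes 1 <= width <= 64.
inline int64_t signExtend(uint64_t bits, unsigned width) {
  if (width == 64)
    return static_cast<int64_t>(bits);
  const uint64_t mask = (uint64_t{1} << width) - 1;
  const uint64_t value = bits & mask;
  const uint64_t signBit = uint64_t{1} << (width - 1);
  // XOR-then-subtract copies the sign bit into all higher bits.
  return static_cast<int64_t>((value ^ signBit) - signBit);
}

// Hypothetical helper: interpret the low `width` bits of `bits` as an
// unsigned integer and widen it to 64 bits. Assumes 1 <= width <= 64.
inline uint64_t zeroExtend(uint64_t bits, unsigned width) {
  if (width == 64)
    return bits;
  return bits & ((uint64_t{1} << width) - 1);
}
```

For the bound `1 : i1`, `signExtend(1, 1)` yields -1 while
`zeroExtend(1, 1)` yields 1, so a signed comparison such as `0 < ub`
is false for the sign-extended value and true for the zero-extended
one.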
[MLIR][XeGPU] Extend convert_layout op to support scalar type (#188874)
This PR adds scalar type support to the convert_layout op's result and
operand. It also enhances the convert_layout patterns in wg-to-sg,
unrolling, and sg-to-lane distribution.
This is to support reduction to scalar, since layout propagation
currently does not allow a scalar to carry any layout. The design
choice is to insert a convert_layout op after the reduction-to-scalar
op to record the layout information persistently across the passes.
[libc][docs] Fix POSIX basedefs links for nested headers (#188738)
Fix broken POSIX basedefs links for nested headers in llvm-libc docs.
The docgen script currently emits paths like `sys/wait.h.html`, but the
Open Group uses `sys_wait.h.html`, for example:
- https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_wait.h.html
This updates nested-header link generation while leaving flat headers
unchanged.
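A minimal sketch of the mapping described above, written in C++ for
illustration only: `basedefsPage` is a hypothetical helper, not the
docgen script's actual code, and it assumes that replacing every `/`
with `_` is the whole rule.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: map a header name such as "sys/wait.h" to the
// page name used under the Open Group basedefs directory.
inline std::string basedefsPage(std::string header) {
  // Nested headers use '_' where the include path uses '/'.
  std::replace(header.begin(), header.end(), '/', '_');
  return header + ".html";
}
```

Under that assumption, `basedefsPage("sys/wait.h")` gives
`sys_wait.h.html`, while a flat header such as `time.h` still maps to
`time.h.html`.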
[lldb] Make single-argument Address constructor explicit (NFC) (#189035)
This is to highlight places where we (probably unintentionally)
construct an `Address` object from an already resolved address, making
it unresolved again.
See the changes in `DynamicLoaderDarwin.cpp` for a quick example.
Also, use this constructor instead of `Address(lldb::addr_t file_addr,
const SectionList *section_list)` when `section_list` is `nullptr`.
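The effect of `explicit` on a single-argument constructor can be shown
with stand-in types. `ImplicitAddr` and `ExplicitAddr` below are
hypothetical and much simpler than lldb's `Address`; they only
demonstrate which conversions the compiler accepts.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-in with a converting constructor: a raw integer silently
// becomes an ImplicitAddr wherever one is expected.
struct ImplicitAddr {
  ImplicitAddr(uint64_t file_addr) : offset(file_addr) {}
  uint64_t offset;
};

// Stand-in with an explicit constructor: the conversion must be
// spelled out at the call site, which makes it easy to spot.
struct ExplicitAddr {
  explicit ExplicitAddr(uint64_t file_addr) : offset(file_addr) {}
  uint64_t offset;
};

static_assert(std::is_convertible<uint64_t, ImplicitAddr>::value,
              "implicit conversion is allowed");
static_assert(!std::is_convertible<uint64_t, ExplicitAddr>::value,
              "implicit conversion is rejected");
static_assert(std::is_constructible<ExplicitAddr, uint64_t>::value,
              "direct construction still works");
```

With the explicit form, `ExplicitAddr a = 0x1000;` no longer compiles,
whereas `ExplicitAddr a(0x1000);` does.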
[mlir][vector] Add support for dropping inner unit dims for transfer_read/write with masks. (#188841)
This revision clears a long-overdue TODO: it supports the lowering when
transfer_read/write ops have a mask by inserting a vector.shape_cast op
for the masked value.
---------
Signed-off-by: hanhanW <hanhan0912 at gmail.com>
draft
Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
add tests for parallel_for
Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
remove operators from index space classes
Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
[SLP][NFC] Reapply "Refactor to prepare for constant stride stores" (#188689)
Refactor to precede #185964.
Much of this is a refactor to address this issue. Instead of iterating over one chain at a time, attempting all VFs for that given chain, we now iterate over VFs, trying each chain for the current VF.
Includes a fix for a use-after-free bug.
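The change in iteration order can be pictured as swapping two nested
loops. The sketch below only records the order in which (chain, VF)
pairs would be attempted; `chainMajor` and `vfMajor` are hypothetical
names, and nothing here reflects the actual SLP vectorizer code.

```cpp
#include <utility>
#include <vector>

using Attempt = std::pair<unsigned, unsigned>; // (chain index, VF)

// Previous order: for each chain, try every VF.
inline std::vector<Attempt> chainMajor(unsigned numChains,
                                       const std::vector<unsigned> &vfs) {
  std::vector<Attempt> order;
  for (unsigned chain = 0; chain < numChains; ++chain)
    for (unsigned vf : vfs)
      order.emplace_back(chain, vf);
  return order;
}

// New order: for each VF, try every chain.
inline std::vector<Attempt> vfMajor(unsigned numChains,
                                    const std::vector<unsigned> &vfs) {
  std::vector<Attempt> order;
  for (unsigned vf : vfs)
    for (unsigned chain = 0; chain < numChains; ++chain)
      order.emplace_back(chain, vf);
  return order;
}
```

With two chains and VFs {4, 2}, the chain-major order is (0,4), (0,2),
(1,4), (1,2), and the VF-major order is (0,4), (1,4), (0,2), (1,2).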
[compiler-rt] Fix irrelevant warning on the builtins target (#189055)
Summary:
Currently, building through runtimes will yield this warning:
```
CMake Warning at compiler-rt/cmake/Modules/CompilerRTUtils.cmake:335 (message):
LLVMTestingSupport not found in LLVM_AVAILABLE_LIBS
Call Stack (most recent call first)
```
This is because the builtins target does not go through the
standard runtimes path and sets them as BUILDTREE_ONLY so they do not
show up. These are not used in this case, so just guard the condition to
suppress the warning.
[Offload][NFC] Various minor changes to Offload CMake (#189029)
Summary:
Most of these just remove some redundancy or rename `openmp` ->
`offload` where the variable is purely internal.
Produce back-references for anonymous namespaces (#188843)
The Microsoft mangle implementation does not produce back-references for
anonymous namespaces, which results in nonsensical output from both
`undname` and `llvm-undname`. Consider the following example:
```
namespace {
struct X {};
X foo(X, X);
}
int main() {
foo({}, {});
}
```
Clang 22.1.0
```
[30 lines not shown]
[Clang][OpenMP][NFC] Fix status color mismatches in OpenMPSupport.rst (#189050)
Correct the colors used in the OpenMP support tables so they
consistently match their status text:
- :good: (green) is for 'done' only
- :part: (yellow) is for in-progress states ('partial', 'worked on', 'in
progress', 'prototyped', etc.)
- :none: (red) is for 'unclaimed' only
Assisted with copilot
[libc][bazel] Add generation for public headers (#184889)
Previously there was a single rule for stdbit; this PR adds generated
header targets for the rest of the Linux headers. It also adds a
cc_library for all of the public headers, which also includes the types
and macros headers.
[MLIR][XeVM] Wrap in-place op modifications in modifyOpInPlace in LLVMLoadStoreToOCLPattern (#188952)
LLVMLoadStoreToOCLPattern::matchAndRewrite was calling op->removeAttr()
and op->setOperand() directly without going through the rewriter API.
This caused MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS to report "expected
pattern to replace the root operation or modify it in place".
Fix: wrap the direct mutations in rewriter.modifyOpInPlace().
Assisted-by: Claude Code
Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.