[HLSL][DirectX] Implement HLSL `mul` function and DXIL lowering of `llvm.matrix.multiply` (#184882)
Fixes #99138
- Defines a `__builtin_hlsl_mul` clang builtin in `Builtins.td`.
- Links the `__builtin_hlsl_mul` clang builtin with
`hlsl_alias_intrinsics.h` under the name `mul` for matrix cases
- Implements scalar and vector elementwise multiplication cases of the
`mul` function in `hlsl_intrinsics.h` and `hlsl_intrinsic_helpers.h`
- Adds sema for `__builtin_hlsl_mul` to `CheckBuiltinFunctionCall` in
`SemaHLSL.cpp`
- Adds codegen for `__builtin_hlsl_mul` to `EmitHLSLBuiltinExpr` in
`CGHLSLBuiltins.cpp`
- Vector-vector cases lower to `dot` (except double vectors, which
expand to scalar multiply-adds).
- Matrix-matrix, matrix-vector, and vector-matrix multiplication lower
to the `llvm.matrix.multiply` intrinsic
- Adds codegen tests to `clang/test/CodeGenHLSL/builtins/mul.hlsl`
- Adds sema tests to `clang/test/SemaHLSL/BuiltIns/mul-errors.hlsl`
[13 lines not shown]
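As a rough reference for what the matrix cases compute, the sketch below multiplies two flattened matrices. It assumes the column-major layout that the LLVM language reference describes as the default for `llvm.matrix.multiply`. The function name `matrixMultiplyColumnMajor` and its parameter names are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference routine (not from the patch). A is outerRows x inner,
// B is inner x outerCols, and both are flattened in column-major order, so
// element (r, c) of a matrix with R rows lives at index c * R + r.
std::vector<float> matrixMultiplyColumnMajor(const std::vector<float> &a,
                                             const std::vector<float> &b,
                                             std::size_t outerRows,
                                             std::size_t inner,
                                             std::size_t outerCols) {
  assert(a.size() == outerRows * inner && "A must hold outerRows * inner elements");
  assert(b.size() == inner * outerCols && "B must hold inner * outerCols elements");

  std::vector<float> result(outerRows * outerCols, 0.0f);
  for (std::size_t c = 0; c < outerCols; ++c) {
    for (std::size_t r = 0; r < outerRows; ++r) {
      float sum = 0.0f;
      // Dot product of row r of A with column c of B.
      for (std::size_t k = 0; k < inner; ++k)
        sum += a[k * outerRows + r] * b[c * inner + k];
      result[c * outerRows + r] = sum;
    }
  }
  return result;
}
```

Under the same assumption, a vector operand can be modelled as a matrix with a single row or a single column, which is one way to think about the matrix-vector and vector-matrix cases.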
[mlir][ODS] Fix notorious double-space bug in op printers (#184253)
When an op's assembly format prints an attribute via
`printStrippedAttrOrType`, two independent space-emission mechanisms
would fire: the op format generator emits a space before each argument,
and the attribute's generated `print` method also emits a leading space
(`shouldEmitSpace` initialized to true). This caused double spaces like
`gpu.shuffle  xor`.
The usual workaround for this was to add double backticks to consume the
leading space.
Fixed by removing the leading space from generated attr/type `print()`
methods and compensating in the print dispatcher by conditionally adding
a space between the mnemonic and `print` call when the format starts
with a name or keyword rather than punctuation.
Also remove some workarounds for the double-spacing in op formats and
fix tests that now don't have leading spaces.
Assisted-by: claude
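The interaction described above can be modelled with plain strings. This is a toy sketch, not the ODS-generated code: all four function names are hypothetical, and the real fix also adds a conditional space in the print dispatcher that is not modelled here.

```cpp
#include <cassert>
#include <string>

// Toy model of the old generated attribute printer: it emits its own
// leading space before the value.
std::string printAttrWithLeadingSpace(const std::string &value) {
  return " " + value;
}

// Toy model of the fixed attribute printer: no leading space.
std::string printAttrStripped(const std::string &value) { return value; }

// The op format generator emits one space before each argument, so pairing it
// with the old attribute printer yields two spaces.
std::string printOpOld(const std::string &opName, const std::string &attr) {
  assert(!opName.empty() && "op name must not be empty");
  return opName + " " + printAttrWithLeadingSpace(attr);
}

// With the stripped attribute printer only the op-level space remains.
std::string printOpFixed(const std::string &opName, const std::string &attr) {
  assert(!opName.empty() && "op name must not be empty");
  return opName + " " + printAttrStripped(attr);
}
```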
Revert "[llubi] Add support for load/store/lifetime markers" (#185101)
Reverts llvm/llvm-project#182532 to unblock CI.
The original patch causes some test failures related to undef bits, as
it incorrectly assumes that `std::uniform_int_distribution` returns the
same results across C++ standard library implementations.
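For context, the C++ standard pins down the output sequence of engines such as `std::mt19937` but leaves the algorithm inside `std::uniform_int_distribution` to the implementation. One possible way to get vendor-independent values is to reduce the raw engine output directly, as in the sketch below. The helper `portableBelow` is hypothetical and is not what the reverted patch or any follow-up does.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper (an assumption, not taken from the patch): returns a
// value in [0, bound) using only the engine's raw output, which the standard
// specifies exactly. The modulo reduction is slightly biased, which may be
// acceptable when reproducibility matters more than uniformity.
std::uint32_t portableBelow(std::mt19937 &engine, std::uint32_t bound) {
  assert(bound > 0 && "bound must be positive");
  return static_cast<std::uint32_t>(engine() % bound);
}
```

Two engines seeded identically produce the same `portableBelow` sequence on any conforming standard library, whereas the same is not guaranteed for `std::uniform_int_distribution`.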
[flang] materialize fir.box when it is from a block argument (#184898)
We have to materialize `fir.box` before adding a `fir.convert` to a
memref type. Otherwise we get:
`'fir.convert' op invalid type conversion'!fir.box<!fir.array<?xi32>>' /
'memref<?xi32, strided<[?], offset: ?>>'`
[ASan][Windows] Fixing Windows shadow memory address for arm64 (#184902)
This is a prerequisite for full ARM64 Windows ASan support. The runtime
interception changes needed to make ASan functional end-to-end on ARM64
Windows will be opened separately.
Motivated by https://github.com/microsoft/STL/pull/6095 (more
specifically [this reference to
clang-cl](https://github.com/microsoft/STL/pull/6095#:~:text=Not%20enabling%20GH_002030_asan_annotate_string%20and%20GH_002030_asan_annotate_vector%20yet%20due%20to%20Clang%20issues.))
The latest MSVC toolset includes ARM64 AddressSanitizer support. This
change adds AArch64 to the Windows 64-bit shadow mapping condition when
compiling with `-fsanitize=address` with `clang-cl`. Without this,
consumers on Windows who target ARM64 with `clang-cl -fsanitize=address`
and then link with `link.exe` will see this at runtime:
```text
ERROR: AddressSanitizer: access-violation on unknown address
...
[4 lines not shown]
[WebAssembly] Remove the `wasm-disable-fix-irreducible-control-flow-pass` switch (#185072)
This removes the `wasm-disable-fix-irreducible-control-flow-pass`
switch.
It was originally added in #67715 as a way to avoid the potentially
absurd compile times the pass used to incur. However, with the successful
merge of #184441, the pass itself has been fixed to avoid this issue.
Given that, it is neither necessary nor desirable to keep this switch.
[mlir][LLVM] Add support for `ptrtoaddr`
The `ptrtoaddr` op is akin to `ptrtoint` with some important differences:
* It does not capture the provenance of the pointer, meaning the pointer does not escape and a subsequent `inttoptr` does not produce a legal pointer. LLVM can then assume the pointer never escaped, which helps alias analysis.
* It does not support arbitrary integer types, but only exactly the integer type that is equal in width to the pointer type as specified by the data layout.
This PR adds the op to the MLIR LLVM dialect and adds the corresponding verification for the datalayout property.
[mlir][reducer] Add split-input-file to mlir-reduce (#184970)
The tests for mlir-reduce are currently scattered. To centralize them,
I added the split-input-file feature to mlir-reduce. It is part of
https://github.com/llvm/llvm-project/pull/184974.
[MC][WebAssembly] Allow strings for import modules and names in asm (#182896)
Current tooling for the WebAssembly component model uses import modules
and names such as `$root` and `[thread-index]`. Importing these from
assembly files requires support for names that are not valid identifiers
in `.import_name` and `.import_module` directives. This PR adds support
for specifying those as strings, e.g.:
```asm
.import_module __wasm_component_model_builtin_thread_index, "$root"
.import_name __wasm_component_model_builtin_thread_index, "[thread-index]"
```
[clang-doc] Add button toggle for light/dark theme (#181587)
The user can now manually toggle the light or dark theme instead of
waiting for the system theme to change.
Also fixes a typo that caused some overflow issues even when there was
no content to cause an overflow.
[Clang] Fix EmitAggregateCopy assertion for non-trivially-copyable sret types (#185091)
Fix for buildbot crash on #183639
The UseTemp path in AggExprEmitter::withReturnValueSlot copies back via
EmitAggregateCopy, which asserts that the type has a trivial copy/move
constructor or assignment operator. Gate the DestASMismatch condition on
isTriviallyCopyableType so that non-trivially-copyable types (e.g.
std::exception_ptr) fall through to the addrspacecast path instead.
Fix buildbot crash:
https://lab.llvm.org/buildbot/#/builders/73/builds/19803
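The shape of the gate can be illustrated with the standard type trait. This is only a sketch: `canCopyBackThroughTemp` and `PlainAggregate` are hypothetical names, and Clang's `isTriviallyCopyableType` operates on its own `QualType` representation rather than on `std::is_trivially_copyable`.

```cpp
#include <exception>
#include <type_traits>

// Hypothetical stand-in for a type that is safe to copy back bytewise.
struct PlainAggregate {
  int a;
  double b;
};

// Hypothetical predicate mirroring the described condition: take the
// temporary-and-copy-back path only when the address spaces mismatch AND the
// type is trivially copyable. Otherwise fall through to another path.
template <typename T>
constexpr bool canCopyBackThroughTemp(bool destAddrSpaceMismatch) {
  return destAddrSpaceMismatch && std::is_trivially_copyable<T>::value;
}
```

With this predicate, `PlainAggregate` qualifies for the copy-back path on a mismatch, while `std::exception_ptr` does not.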
[RISCV][P-ext] Only support sshlsat for splat immediate shift amounts. (#184886)
Fixes "cannot select" errors for other types of shift amounts.
I've made a new RISCVISD node that only allows an immediate operand.
It's assumed that the lowering code will only allow valid immediates so
I'm not using a TImmLeaf in the match.
[flang,acc] Support -ffp-maxmin-behavior option in lowering. (#184730)
This patch adds the `flang -fc1` option `-ffp-maxmin-behavior` and
propagates it throughout Flang, so that the semantics context,
lowering, and the pass pipeline builder can use it.
MAX/MIN intrinsic and OpenACC max/min reduction lowering
are now controlled by the option.
I kept the `Legacy` mode, which is the default and matches the current
behavior. I am going to test and merge a follow-up patch that
replaces `Legacy` with `Portable`.
RFC:
https://discourse.llvm.org/t/flang-canonical-and-optimizable-representation-for-min-max/90037
clang/AMDGPU: Fix workgroup size builtins for nonuniform work group sizes
These were assuming uniform work group sizes. Emit the v4 and v5 sequences
to take the remainder group for the nonuniform case.
Currently the device libs use this builtin on the legacy ABI path with
the same sequence to calculate the remainder, and fully implements the v5
path. If you perform a franken-build of the library with the updated builtin,
the result is worse. The duplicate sequence does not fully fold out. However,
it does not appear to be wrong. The relevant conformance tests still pass.
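The arithmetic behind "taking the remainder group" can be sketched in one dimension as follows. This is an illustration of the idea only, not the v4 or v5 instruction sequence the builtin emits. The function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-dimensional model: the size of work-group `groupId` when
// the grid size need not be a multiple of the nominal work-group size. Every
// group has the nominal size except possibly the last one, which holds the
// remainder.
std::uint32_t workGroupSize(std::uint32_t gridSize, std::uint32_t nominalSize,
                            std::uint32_t groupId) {
  assert(nominalSize > 0 && "nominal work-group size must be positive");
  std::uint32_t start = groupId * nominalSize;
  assert(start < gridSize && "group must start inside the grid");
  std::uint32_t remaining = gridSize - start;
  return remaining < nominalSize ? remaining : nominalSize;
}
```

For a grid of 100 work-items with a nominal size of 32, groups 0 through 2 have 32 work-items and group 3 has the remaining 4. Assuming uniform sizes would report 32 for group 3 as well.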
[CIR] Fix a crash when source location is unknown (#185059)
When we call `getLoc()` with an invalid `SourceLocation` and
`currSrcLoc` is also invalid, we were crashing or asserting. I tracked
down one case where this was happening (generating an argument in a
vtable thunk) and fixed that to provide a location. I am also updating
the `getLoc()` implementation so that it will use an unknown location in
release builds rather than crashing, because the location isn't critical
for correct compilation.
[CIR] Add MLIR ABI Lowering design document
Design document for MLIR dialect-agnostic calling convention
lowering that builds on the LLVM ABI Lowering Library
(llvm/lib/ABI/) as the single source of truth for ABI
classification. Dialects use the library via an adapter layer:
ABITypeMapper maps dialect types to abi::Type*, the library
classifies arguments and returns, and a dialect-specific
ABIRewriteContext applies the decisions back to IR operations.
Targets x86_64 and AArch64, with parity against Classic Clang
CodeGen validated through differential testing.
[clang-doc] Fix benchmark not compiling (#185065)
CI didn't flag that the benchmark was using the outdated Ctx call
when landing the Mustache MD patch since this benchmark isn't tested.
Also added missing libraries in CMake that prevented me from building
the benchmark locally.
[AMDGPU] fix asyncmark soft waitcnt bug (#184851)
Asyncmarks record the current wait state and so should not allow waitcnts that occur after them to be merged into waitcnts that occur before.
[clang][CodeGen] Fix size calculation in vbptr split memory region in EmitNullBaseClassInitialization (#184558)
When splitting memory stores around multiple virtual base pointers
(vbptrs) in the Microsoft ABI, the calculation for the size of the
memory region after each vbptr was incorrect.
The old (buggy) calculation:
SplitAfterSize = LastStoreSize - SplitAfterOffset
This subtracts an absolute offset from a relative size, causing
incorrect (too small) sizes after the second vbptr.
The correct size should be:
SplitAfterSize = (LastStoreOffset + LastStoreSize) - SplitAfterOffset
Since all store regions extend to the end of the non-virtual portion
(NVSize), this patch uses the simplified form:
[3 lines not shown]
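A small worked example of the two formulas stated above, using made-up numbers: a 48-byte non-virtual region with 8-byte vbptrs at offsets 8 and 24. The `Region` struct and both function names are hypothetical. Only the two expressions come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the last pending store: [offset, offset + size).
struct Region {
  std::int64_t offset;
  std::int64_t size;
};

// Old calculation: subtracts an absolute offset from a relative size.
std::int64_t splitAfterSizeOld(Region lastStore, std::int64_t splitAfterOffset) {
  return lastStore.size - splitAfterOffset;
}

// Corrected calculation: end of the last store minus the split offset.
std::int64_t splitAfterSizeFixed(Region lastStore,
                                 std::int64_t splitAfterOffset) {
  assert(splitAfterOffset >= lastStore.offset && "split must lie in the store");
  return (lastStore.offset + lastStore.size) - splitAfterOffset;
}
```

For the first vbptr the last store is `{0, 48}` and the split-after offset is 16, so both formulas give 32, which is why the bug stays hidden with a single vbptr. For the second vbptr the last store is `{16, 32}` and the split-after offset is 32: the corrected formula gives 16, while the old one gives 0.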