LLVM/project 06f7cd4 clang/lib/Headers/hlsl hlsl_intrinsics.h hlsl_alias_intrinsics.h, clang/lib/Sema SemaHLSL.cpp

[HLSL][DirectX] Implement HLSL `mul` function and DXIL lowering of `llvm.matrix.multiply` (#184882)

Fixes #99138

- Defines a `__builtin_hlsl_mul` clang builtin in `Builtins.td`.
- Links the `__builtin_hlsl_mul` clang builtin with
`hlsl_alias_intrinsics.h` under the name `mul` for matrix cases
- Implements scalar and vector elementwise multiplication cases of the
`mul` function in `hlsl_intrinsics.h` and `hlsl_intrinsic_helpers.h`
- Adds sema for `__builtin_hlsl_mul` to `CheckBuiltinFunctionCall` in
`SemaHLSL.cpp`
- Adds codegen for `__builtin_hlsl_mul` to `EmitHLSLBuiltinExpr` in
`CGHLSLBuiltins.cpp`
- Vector-vector cases lower to `dot` (except double vectors, which
expand to scalar multiply-adds).
- Matrix-matrix, matrix-vector, and vector-matrix multiplication lower
to the `llvm.matrix.multiply` intrinsic
- Adds codegen tests to `clang/test/CodeGenHLSL/builtins/mul.hlsl`
- Adds sema tests to `clang/test/SemaHLSL/BuiltIns/mul-errors.hlsl`

    [13 lines not shown]
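
As a rough illustration of what the matrix cases compute, here is a sketch in plain C++ (not HLSL, and not code from this patch). It assumes the column-major flattened-vector layout that LLVM's matrix intrinsics use by default; `matrixMultiply` and its template parameters are hypothetical names chosen for this example.

```cpp
#include <array>
#include <cstddef>

// Multiplies an MxK matrix by a KxN matrix. Both inputs and the result are
// flattened in column-major order (assumed layout): element (row, col) of an
// R-row matrix lives at index row + col * R.
template <std::size_t M, std::size_t K, std::size_t N>
constexpr std::array<float, M * N>
matrixMultiply(const std::array<float, M * K> &a,
               const std::array<float, K * N> &b) {
  std::array<float, M * N> c{};
  for (std::size_t col = 0; col < N; ++col) {
    for (std::size_t row = 0; row < M; ++row) {
      float sum = 0.0f;
      for (std::size_t k = 0; k < K; ++k)
        sum += a[row + k * M] * b[k + col * K];
      c[row + col * M] = sum;
    }
  }
  return c;
}
```

Under these assumptions a matrix-vector product is the `N == 1` case and a vector-matrix product is the `M == 1` case, for example `matrixMultiply<2, 2, 1>(m, v)`.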
DeltaFile
+336 -0 llvm/test/CodeGen/DirectX/matrix-multiply.ll
+109 -0 clang/test/CodeGenHLSL/builtins/mul.hlsl
+95 -0 llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp
+68 -0 clang/lib/Headers/hlsl/hlsl_intrinsics.h
+49 -0 clang/lib/Headers/hlsl/hlsl_alias_intrinsics.h
+43 -0 clang/lib/Sema/SemaHLSL.cpp
+700 -0 4 files not shown
+795 -0 10 files

LLVM/project fef16a7 mlir/test/Dialect/GPU shuffle-rewrite.mlir, mlir/test/Dialect/Vector vector-warp-distribute.mlir

[mlir][ODS] Fix notorious double-space bug in op printers (#184253)

When an op's assembly format prints an attribute via
`printStrippedAttrOrType`, two independent space-emission mechanisms
would fire: the op format generator emits a space before each argument,
and the attribute's generated `print` method also emits a leading space
(`shouldEmitSpace` initialized to true). This caused double spaces like
`gpu.shuffle  xor`.

The usual workaround for this was to add double backticks to consume the
leading space.

Fixed by removing the leading space from generated attr/type `print()`
methods and compensating in the print dispatcher by conditionally adding
a space between the mnemonic and `print` call when the format starts
with a name or keyword rather than punctuation.

Also remove some workarounds for the double-spacing in op formats and
fix tests that now don't have leading spaces.

Assisted-by: claude
DeltaFile
+52 -4 mlir/tools/mlir-tblgen/AttrOrTypeDefGen.cpp
+20 -20 mlir/test/Dialect/XeGPU/sg-to-wi-experimental-unit.mlir
+17 -17 mlir/test/Dialect/Vector/vector-warp-distribute.mlir
+4 -5 mlir/test/mlir-tblgen/attr-or-type-format.td
+5 -3 mlir/tools/mlir-tblgen/AttrOrTypeFormatGen.cpp
+4 -4 mlir/test/Dialect/GPU/shuffle-rewrite.mlir
+102 -53 8 files not shown
+115 -66 14 files

LLVM/project bc55e5e llvm/utils/gn/secondary/clang/include/clang/Basic BUILD.gn, llvm/utils/gn/secondary/clang/lib/Basic BUILD.gn

[gn] port 3da28bfbce4d7a (DiagnosticStableIDs)
DeltaFile
+10 -5 llvm/utils/gn/secondary/clang/include/clang/Basic/BUILD.gn
+1 -0 llvm/utils/gn/secondary/clang/lib/Basic/BUILD.gn
+11 -5 2 files

LLVM/project 949db45 llvm/lib/Target/AMDGPU SILoadStoreOptimizer.cpp

cap the promotion to u16
DeltaFile
+9 -2 llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
+9 -2 1 file

LLVM/project 2c8c39a llvm/test/tools/llubi loadstore_le.ll loadstore_be.ll, llvm/tools/llubi/lib Context.cpp Interpreter.cpp

Revert "[llubi] Add support for load/store/lifetime markers" (#185101)

Reverts llvm/llvm-project#182532 to unblock CI.
The original patch causes test failures related to undef bits because it
incorrectly assumes that `std::uniform_int_distribution` produces the same
results across different C++ standard library implementations.
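
One way to avoid this class of problem is to not rely on a standard distribution at all: the C++ standard fixes the output sequence of its random engines, but leaves the algorithm inside `std::uniform_int_distribution` to each implementation. The following is a minimal sketch of that idea, not the fix applied to llubi. It uses a self-contained SplitMix64 step and a plain modulo reduction, so the result depends only on the seed. `SplitMix64State`, `nextRaw`, and `boundedValue` are hypothetical names.

```cpp
#include <cstdint>

// Generator state for this sketch (hypothetical type, not from llubi).
struct SplitMix64State {
  std::uint64_t state;
};

// One SplitMix64 step. It uses only fixed-width unsigned arithmetic, so every
// conforming compiler and standard library produces the same value.
constexpr std::uint64_t nextRaw(SplitMix64State &g) {
  g.state += 0x9E3779B97F4A7C15ULL;
  std::uint64_t z = g.state;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Maps the first raw output for `seed` into [0, bound). `bound` must be
// nonzero. The modulo introduces a slight bias, which this sketch accepts in
// exchange for reproducibility.
constexpr std::uint64_t boundedValue(std::uint64_t seed, std::uint64_t bound) {
  SplitMix64State g{seed};
  return nextRaw(g) % bound;
}
```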
DeltaFile
+32 -303 llvm/tools/llubi/lib/Context.cpp
+0 -192 llvm/test/tools/llubi/loadstore_le.ll
+0 -190 llvm/test/tools/llubi/loadstore_be.ll
+8 -127 llvm/tools/llubi/lib/Interpreter.cpp
+17 -46 llvm/tools/llubi/lib/Value.h
+4 -42 llvm/tools/llubi/lib/Context.h
+61 -900 9 files not shown
+65 -1,048 15 files

LLVM/project 97cf8bf flang/lib/Optimizer/Transforms FIRToMemRef.cpp, flang/test/Transforms/FIRToMemRef array-coor-block-arg.mlir

[flang] materialize fir.box when it is from a block argument (#184898)

We have to materialize `fir.box` before adding a `fir.convert` to a
memref type. Otherwise we get:
`'fir.convert' op invalid type conversion'!fir.box<!fir.array<?xi32>>' /
'memref<?xi32, strided<[?], offset: ?>>'`
DeltaFile
+19 -8 flang/lib/Optimizer/Transforms/FIRToMemRef.cpp
+18 -0 flang/test/Transforms/FIRToMemRef/array-coor-block-arg.mlir
+37 -8 2 files

LLVM/project a870547 llvm/lib/Target/AMDGPU SILoadStoreOptimizer.cpp, llvm/test/CodeGen/AMDGPU memintrinsic-unroll.ll promote-constOffset-to-imm-gfx12.mir

use smallest offset as anchor when negative offset is not allowed
DeltaFile
+99 -171 llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+61 -22 llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir
+67 -11 llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
+34 -40 llvm/test/CodeGen/AMDGPU/fold-gep-offset.ll
+5 -4 llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.ll
+266 -248 5 files

LLVM/project 2927c97 llvm/lib/Target/AMDGPU SILoadStoreOptimizer.cpp, llvm/test/CodeGen/AMDGPU promote-constOffset-to-imm-gfx12.mir promote-constOffset-to-imm-gfx12.ll

[AMDGPU] Disable negative imm offset for async load/store instructions
DeltaFile
+20 -16 llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir
+6 -7 llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.ll
+3 -1 llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
+29 -24 3 files

LLVM/project c172000 llvm/lib/Transforms/Instrumentation AddressSanitizer.cpp, llvm/test/Instrumentation/AddressSanitizer basic-msvc64.ll

[ASan][Windows] Fixing Windows shadow memory address for arm64 (#184902)

This is a prerequisite for full ARM64 Windows ASan support. The runtime
interception changes needed to make ASan functional end-to-end on ARM64
Windows will be opened separately.

Motivated by https://github.com/microsoft/STL/pull/6095 (more
specifically [this reference to
clang-cl](https://github.com/microsoft/STL/pull/6095#:~:text=Not%20enabling%20GH_002030_asan_annotate_string%20and%20GH_002030_asan_annotate_vector%20yet%20due%20to%20Clang%20issues.))

The latest MSVC toolset includes ARM64 AddressSanitizer support. This
change adds AArch64 to the Windows 64-bit shadow mapping condition when
compiling with `-fsanitize=address` with `clang-cl`. Without this,
consumers on Windows who target ARM64 with `clang-cl -fsanitize=address`
and then link with `link.exe` will see this at runtime:

```text
ERROR: AddressSanitizer: access-violation on unknown address
...
```

    [4 lines not shown]
DeltaFile
+2 -2 llvm/test/Instrumentation/AddressSanitizer/basic-msvc64.ll
+1 -1 llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
+3 -3 2 files

LLVM/project b87cf50 llvm/lib/Target/WebAssembly WebAssemblyTargetMachine.cpp

[WebAssembly] Remove the `wasm-disable-fix-irreducible-control-flow-pass` switch (#185072)

This removes the `wasm-disable-fix-irreducible-control-flow-pass`
switch.

It was originally added in #67715 as a way to avoid the potentially
absurd compile times the pass used to bring. However with the successful
merge of #184441, the pass itself has been fixed to avoid this issue.

Given that, it is no longer necessary nor desirable to keep this switch.
DeltaFile
+1 -8 llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
+1 -8 1 file

LLVM/project 0b6cd1a mlir/include/mlir/Dialect/LLVMIR LLVMOps.td, mlir/lib/Dialect/LLVMIR/IR LLVMDialect.cpp

[mlir][LLVM] Add support for `ptrtoaddr`

The `ptrtoaddr` op is akin to `ptrtoint` with some important differences:
* It does not capture the provenance of the pointer, so the pointer does not escape and a subsequent `inttoptr` does not produce a legal pointer. LLVM can then assume the pointer never escaped, which helps alias analysis.
* It does not support arbitrary integer types, but only exactly the integer type that is equal in width to the pointer type as specified by the data layout.

This PR adds the op to the MLIR dialect and adds the corresponding verification for the datalayout property.
DeltaFile
+18 -0 mlir/test/Dialect/LLVMIR/invalid.mlir
+15 -0 mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
+9 -0 mlir/test/Target/LLVMIR/llvmir.mlir
+8 -0 mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
+2 -0 mlir/test/Dialect/LLVMIR/roundtrip.mlir
+2 -0 mlir/test/Target/LLVMIR/Import/instructions.ll
+54 -0 6 files

LLVM/project a99d4a6 mlir/lib/Tools/mlir-reduce MlirReduceMain.cpp

[mlir][reducer] Add split-input-file to mlir-reduce (#184970)

The tests for mlir-reduce are currently scattered. To centralize the
tests for mlir-reduce, I added the split-input-file feature to
mlir-reduce. It is part of
https://github.com/llvm/llvm-project/pull/184974.
DeltaFile
+43 -30 mlir/lib/Tools/mlir-reduce/MlirReduceMain.cpp
+43 -30 1 file

LLVM/project ae4e712 llvm/lib/Target/WebAssembly/AsmParser WebAssemblyAsmParser.cpp, llvm/lib/Target/WebAssembly/MCTargetDesc WebAssemblyTargetStreamer.cpp

[MC][WebAssembly] Allow strings for import modules and names in asm (#182896)

Current tooling for the WebAssembly component model uses import modules
and names such as `$root` and `[thread-index]`. Importing these from
assembly files requires support for non-valid identifiers in
`.import_name` and `.import_module` directives. This PR adds support for
specifying those as strings, e.g.:

```asm
        .import_module __wasm_component_model_builtin_thread_index, "$root"
        .import_name __wasm_component_model_builtin_thread_index, "[thread-index]"
```
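
The core of accepting both spellings can be sketched as a small helper that returns the name text for an operand written either as a bare identifier or as a double-quoted string. This is only an illustration under simplifying assumptions (no escape sequences, token already isolated), not the logic in `WebAssemblyAsmParser.cpp`; `importOperandText` is a hypothetical name.

```cpp
#include <string_view>

// Returns the name carried by a directive operand. A token wrapped in double
// quotes yields its contents, so names such as $root or [thread-index] that
// are not valid identifiers can be expressed. Any other token is returned
// unchanged. Escape sequences are not handled in this sketch.
constexpr std::string_view importOperandText(std::string_view token) {
  if (token.size() >= 2 && token.front() == '"' && token.back() == '"')
    return token.substr(1, token.size() - 2);
  return token;
}
```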
DeltaFile
+46 -1 llvm/test/MC/WebAssembly/export-name.s
+44 -2 llvm/test/MC/WebAssembly/import-module.s
+18 -3 llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
+4 -6 llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyTargetStreamer.cpp
+4 -4 llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-options.ll
+4 -1 llvm/test/MC/WebAssembly/export-name-invalid.s
+120 -17 4 files not shown
+132 -23 10 files

LLVM/project eada0f5 clang-tools-extra/clang-doc/assets head-template.mustache clang-doc-mustache.css, clang-tools-extra/test/clang-doc basic-project.mustache.test

[clang-doc] Add button toggle for light/dark theme (#181587)

The user can now manually toggle the light or dark theme instead of
waiting for the system theme to change.

Also fixes a typo that caused some overflow issues even when there was
no content to cause an overflow.
DeltaFile
+42 -2 clang-tools-extra/clang-doc/assets/head-template.mustache
+8 -4 clang-tools-extra/test/clang-doc/basic-project.mustache.test
+9 -1 clang-tools-extra/clang-doc/assets/clang-doc-mustache.css
+10 -0 clang-tools-extra/clang-doc/assets/navbar-template.mustache
+69 -7 4 files

LLVM/project a71adf1 llvm/lib/Target/PowerPC PPCISelLowering.cpp PPCAsmPrinter.cpp, llvm/test/CodeGen/PowerPC amo-enable.ll

Address review comments
DeltaFile
+3 -11 llvm/test/CodeGen/PowerPC/amo-enable.ll
+4 -3 llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+2 -2 llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+9 -16 3 files

LLVM/project 4d53c42 compiler-rt/lib/builtins CMakeLists.txt

builtins: Make cmake formatting self-consistent after #183871

No behavior change.
DeltaFile
+4 -2 compiler-rt/lib/builtins/CMakeLists.txt
+4 -2 1 file

LLVM/project 38459f3 llvm/test/tools/llubi loadstore_le.ll loadstore_be.ll, llvm/tools/llubi/lib Context.cpp Interpreter.cpp

Revert "[llubi] Add support for load/store/lifetime markers (#182532)"

This reverts commit 0311bb623a1e1bd101e517cfde4538039f65aa24.
DeltaFile
+32 -303 llvm/tools/llubi/lib/Context.cpp
+0 -192 llvm/test/tools/llubi/loadstore_le.ll
+0 -190 llvm/test/tools/llubi/loadstore_be.ll
+8 -127 llvm/tools/llubi/lib/Interpreter.cpp
+17 -46 llvm/tools/llubi/lib/Value.h
+4 -42 llvm/tools/llubi/lib/Context.h
+61 -900 9 files not shown
+65 -1,048 15 files

LLVM/project 337fed3 clang/lib/CodeGen CGExprAgg.cpp, clang/test/CodeGenHIP sret-nontrivial-copyable.hip

[Clang] Fix EmitAggregateCopy assertion for non-trivially-copyable sret types (#185091)

Fix for buildbot crash on #183639.
The UseTemp path in AggExprEmitter::withReturnValueSlot copies back via
EmitAggregateCopy, which asserts that the type has a trivial copy/move
constructor or assignment operator. Gate the DestASMismatch condition on
isTriviallyCopyableType so that non-trivially-copyable types (e.g.
std::exception_ptr) fall through to the addrspacecast path instead.

Fix buildbot crash:
https://lab.llvm.org/buildbot/#/builders/73/builds/19803
DeltaFile
+34 -0 clang/test/CodeGenHIP/sret-nontrivial-copyable.hip
+7 -6 clang/lib/CodeGen/CGExprAgg.cpp
+2 -2 clang/test/OpenMP/amdgcn_sret_ctor.cpp
+43 -8 3 files

LLVM/project bdec4da llvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVInstrInfoP.td, llvm/test/CodeGen/RISCV rvp-ext-rv64.ll rvp-ext-rv32.ll

[RISCV][P-ext] Only support sshlsat for splat immediate shift amounts. (#184886)

Fixes cannot select errors for other types of shift amounts.

I've made a new RISCVISD node that only allows an immediate operand.
It's assumed that the lowering code will only allow valid immediates so
I'm not using a TImmLeaf in the match.
DeltaFile
+117 -9 llvm/test/CodeGen/RISCV/rvp-ext-rv64.ll
+90 -6 llvm/test/CodeGen/RISCV/rvp-ext-rv32.ll
+15 -1 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+5 -4 llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+227 -20 4 files

LLVM/project d6f2ea4 llvm/docs/TableGen ProgRef.rst, llvm/lib/TableGen TGParser.cpp TGParser.h

[TableGen] Add let append/prepend syntax for field concatenation
DeltaFile
+224 -0 llvm/test/TableGen/let-append.td
+91 -14 llvm/lib/TableGen/TGParser.cpp
+63 -0 llvm/test/TableGen/let-append-toplevel.td
+45 -2 llvm/docs/TableGen/ProgRef.rst
+22 -4 llvm/lib/TableGen/TGParser.h
+12 -0 llvm/test/TableGen/let-prepend-error.td
+457 -20 2 files not shown
+481 -20 8 files

LLVM/project 5230955 flang/include/flang/Optimizer/Builder IntrinsicCall.h, flang/lib/Optimizer/Builder IntrinsicCall.cpp

[flang,acc] Support -ffp-maxmin-behavior option in lowering. (#184730)

This patch adds `flang -fc1` option `-ffp-maxmin-behavior` and
propagates it throughout Flang, so that semantics context,
lowering and the pass pipeline builder can use it.

MAX/MIN intrinsic and OpenACC max/min reduction lowering
are now controlled by the option.

I kept the `Legacy` mode, which is the default and matches the current
behavior. I am going to test and merge a follow-up patch that
replaces `Legacy` with `Portable`.

RFC:
https://discourse.llvm.org/t/flang-canonical-and-optimizable-representation-for-min-max/90037
DeltaFile
+64 -67 flang/lib/Optimizer/Builder/IntrinsicCall.cpp
+114 -0 flang/test/Lower/OpenACC/acc-reduction-maxmin.f90
+63 -9 mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
+52 -0 flang/test/Lower/fp-maxmin-behavior.f90
+46 -0 mlir/test/Dialect/OpenACC/ops.mlir
+1 -44 flang/include/flang/Optimizer/Builder/IntrinsicCall.h
+340 -120 24 files not shown
+605 -151 30 files

LLVM/project 4b072b7 clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp, clang/test/CodeGen amdgpu-abi-version.c

clang/AMDGPU: Fix workgroup size builtins for nonuniform work group sizes

These were assuming uniform work group sizes. Emit the v4 and v5 sequences
to take the remainder group for the nonuniform case.

Currently the device libs use this builtin on the legacy ABI path with
the same sequence to calculate the remainder, and fully implement the v5
path. If you perform a franken-build of the library with the updated builtin,
the result is worse. The duplicate sequence does not fully fold out. However,
it does not appear to be wrong. The relevant conformance tests still pass.
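
The idea of the remainder group can be sketched in plain C++. This is only an illustration of the arithmetic, assuming a one-dimensional grid; it is not the v4 or v5 code sequence the patch emits, and `workGroupSize` and its parameters are hypothetical names.

```cpp
#include <cstdint>

// Size of work group `groupId` along one dimension. `nominalSize` is the
// requested work group size. When the grid is not a multiple of it, the last
// group only covers the remaining work items. Assumes
// groupId * nominalSize < gridSize.
constexpr std::uint32_t workGroupSize(std::uint32_t gridSize,
                                      std::uint32_t nominalSize,
                                      std::uint32_t groupId) {
  const std::uint32_t remaining = gridSize - groupId * nominalSize;
  return remaining < nominalSize ? remaining : nominalSize;
}
```

With a grid of 100 work items and a nominal size of 32, groups 0 through 2 have 32 items each and group 3 has the remaining 4. A builtin that assumes uniform sizes would report 32 for that last group as well.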
DeltaFile
+627 -0 clang/test/CodeGenOpenCL/builtins-amdgcn-workgroup-size.cl
+123 -36 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+100 -16 clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+60 -30 clang/test/Headers/gpuintrin.c
+35 -15 clang/test/CodeGen/amdgpu-abi-version.c
+0 -19 clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+945 -116 6 files

LLVM/project a8783dc lldb/test/Shell/ScriptInterpreter/Python bytecode.test

[lldb][bytecode] Disable bytecode.test on windows (#185096)

The test is failing on the lldb-x86_64-win buildbot.
DeltaFile
+1 -0 lldb/test/Shell/ScriptInterpreter/Python/bytecode.test
+1 -0 1 file

LLVM/project df783c5 llvm/docs/TableGen ProgRef.rst, llvm/lib/TableGen TGParser.cpp TGParser.h

[TableGen] Add let append/prepend syntax for field concatenation
DeltaFile
+224 -0 llvm/test/TableGen/let-append.td
+91 -14 llvm/lib/TableGen/TGParser.cpp
+63 -0 llvm/test/TableGen/let-append-toplevel.td
+45 -2 llvm/docs/TableGen/ProgRef.rst
+22 -4 llvm/lib/TableGen/TGParser.h
+13 -0 llvm/test/TableGen/let-append-error.td
+458 -20 2 files not shown
+483 -20 8 files

LLVM/project 0d71610 clang/test/CodeGenObjC expose-direct-method-visibility-linkage.m expose-direct-method-linkedlist.m

add darwin back
DeltaFile
+0 -48 clang/test/CodeGenObjC/expose-direct-method-visibility-linkage.m
+1 -0 clang/test/CodeGenObjC/expose-direct-method-linkedlist.m
+1 -48 2 files

LLVM/project ab10f08 clang/lib/CIR/CodeGen CIRGenFunction.cpp CIRGenVTables.cpp, clang/test/CIR/CodeGen thunks.cpp

[CIR] Fix a crash when source location is unknown (#185059)

When we call `getLoc()` with an invalid `SourceLocation` and
`currSrcLoc` is also invalid, we were crashing or asserting. I tracked
down one case where this was happening (generating an argument in a
vtable thunk) and fixed that to provide a location. I also am updating
the `getLoc()` implementation so that it will use an unknown location in
release builds rather than crashing because the location isn't critical
for correct compilation.
DeltaFile
+64 -0 clang/test/CIR/CodeGen/thunks.cpp
+12 -4 clang/lib/CIR/CodeGen/CIRGenFunction.cpp
+2 -2 clang/lib/CIR/CodeGen/CIRGenVTables.cpp
+78 -6 3 files

LLVM/project d32ffde clang/docs ClangIRABILowering.md index.rst

[CIR] Add MLIR ABI Lowering design document

Design document for MLIR dialect-agnostic calling convention
lowering that builds on the LLVM ABI Lowering Library
(llvm/lib/ABI/) as the single source of truth for ABI
classification.  Dialects use the library via an adapter layer:
ABITypeMapper maps dialect types to abi::Type*, the library
classifies arguments and returns, and a dialect-specific
ABIRewriteContext applies the decisions back to IR operations.

Targets x86_64 and AArch64, with parity against Classic Clang
CodeGen validated through differential testing.
DeltaFile
+545 -0 clang/docs/ClangIRABILowering.md
+1 -0 clang/docs/index.rst
+546 -0 2 files

LLVM/project 2cb01dc clang-tools-extra/clang-doc/benchmarks ClangDocBenchmark.cpp CMakeLists.txt

[clang-doc] Fix benchmark not compiling (#185065)

CI didn't flag that the benchmark was using the outdated Ctx call
when landing the Mustache MD patch since this benchmark isn't tested.
Also added missing libraries in CMake that prevented me from building
the benchmark locally.
DeltaFile
+2 -2 clang-tools-extra/clang-doc/benchmarks/ClangDocBenchmark.cpp
+2 -0 clang-tools-extra/clang-doc/benchmarks/CMakeLists.txt
+4 -2 2 files

LLVM/project 918d0fe llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU asyncmark-pregfx12.ll asyncmark-waitcnt.mir

[AMDGPU] fix asyncmark soft waitcnt bug (#184851)

Asyncmarks record the current wait state and so should not allow waitcnts that occur after them to be merged into waitcnts that occur before.
DeltaFile
+111 -8 llvm/test/CodeGen/AMDGPU/asyncmark-pregfx12.ll
+25 -0 llvm/test/CodeGen/AMDGPU/asyncmark-waitcnt.mir
+11 -7 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+147 -15 3 files

LLVM/project 097122c clang/lib/CodeGen CGExprCXX.cpp, clang/test/CodeGenCXX microsoft-abi-diamond-template-multiple-vbptrs.cpp

[clang][CodeGen] Fix size calculation in vbptr split memory region in EmitNullBaseClassInitialization (#184558)

When splitting memory stores around multiple virtual base pointers
(vbptrs) in the Microsoft ABI, the calculation for the size of the
memory region after each vbptr was incorrect.

The bug/old calculation:
SplitAfterSize = LastStoreSize - SplitAfterOffset
This subtracts an absolute offset from a relative size, causing
incorrect (too small) sizes after the second vbptr.

The correct size should be:
SplitAfterSize = (LastStoreOffset + LastStoreSize) - SplitAfterOffset

Since all store regions extend to the end of the non-virtual portion
(NVSize), this patch uses the simplified form:

    [3 lines not shown]
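
To make the arithmetic concrete, here is a small sketch with made-up offsets. The function names are hypothetical and the code is not taken from `CGExprCXX.cpp`; it only evaluates the two formulas quoted above.

```cpp
#include <cstdint>

// Old calculation: subtracts an absolute offset from a relative size.
constexpr std::int64_t oldSplitAfterSize(std::int64_t lastStoreSize,
                                         std::int64_t splitAfterOffset) {
  return lastStoreSize - splitAfterOffset;
}

// Corrected calculation: end of the last store region minus the offset at
// which the region after the vbptr starts.
constexpr std::int64_t fixedSplitAfterSize(std::int64_t lastStoreOffset,
                                           std::int64_t lastStoreSize,
                                           std::int64_t splitAfterOffset) {
  return (lastStoreOffset + lastStoreSize) - splitAfterOffset;
}
```

Take a 48-byte non-virtual region with 8-byte vbptrs at offsets 8 and 24 (illustrative values). The first split starts from a store at offset 0 of size 48 and resumes at offset 16, so both formulas give 32. The second split starts from the store at offset 16 of size 32 and resumes at offset 32: the old formula gives 32 - 32 = 0, while the corrected one gives (16 + 32) - 32 = 16.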
DeltaFile
+122 -0 clang/test/CodeGenCXX/microsoft-abi-diamond-template-multiple-vbptrs.cpp
+1 -2 clang/lib/CodeGen/CGExprCXX.cpp
+123 -2 2 files