LLVM/project 72860cdclang/test/CIR/CodeGenBuiltins builtins-elementwise.c builtin-signbit.c

[CIR][NFC] Merge duplicate checks in CodeGenBuiltins tests (#196224)
DeltaFile
+1-112clang/test/CIR/CodeGenBuiltins/builtins-elementwise.c
+1-41clang/test/CIR/CodeGenBuiltins/builtin-signbit.c
+1-19clang/test/CIR/CodeGenBuiltins/builtin-object-size.cpp
+1-13clang/test/CIR/CodeGenBuiltins/builtins-pred-info.c
+1-7clang/test/CIR/CodeGenBuiltins/builtin-memchr.c
+1-6clang/test/CIR/CodeGenBuiltins/builtins.cpp
+6-1981 files not shown
+7-2017 files

LLVM/project 14c0953libcxx/docs/Status Cxx2cIssues.csv, libcxx/include/__mdspan mdspan.h

[libc++][mdspan] LWG3974: `mdspan::operator[]` should not copy `OtherIndexTypes` (#195814)

Fixes: #105308
DeltaFile
+22-1libcxx/test/std/containers/views/mdspan/mdspan/index_operator.pass.cpp
+2-2libcxx/include/__mdspan/mdspan.h
+1-1libcxx/docs/Status/Cxx2cIssues.csv
+25-43 files

LLVM/project 5c8fdd8libcxx/docs/Status Cxx2cIssues.csv

[libc++][NFC] Mark LWG3884 as complete in C++26 issues status (#195819)

[LWG3884](https://wg21.link/LWG3884) requires allocator-extended
copy/move constructors on the flat container adaptors. All four
container adaptors (flat_map, flat_multimap, flat_set, flat_multiset)
landed with these constructors and their tests already in place:

- flat_map      (#98643)  -- LLVM 20
- flat_multimap (#113835) -- LLVM 20
- flat_set      (#125241) -- LLVM 21
- flat_multiset (#128363) -- LLVM 21

This LWG issue was fully addressed once flat_set/flat_multiset landed in
LLVM 21, so the status is updated to `|Complete|` with first released
version 21.

Closes #105269

Co-authored-by: Hristo Hristov <zingam at outlook.com>
DeltaFile
+1-1libcxx/docs/Status/Cxx2cIssues.csv
+1-11 files

LLVM/project 1d8a06bllvm/lib/Target/AMDGPU SIInstructions.td AMDGPUInstructions.td

[AMDGPU] Use multiclass for v2i32 and i64 bfi
DeltaFile
+55-81llvm/lib/Target/AMDGPU/SIInstructions.td
+1-1llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+56-822 files

LLVM/project 1db47f8libc/src/stdlib CMakeLists.txt

[libc][NFC] Workaround for environ_internal when building with gcc-12. (#196471)
DeltaFile
+10-0libc/src/stdlib/CMakeLists.txt
+10-01 files

LLVM/project 511286dllvm/test/Instrumentation/Instrumentor generate_rt.ll

[Instrumentor][test] Ensure dir is writeable (#196466)

When running the test in a runner where the source directory is read
only, this test fails w/ `error: failed to open instrumentor stub
runtime file for writing: Permission denied`. Run the test in a
writeable test dir `%t` to ensure we can actually write to the current
directory.
DeltaFile
+1-0llvm/test/Instrumentation/Instrumentor/generate_rt.ll
+1-01 files

LLVM/project 1c1461allvm/lib/Target/AMDGPU AMDGPUAttributor.cpp

[NFC][AMDGPU] Use a worklist and remember results in AMDGPUAttributor (#196452)

This was a recursive function with a Map to cache things that was never filled.
Now it's a worklist and the map is actually used.
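
The change from recursion to a worklist with a result cache can be sketched as follows. This is only an illustration on a toy acyclic graph, not the AMDGPUAttributor code: `ToyDag`, `reachesFlag`, and the "flag" property are hypothetical names, and the sketch assumes the graph has no cycles.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical toy graph: node I uses the nodes listed in Operands[I] and
// carries a local boolean property LocalFlag[I]. Assumed to be acyclic.
struct ToyDag {
  std::vector<std::vector<int>> Operands;
  std::vector<bool> LocalFlag;
};

// Returns true if Root, or anything it transitively uses, has LocalFlag set.
// Cache is filled for every node visited, so later queries reuse the results.
bool reachesFlag(const ToyDag &G, int Root,
                 std::unordered_map<int, bool> &Cache) {
  assert(Root >= 0 && Root < static_cast<int>(G.Operands.size()));
  std::vector<int> Worklist{Root};
  while (!Worklist.empty()) {
    int N = Worklist.back();
    if (Cache.count(N)) { // Already answered, possibly via another user.
      Worklist.pop_back();
      continue;
    }
    bool Result = G.LocalFlag[N];
    bool Pending = false;
    for (int Op : G.Operands[N]) {
      auto It = Cache.find(Op);
      if (It == Cache.end()) {
        Worklist.push_back(Op); // Resolve the operand first.
        Pending = true;
      } else {
        Result = Result || It->second;
      }
    }
    if (Pending)
      continue; // N stays on the worklist and is revisited later.
    Cache[N] = Result; // Remember the answer for N.
    Worklist.pop_back();
  }
  return Cache.at(Root);
}
```

A second query that passes the same `Cache` returns immediately for any node that was already visited, which is the benefit a cache that is never filled cannot provide.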

Co-authored-by: Johannes Doerfert <johannes at jdoerfert.de>
DeltaFile
+38-19llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+38-191 files

LLVM/project 2751c7eclang/lib/AST StmtProfile.cpp, clang/test/Modules polluted-operator.cppm

Reland [C++20] [Modules] Don't profiling the callee of CXXFoldExpr (#190732) (#195983)

Close https://github.com/llvm/llvm-project/issues/190333

For the test case, the root cause of the problem is that the compiler
thought the declaration of `operator &&` in consumer.cpp might change the
meaning of '&&' in the requires clause of `F::operator()`, which doesn't
make sense. Here we skip profiling the callee to solve the problem. Note
that we've already recorded the kind of the operator, so '&&' and '||'
won't be confused.

---

See the discussion in https://github.com/llvm/llvm-project/pull/194283

For the newly found pattern, where other binary operators (e.g.,
operator +) may appear in the requires clause, e.g.,

```C++

    [5 lines not shown]
DeltaFile
+0-79clang/test/Modules/polluted-operator.cppm
+20-1clang/lib/AST/StmtProfile.cpp
+6-0clang/test/SemaCXX/GH190333.cpp
+26-803 files

LLVM/project 42f99d6llvm/include/llvm/Transforms/IPO Instrumentor.h, llvm/lib/Transforms/IPO Instrumentor.cpp

[Instrumentor] Add a global function regexp to limit the instrumentation

Only functions that match the "function_regex", or that have the
instrumentation attribute, will be instrumented.
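
The selection rule can be sketched as a small predicate. This is a sketch under assumptions, not the Instrumentor implementation: `FunctionInfo` and `shouldInstrument` are hypothetical names, `std::regex` stands in for whatever regex engine the pass uses, and the commit message does not say whether the pattern is anchored, so the sketch uses an unanchored search.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the function properties the rule looks at.
struct FunctionInfo {
  std::string Name;
  bool HasInstrumentAttr; // Models "has the instrumentation attribute".
};

// A function is selected if it carries the attribute or its name matches the
// configured regular expression.
bool shouldInstrument(const FunctionInfo &F, const std::regex &FunctionRegex) {
  return F.HasInstrumentAttr || std::regex_search(F.Name, FunctionRegex);
}
```

With the pattern `^foo_`, a function named `foo_bar` would be selected, while `bar` would be selected only if it carries the attribute.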
DeltaFile
+57-0llvm/test/Instrumentation/Instrumentor/function_regex.ll
+26-0llvm/test/Instrumentation/Instrumentor/function_regex.json
+20-2llvm/lib/Transforms/IPO/Instrumentor.cpp
+7-1llvm/include/llvm/Transforms/IPO/Instrumentor.h
+3-1llvm/test/Instrumentation/Instrumentor/default_config.json
+113-45 files

LLVM/project fd83d89clang/docs ReleaseNotes.rst, clang/include/clang/Basic DiagnosticSemaKinds.td

[clang][diagnostics] Reject embedded NUL characters in inline asm constraints and clobbers
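
The kind of check the title describes can be sketched as a scan for a NUL byte inside a string whose length is tracked explicitly. This is an illustration only, not the code added to SemaStmtAsm.cpp; `hasEmbeddedNul` is a hypothetical helper.

```cpp
#include <string>

// Returns true if S contains a NUL byte anywhere within its tracked length.
// std::string stores its size separately, so embedded NULs are preserved.
bool hasEmbeddedNul(const std::string &S) {
  return S.find('\0') != std::string::npos;
}
```

A constraint such as `std::string("=r\0m", 4)` would be flagged by this helper, while `"=r"` would not.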
DeltaFile
+18-0clang/lib/Sema/SemaStmtAsm.cpp
+16-0clang/test/Sema/inline-asm-constraint-embedded-null.c
+0-8clang/test/CodeGen/inline-asm-constraint-embedded-null.c
+3-0clang/docs/ReleaseNotes.rst
+3-0clang/include/clang/Basic/DiagnosticSemaKinds.td
+40-85 files

LLVM/project fbbd98dclang/test/Instrumentor UnreachableRT.cpp InstrumentorUnreachable.cpp, llvm/include/llvm/Transforms/IPO Instrumentor.h

[Instrumentor] Add unreachable support; unreachable stack trace printing

Allow instrumenting `unreachable` and provide a use case for stack trace
printing.
DeltaFile
+21-0clang/test/Instrumentor/UnreachableRT.cpp
+21-0llvm/include/llvm/Transforms/IPO/Instrumentor.h
+20-0clang/test/Instrumentor/InstrumentorUnreachable.cpp
+15-0clang/test/Instrumentor/UnreachableRT.json
+12-0llvm/lib/Transforms/IPO/Instrumentor.cpp
+5-1clang/test/Instrumentor/lit.local.cfg
+94-11 files not shown
+99-17 files

LLVM/project f9a0b5dllvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchISelLowering.h, llvm/test/CodeGen/LoongArch/lsx vec-sext.ll vmskcond.ll

[LoongArch] Custom lowering for LSX vector sign extensions (#194325)

Custom-lower LSX sign extensions to combinations of `SLTI` + `VILVL` +
`VILVH` if possible.

For example, we could lower vector sext to the following instructions:
```
%B = sext <4 x i16> %A to <4 x i32>
vslti.h v2, v1, 0
vilvl.h v1, v2, v1 

%B = sext <4 x i32> %A to <4 x i64>
vslti.w v3, v1, 0
vilvh.w v2, v3, v1
vilvl.w v1, v3, v1
```
When these combinations are worse than converting the sext to a shuffle,
we simply use the latter instead.
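
The idea behind the `<4 x i16>` to `<4 x i32>` case can be modeled in scalar C++. This is a hedged model of the arithmetic, not the lowering code: `signMask` and `sextViaInterleave` are hypothetical names, and the sketch assumes that the interleave places the original element in the low half and the compare result in the high half of each widened lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Models the compare-with-zero step: each lane becomes all ones if the input
// lane is negative and all zeros otherwise.
std::array<uint16_t, 4> signMask(const std::array<int16_t, 4> &A) {
  std::array<uint16_t, 4> M{};
  for (std::size_t I = 0; I < A.size(); ++I)
    M[I] = A[I] < 0 ? 0xFFFF : 0x0000;
  return M;
}

// Models the interleave step: the mask lane supplies the upper 16 bits and
// the original lane supplies the lower 16 bits of each 32-bit result.
std::array<int32_t, 4> sextViaInterleave(const std::array<int16_t, 4> &A) {
  const std::array<uint16_t, 4> M = signMask(A);
  std::array<int32_t, 4> R{};
  for (std::size_t I = 0; I < A.size(); ++I) {
    uint32_t Lo = static_cast<uint16_t>(A[I]);
    uint32_t Hi = M[I];
    R[I] = static_cast<int32_t>((Hi << 16) | Lo);
  }
  return R;
}
```

Each result lane equals the ordinary sign extension of the input lane, because the upper half is filled with copies of the sign bit.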
DeltaFile
+76-109llvm/test/CodeGen/LoongArch/lsx/vec-sext.ll
+107-0llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+2-8llvm/test/CodeGen/LoongArch/lsx/vmskcond.ll
+2-0llvm/lib/Target/LoongArch/LoongArchISelLowering.h
+187-1174 files

LLVM/project 5c127fbflang/lib/Semantics expression.cpp, flang/lib/Support Fortran.cpp

[flang][cuda][openacc] Reject UseDevice actual against managed/unified dummy (#196428)

After #195182 introduced the `UseDevice` attribute, a `use_device(...)`
actual was treated as compatible with **any** dummy attribute. Combined
with the matching distance returning ∞ for `UseDevice →
managed/unified`, this caused generic resolution to misreport a clean
"no match" as an **ambiguity** when only managed/unified specifics
existed.

This PR tightens `AreCompatibleCUDADataAttrs`: a `UseDevice` actual is
only compatible with a `Device` dummy or a host (no-attribute) dummy.
Other attributes (`Managed`, `Unified`, `Pinned`, ...) require their
actual to live in that specific kind of memory.
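
The tightened rule for a `UseDevice` actual can be sketched as a predicate. This models only the rule stated above, not the full `AreCompatibleCUDADataAttrs` logic; the enum and function names are hypothetical, and an empty `std::optional` stands for a host dummy with no attribute.

```cpp
#include <optional>

// Hypothetical stand-in for the CUDA data attributes named in the message.
enum class CudaDataAttr { Device, Managed, Unified, Pinned, UseDevice };

// A UseDevice actual is accepted only by a Device dummy or by a host dummy
// that carries no attribute (modeled as std::nullopt).
bool useDeviceActualIsCompatible(std::optional<CudaDataAttr> Dummy) {
  return !Dummy.has_value() || *Dummy == CudaDataAttr::Device;
}
```

Under this rule, a generic interface that offers only managed or unified specifics has no compatible candidate for a `use_device(...)` actual, so resolution can report "no match" instead of an ambiguity.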
DeltaFile
+31-0flang/test/Semantics/cuf27.cuf
+2-9flang/lib/Semantics/expression.cpp
+4-2flang/lib/Support/Fortran.cpp
+37-113 files

LLVM/project d335ccelld/wasm SymbolTable.cpp

[lld][WebAssembly] Improve formatting consistency. NFC (#196458)
DeltaFile
+4-6lld/wasm/SymbolTable.cpp
+4-61 files

LLVM/project a34877ellvm/lib/Transforms/Instrumentation MemorySanitizer.cpp, llvm/test/Instrumentation/MemorySanitizer ftrunc.ll

[msan] Handle fpto[us]i_sat (#196429)

This adds explicit handling for fpto[us]i_sat, similar to how the
non-saturating versions are handled.

N.B. PR #191365 lowered NEON fcvtz[us] intrinsics into fpto[us]i.sat.
There is a slight inconsistency in MSan insofar as fcvtz[us] were
handled by handleNEONVectorConvertIntrinsic(), which takes an
all-or-nothing propagation approach to the shadows (i.e., even a single
uninitialized bit will result in the corresponding integer being fully
uninitialized), while fpto[us]i were handled by propagating the shadow
unchanged. For now, we choose to have fpto[us]i_sat follow the laxer
behavior of fpto[us]i. Future work may consider changing the behavior of
fpto[us]i and fpto[us]i_sat to use the all-or-nothing approach.
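
The two shadow propagation policies contrasted above can be sketched for a single 32-bit element. This is a toy model, not MemorySanitizer code: the function names are hypothetical, a set shadow bit means "uninitialized", and the source and result are assumed to have the same width.

```cpp
#include <cstdint>

// Laxer policy: the operand's shadow is passed through unchanged.
uint32_t propagateUnchanged(uint32_t OperandShadow) { return OperandShadow; }

// All-or-nothing policy: a single uninitialized bit poisons the whole result.
uint32_t propagateAllOrNothing(uint32_t OperandShadow) {
  return OperandShadow != 0 ? UINT32_MAX : 0;
}
```

For an operand with only its lowest bit uninitialized, the first policy reports one uninitialized bit in the result and the second reports all 32.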
DeltaFile
+7-19llvm/test/Instrumentation/MemorySanitizer/ftrunc.ll
+9-0llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
+16-192 files

LLVM/project cc35d56clang/lib/CIR/CodeGen CIRGenBuiltinRISCV.cpp, clang/test/CIR/CodeGenBuiltins/RISCV riscv-zksed.c

[CIR][RISCV] Support zksed builtin codegen (#196250)
DeltaFile
+26-0clang/test/CIR/CodeGenBuiltins/RISCV/riscv-zksed.c
+8-2clang/lib/CIR/CodeGen/CIRGenBuiltinRISCV.cpp
+34-22 files

LLVM/project 71726e0llvm/runtimes CMakeLists.txt

Revert "[Runtimes] Pass through per-runtime CMake options for target runtimes" (#196236)

Reverts llvm/llvm-project#194105
DeltaFile
+0-3llvm/runtimes/CMakeLists.txt
+0-31 files

LLVM/project a262156llvm/lib/Target/AMDGPU SIInstrInfo.cpp, llvm/test/MachineVerifier/AMDGPU lit64.mir

[AMDGPU] Add lit64 machine verifier
DeltaFile
+13-4llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+9-0llvm/test/MachineVerifier/AMDGPU/lit64.mir
+22-42 files

LLVM/project 32501b3llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp, llvm/test/MC/AMDGPU gfx1250_asm_vop2_err.s

[AMDGPU] Only src0 and mandatory literals can use literal64
DeltaFile
+15-0llvm/test/MC/AMDGPU/gfx1250_asm_vop2_err.s
+8-0llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+23-02 files

LLVM/project 8fdf195mlir/lib/Reducer ReductionTreePass.cpp, mlir/test/mlir-reduce/reduction-tree trivially-dead.mlir

[mlir][reducer] Change mlir-reducer apply pattern logic (#195997)

This PR aligns the pattern application logic with the operation deletion
strategy. It indirectly achieves the separation of operation deletion
and pattern application. It also fixes a bug where trivially dead ops
within `opsInRange` were being incorrectly deleted when applying patterns.
While `opsNotInRange` grows from zero (via binary search), `opsInRange`
shrinks from the entire module down to zero. This fixes a crash where
patterns were initially applied to the whole module. If the module in
the current iteration is 'uninteresting', it gets erased. Consequently,
when the iterator increments, it fails to clone the parent iteration's
module, leading to a crash.
DeltaFile
+9-11mlir/lib/Reducer/ReductionTreePass.cpp
+15-0mlir/test/mlir-reduce/reduction-tree/trivially-dead.mlir
+4-0mlir/test/mlir-reduce/script/trivially-dead.sh
+28-113 files

LLVM/project 9a4824cclang/test/Instrumentor StackUsageRT.cpp StackUsageRT.json, llvm/include/llvm/Transforms/IPO Instrumentor.h

[Instrumentor] Add Alloca and Function support; stack usage example

This adds support for alloca instrumentation and function pre/post
instrumentation. Alloca support follows load/store support directly.
Functions require special care to determine the insertion points.

Together, we can showcase how the stack high watermark can be profiled,
see InstrumentorStackUsage.cpp.
DeltaFile
+296-7llvm/lib/Transforms/IPO/Instrumentor.cpp
+118-8llvm/include/llvm/Transforms/IPO/Instrumentor.h
+59-0llvm/test/Instrumentation/Instrumentor/default_config.json
+59-0clang/test/Instrumentor/StackUsageRT.cpp
+56-0llvm/test/Instrumentation/Instrumentor/alloca_and_function.ll
+54-0clang/test/Instrumentor/StackUsageRT.json
+642-152 files not shown
+681-158 files

LLVM/project bc58071llvm/test/CodeGen/AArch64 bf16-v8-instructions.ll bf16-v4-instructions.ll, llvm/test/CodeGen/AMDGPU amdgpu-simplify-libcall-pow.ll arbitrary-fp-to-float.ll

keep comment

Created using spr 1.3.4
DeltaFile
+5,910-880llvm/test/CodeGen/AArch64/bf16-v8-instructions.ll
+3,306-504llvm/test/CodeGen/AArch64/bf16-v4-instructions.ll
+0-775llvm/utils/Reviewing/find_interesting_reviews.py
+665-0llvm/test/CodeGen/NVPTX/arbitrary-fp-to-float.ll
+329-329llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll
+595-8llvm/test/CodeGen/AMDGPU/arbitrary-fp-to-float.ll
+10,805-2,4961,170 files not shown
+31,838-13,7701,176 files

LLVM/project a0aed33clang/lib/Format BreakableToken.cpp, clang/unittests/Format AlignmentTest.cpp

[clang-format] Align stuff containing multi-line comment (#195398)

Fixes #194717.

Previously the information about the comment's scope could get lost.
Then the program would not align it.

new

```C++
foo          fooNode(ConvertStdStringToUString(fieldNames[chIdx]),
                     // asdf
                     // foo1 foo2 foo12345
                     SomeFunctionAB(a123456789012345));
const size_t v1234567890123456789012345678901234;
```

old


    [6 lines not shown]
DeltaFile
+7-0clang/unittests/Format/AlignmentTest.cpp
+1-1clang/lib/Format/BreakableToken.cpp
+8-12 files

LLVM/project 311290cclang/include/clang/Analysis/Analyses/LifetimeSafety Origins.h, clang/lib/Analysis/LifetimeSafety Origins.cpp FactsGenerator.cpp

[LifetimeSafety] Track per-field origins for record types
DeltaFile
+237-4clang/test/Sema/warn-lifetime-safety.cpp
+95-8clang/lib/Analysis/LifetimeSafety/Origins.cpp
+67-24clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
+47-13clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+21-12clang/lib/Analysis/LifetimeSafety/LiveOrigins.cpp
+4-6clang/test/Sema/warn-lifetime-safety-dangling-field.cpp
+471-672 files not shown
+476-678 files

LLVM/project 4a4a0ballvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp, llvm/test/MC/AMDGPU gfx1250_asm_vop3_err.s

[AMDGPU] Also disable lit64() from VOP3 and inline constant (#196421)
DeltaFile
+5-3llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+5-0llvm/test/MC/AMDGPU/gfx1250_asm_vop3_err.s
+10-32 files

LLVM/project 23a13d0llvm/tools CMakeLists.txt

[CMake][NFC] Remove dead code add_llvm_external_project(libclc) (#196241)

It was added in 72f9881c3ffcf. libclc has now switched to runtime build.
DeltaFile
+0-3llvm/tools/CMakeLists.txt
+0-31 files

LLVM/project fb20976mlir/lib/Dialect/XeGPU/Transforms XeGPULayoutImpl.cpp XeGPUBlocking.cpp, mlir/test/Dialect/XeGPU propagate-layout-inst-data.mlir

[MLIR][XeGPU] Fix layout inference issues blocking MXFP_GEMM test  (#196243)

This branch fixes layout inference issues in XeGPU passes that were
blocking MXFP (microscaled floating point) GEMM workloads:
                                                        
- Fix bitcast layout adjustment to use result shape instead of source
shape. The setupBitCastResultLayout function was incorrectly bounding
the layout adjustment loop against the source shape. Added tests.
- Fix blocking pass to drop inst_data from anchor operations. Operations
whose shape already matches inst_data don't get unrolled, so their
layout attributes retained stale inst_data that broke downstream passes.
Now inst_data is unconditionally stripped from all op attributes after
blocking.
- Propagate layout to both results of vector.deinterleave. The layout
recovery pass was only setting the layout on result 0, leaving result 1
without a layout.
                  
  Test plan                                             
   

    [9 lines not shown]
DeltaFile
+73-0mlir/test/Integration/Dialect/XeGPU/WG/simple_mxfp_gemm.mlir
+46-0mlir/test/Dialect/XeGPU/propagate-layout-inst-data.mlir
+11-5mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+6-0mlir/lib/Dialect/XeGPU/Transforms/XeGPUBlocking.cpp
+136-54 files

LLVM/project 0ed2eb4llvm/lib/Target/AMDGPU AMDGPUAttributor.cpp

[NFC][AMDGPU] Use a worklist and remember results in AMDGPUAttributor

This was a recursive function with a Map to cache things that was never filled.
Now it's a worklist and the map is actually used.

Co-authored-by: Johannes Doerfert <johannes at jdoerfert.de>
DeltaFile
+38-19llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+38-191 files

LLVM/project 589faedllvm/lib/Target/RISCV RISCVFrameLowering.cpp, llvm/test/CodeGen/RISCV stack-probing-dynamic-nonentry.ll

[CodeGen][RISCV] Inline stack probes immediately after `allocateStack` in `eliminateCallFramePseudoInstr` (#195456)

This PR adds a call to `inlineStackProbe` immediately after
`allocateStack` in `eliminateCallFramePseudoInstr`. This allows code
generation for stack probe pseudoinstructions in non-entry BBs.

Fixes #195454.
DeltaFile
+115-0llvm/test/CodeGen/RISCV/stack-probing-dynamic-nonentry.ll
+1-0llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
+116-02 files

LLVM/project 8b8fdfdllvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 expanded-operand-already-scheduled.ll expanded-binop-doesnotneedschedule-user.ll

[SLP]Bail out on non-schedulable expanded binop with stale operand deps

In tryScheduleBundle's DoesNotRequireScheduling path, an expanded binop
(shl X, 1 modeled as add X, X) doubles the dependency count of the
duplicated operand. If the operand has a single IR use, yet its
ScheduleData already has Dependencies populated by an earlier calculation
that did not see the expanded duplicate use, the double decrement still
exceeds calculateDependencies' single increment and UnscheduledDeps goes
negative.
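
The counting mismatch can be shown with a toy counter. This is only an arithmetic illustration, not SLPVectorizer code: `ToyScheduleData`, `countIRUses`, and `releaseOperandSlots` are hypothetical names that mimic one increment per IR use and one decrement per operand slot of the modeled instruction.

```cpp
// Hypothetical stand-in for the per-instruction dependency counter.
struct ToyScheduleData {
  int UnscheduledDeps = 0;
};

// Models the earlier dependency calculation: one increment per IR use.
void countIRUses(ToyScheduleData &SD, int NumIRUses) {
  SD.UnscheduledDeps += NumIRUses;
}

// Models releasing the operand once per slot of the modeled instruction.
// For shl X, 1 modeled as add X, X the operand X occupies two slots.
void releaseOperandSlots(ToyScheduleData &SD, int NumSlots) {
  SD.UnscheduledDeps -= NumSlots;
}
```

With one IR use and two operand slots the counter ends at -1, which is the negative UnscheduledDeps state the commit avoids by bailing out.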

Fixes #196281.

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/196449
DeltaFile
+50-0llvm/test/Transforms/SLPVectorizer/X86/expanded-operand-already-scheduled.ll
+11-5llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+3-3llvm/test/Transforms/SLPVectorizer/X86/expanded-binop-doesnotneedschedule-user.ll
+64-83 files