LLVM/project 1ff1e5f llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Stop applying nofpclass from use nofpclass attribute (#183835)

Functionally reverts a80d4329ce96856a02bd279c800c3d08619da4c9, with a new
test. The use's nofpclass attribute should be applied somewhere, but this
is the wrong place.

Fixes regression reported after #182444
Delta  File
+21 -2  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+0 -5  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+21 -7  2 files

LLVM/project 32eb450 llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

Update coexec-sched-effective-stall.mir
Delta  File
+0 -2  llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+0 -2  1 file

LLVM/project 702e4ec lldb/test/API/macosx/delay-init-dependency TestDelayInitDependency.py

[lldb/test] Skip TestDelayInitDependency on remote platforms (#183885)

This test exercises macOS-specific linker functionality (-delay_library)
and uses a hardcoded local working directory for the launch info. It
should not run against a remote platform where neither condition holds.

Signed-off-by: Med Ismail Bennani <ismail at bennani.ma>
Delta  File
+1 -0  lldb/test/API/macosx/delay-init-dependency/TestDelayInitDependency.py
+1 -0  1 file

LLVM/project 1eb9bd8 clang/lib/AST DeclTemplate.cpp, clang/test/SemaTemplate GH181062.cpp

[clang] Backport: fix transformation of substituted constant template parameters of partial specializations

This fixes a helper so that it retrieves the argument replaced for a
template parameter of a partial specialization.

This was left out of the original patch, since it's quite hard to actually test.

The helper also implements the retrieval for variable templates, but only for
completeness' sake: no current users rely on it, and I don't think a similar
test case is possible to implement with variable templates.

This fixes a regression introduced in #161029 which will be backported to llvm-22,
so there are no release notes.

Backport from #183348

Fixes #181062
Fixes #181410
Delta  File
+24 -0  clang/test/SemaTemplate/GH181062.cpp
+10 -10  clang/lib/AST/DeclTemplate.cpp
+34 -10  2 files

LLVM/project f977b46 llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp AMDGPUCoExecSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

[AMDGPU] Add structural stall heuristic to scheduling strategies

Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.

- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)

- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.
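
A minimal sketch of the idea described above, not the code from this patch. The `Candidate` struct and both functions are hypothetical stand-ins, and combining the two stall sources with a maximum is an assumption: the commit message only says that both sources contribute to the wait.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a scheduling candidate; not an LLVM type.
struct Candidate {
  int Id;
  unsigned ResourceStallCycles; // cycles until an unbuffered resource is free
  unsigned HazardWaitStates;    // wait states until all hazards are resolved
  bool Pending;                 // true if taken from the pending queue
};

// Assumption: an instruction cannot issue until both constraints are met,
// so the structural stall is modeled as the larger of the two.
inline unsigned structuralStallCycles(const Candidate &C) {
  return std::max(C.ResourceStallCycles, C.HazardWaitStates);
}

// Compare available and pending candidates in one pass by stall cost.
// Ties keep the earlier candidate. Returns -1 for an empty list.
inline int pickLowestStall(const std::vector<Candidate> &Cands) {
  int BestId = -1;
  unsigned BestStall = 0;
  for (const Candidate &C : Cands) {
    unsigned Stall = structuralStallCycles(C);
    if (BestId == -1 || Stall < BestStall) {
      BestId = C.Id;
      BestStall = Stall;
    }
  }
  return BestId;
}
```

In this model a pending instruction with a short stall can win over an available instruction with a longer one, which is the "unified candidate comparison" the message refers to.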
Delta  File
+35 -0  llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+26 -3  llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+7 -2  llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+6 -0  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+4 -0  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+2 -2  llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+80 -7  1 file not shown
+82 -7  7 files

LLVM/project d09cd35 llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp AMDGPUTargetMachine.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir amdgpu-workload-type-scheduler-debug.mir

[AMDGPU] Add ML-oriented coexec scheduler selection and queue handling

This patch adds the initial coexec scheduler scaffold for machine
learning workloads on gfx1250.

It introduces function and module-level controls for selecting the
AMDGPU preRA and postRA schedulers, including an `amdgpu-workload-type`
module flag that maps ML workloads to coexec preRA scheduling and a nop
postRA scheduler by default.

It also updates the coexec scheduler to use a simplified top-down
candidate selection path that considers both available and pending
queues through a single flow, setting up follow-on heuristic work.
Delta  File
+275 -0  llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+124 -0  llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+114 -0  llvm/test/CodeGen/AMDGPU/amdgpu-workload-type-scheduler-debug.mir
+64 -5  llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+43 -0  llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+22 -0  llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+642 -5  3 files not shown
+663 -14  9 files

LLVM/project 3b30dcd clang/include/clang/Basic CodeGenOptions.h, clang/include/clang/Options Options.td

[Driver] Add -Wa,--reloc-section-sym= to control section symbol conversion (#183472)

Wire the llvm-mc --reloc-section-sym={all,internal,none} option through
the clang driver (-Wa,--reloc-section-sym=) and cc1as
(--reloc-section-sym=). The option is only valid for ELF targets.

GNU Assembler will add the option as well.
Delta  File
+27 -0  clang/test/Misc/cc1as-reloc-section-sym.s
+16 -0  clang/lib/Driver/ToolChains/Clang.cpp
+9 -0  clang/test/Driver/reloc-section-sym.c
+9 -0  clang/tools/driver/cc1as_main.cpp
+6 -0  clang/include/clang/Options/Options.td
+2 -0  clang/include/clang/Basic/CodeGenOptions.h
+69 -0  2 files not shown
+72 -0  8 files

LLVM/project 27d654c llvm/lib/Target/AMDGPU AMDGPULowerVGPREncoding.cpp, llvm/test/CodeGen/AMDGPU vgpr-lowering-gfx1250.mir

[AMDGPU] Fix piggybacking after commute in AMDGPULowerVGPREncoding (#183778)

After successfully commuting an instruction to be compatible with the
current VGPR MSB mode, update CurrentMode with the commuted
instruction's mode requirements. This locks in the mode bits the
commuted instruction relies on, preventing later instructions from
piggybacking and corrupting those bits.

Without this fix, a subsequent instruction needing a different mode
could piggyback onto the preceding s_set_vgpr_msb and change mode bits
that the commuted instruction depends on. For example, a nullopt src1
position (treated as 0) could be overwritten to a different value,
causing incorrect register encoding for the commuted instruction.

The fix still allows compatible piggybacking - instructions that only
add new mode bits without changing existing ones can still piggyback.
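
The compatibility rule can be modeled roughly as follows. This is a hedged sketch, not the pass's actual data structures: `ModeBits`, `canPiggyback`, and `lockIn` are hypothetical names, and the four-slot layout is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical model: one optional MSB value per operand slot.
// std::nullopt means "no requirement recorded for this slot yet".
using ModeBits = std::array<std::optional<unsigned>, 4>;

// A later instruction may piggyback only if every slot it needs is either
// still unset in the current mode or already holds the same value.
inline bool canPiggyback(const ModeBits &Current, const ModeBits &Needed) {
  for (std::size_t I = 0; I < Current.size(); ++I)
    if (Current[I] && Needed[I] && *Current[I] != *Needed[I])
      return false;
  return true;
}

// Record an instruction's requirements in the current mode. Doing this for
// a commuted instruction is what locks in the bits it relies on.
inline void lockIn(ModeBits &Current, const ModeBits &Needed) {
  for (std::size_t I = 0; I < Current.size(); ++I)
    if (Needed[I])
      Current[I] = Needed[I];
}
```

Once the commuted instruction's slots are locked in, a later instruction that needs a different value in one of those slots is rejected, while one that only fills an unset slot is still accepted.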
Delta  File
+47 -1  llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250.mir
+9 -2  llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+56 -3  2 files

LLVM/project bed8997 lld/test/ELF aarch64-reloc-gotpcrel32.s, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

AArch64: Replace @plt/%gotpcrel in data directives with %pltpcrel %gotpcrel (#155776)

Similar to #132569 for RISC-V, replace the unofficial `@plt` and
`@gotpcrel` relocation specifiers, currently only used by clang
-fexperimental-relative-c++-abi-vtables, with %pltpcrel and %gotpcrel. The
syntax is not used in human-written assembly code and is not supported
by the GNU assembler.

Also replace the recent `@funcinit` with `%funcinit(x)`.
Delta  File
+40 -32  llvm/test/MC/AArch64/data-directive-specifier.s
+24 -8  llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+22 -5  llvm/lib/Target/AArch64/MCTargetDesc/AArch64ELFObjectWriter.cpp
+18 -1  llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCAsmInfo.cpp
+9 -9  llvm/test/CodeGen/AArch64/ptrauth-irelative.ll
+5 -5  lld/test/ELF/aarch64-reloc-gotpcrel32.s
+118 -60  11 files not shown
+137 -76  17 files

LLVM/project ce6a3d9 clang-tools-extra/clang-tidy/misc UnusedUsingDeclsCheck.cpp, clang-tools-extra/docs ReleaseNotes.rst

[clang-tidy] Teach `misc-unused-using-decls` that exported using-decls aren't unused (#183638)

Fixes #162619.
Delta  File
+69 -0  clang-tools-extra/test/clang-tidy/checkers/misc/unused-using-decls-module.cpp
+6 -0  clang-tools-extra/clang-tidy/misc/UnusedUsingDeclsCheck.cpp
+4 -0  clang-tools-extra/docs/ReleaseNotes.rst
+79 -0  3 files

LLVM/project 620a754 bolt/test/AArch64 skip-non-vfuncptr-reloc-in-relative-vtable.s

update bolt test

Created using spr 1.3.5-bogner
Delta  File
+1 -1  bolt/test/AArch64/skip-non-vfuncptr-reloc-in-relative-vtable.s
+1 -1  1 file

LLVM/project fe76e90 llvm/lib/CodeGen MachineBlockPlacement.cpp, llvm/test/CodeGen/X86 code_placement_ext_tsp_size_and_perf.ll

[CodeGen] Allow `-enable-ext-tsp-block-placement` and `-apply-ext-tsp-for-size` to be passed together (#183642)

Currently, an assert fires when both `UseExtTspForPerf` and
`UseExtTspForSize` are true for a given function.

Ideally, we should allow `-enable-ext-tsp-block-placement` and
`-apply-ext-tsp-for-size` to be passed together, meaning that block
placement runs for performance on hot functions and for size on cold
functions.

The diff makes `UseExtTspForPerf` and `UseExtTspForSize` mutually
exclusive per-function: functions with the `OptForSize` attribute use
ext-tsp block placement for size, while the others use ext-tsp block
placement for perf.
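
The per-function selection could be sketched like this. It is an illustrative model only: `ExtTspUse` and `selectExtTspUse` are hypothetical names, and the handling of an `OptForSize` function when only the perf flag is given (no ext-tsp at all) is an assumption, not something the commit message states.

```cpp
#include <cassert>

// Hypothetical result type; the real code uses two booleans.
enum class ExtTspUse { None, ForPerf, ForSize };

// The two modes are mutually exclusive per function: OptForSize functions
// can only get size placement, all other functions only perf placement.
inline ExtTspUse selectExtTspUse(bool EnableExtTsp, bool ApplyExtTspForSize,
                                 bool FnOptForSize) {
  if (FnOptForSize)
    return ApplyExtTspForSize ? ExtTspUse::ForSize : ExtTspUse::None;
  return EnableExtTsp ? ExtTspUse::ForPerf : ExtTspUse::None;
}
```

Because the function returns a single enumerator, the state where both modes are active for one function cannot be represented, which is the condition the old assert guarded against.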

Co-authored-by: Sharon Xu <sharonxu at fb.com>
Delta  File
+91 -0  llvm/test/CodeGen/X86/code_placement_ext_tsp_size_and_perf.ll
+3 -3  llvm/lib/CodeGen/MachineBlockPlacement.cpp
+94 -3  2 files

LLVM/project d72e95b clang/test/CIR/CodeGenHLSL matrix-element-expr-load.hlsl

[CIR] Use `-verify` on clang/test/CIR/CodeGenHLSL/matrix-element-expr-load.hlsl (#182817)

Update clang/test/CIR/CodeGenHLSL/matrix-element-expr-load.hlsl to use
`-verify` with expected CIR NYI diagnostics.
Delta  File
+7 -6  clang/test/CIR/CodeGenHLSL/matrix-element-expr-load.hlsl
+7 -6  1 file

LLVM/project 0b88ee1 clang/include/clang/CIR/Dialect/IR CIRAttrs.td CIRTypes.h, clang/lib/CIR/Dialect/IR CIRTypes.cpp CIRAttrs.cpp

[CIR] Infrastructure and MemorySpaceAttrInterface for Address Spaces (#179073)

Related: https://github.com/llvm/llvm-project/issues/175871,
https://github.com/issues/assigned?issue=llvm%7Cllvm-project%7C179278,
https://github.com/issues/assigned?issue=llvm%7Cllvm-project%7C160386

- Introduces the LangAddressSpace enum with offload address space kinds
(offload_private, offload_local, offload_global, offload_constant,
offload_generic) and the LangAddressSpaceAttr attribute.

- Generalizes CIR AS attributes as MemorySpaceAttrInterface and attaches
it to `PointerType`. Includes test coverage for valid IR roundtrips and
invalid address space parsing.

This starts a series of patches that aims to bring complete address
space support to CIR. Most of the test coverage is provided in subsequent
patches further down the stack. Note that most of these patches are based
on: https://github.com/llvm/clangir/pull/1986
Delta  File
+163 -39  clang/lib/CIR/Dialect/IR/CIRTypes.cpp
+104 -4  clang/lib/CIR/Dialect/IR/CIRAttrs.cpp
+52 -3  clang/include/clang/CIR/Dialect/IR/CIRAttrs.td
+41 -0  clang/test/CIR/IR/address-space.cir
+29 -3  clang/test/CIR/IR/invalid-addrspace.cir
+17 -3  clang/include/clang/CIR/Dialect/IR/CIRTypes.h
+406 -52  12 files not shown
+474 -79  18 files

LLVM/project 6f9c68d llvm/test/Transforms/LoopVectorize/AArch64 scalable-strict-fadd.ll sve-interleaved-masked-accesses.ll

[VPlan] Don't adjust trip count for DataAndControlFlowWithoutRuntimeCheck (#183729)

Previously, the canonical IV increment may have overflowed to a non-zero
value due to vscale being a non power-of-two. So we used to emit a
runtime check for this.

If you didn't want the runtime check,
DataAndControlFlowWithoutRuntimeCheck skipped it and instead tweaked the
trip count so it wouldn't overflow.

However #144963 stopped the check from ever being emitted because vscale
is always a power-of-two on AArch64 and RISC-V, so it never overflowed
to a non-zero value. And in #183292 the code to emit the check was
removed. But we never restored the trip count back to normal when the
target's vscale was a power-of-two.

Now that vscale is always a power-of-two, this PR avoids adjusting it. A
follow up NFC can then remove DataAndControlFlowWithoutRuntimeCheck.
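
The arithmetic behind the power-of-two argument can be checked with a small model. This sketch assumes an 8-bit induction variable purely to keep the numbers readable, and `finalIV` is a hypothetical helper, not vectorizer code.

```cpp
#include <cassert>
#include <cstdint>

// Value of an 8-bit IV after the increment that first reaches or passes the
// trip count, i.e. the trip count rounded up to a multiple of Step, taken
// modulo 2^8.
inline std::uint8_t finalIV(unsigned TripCount, unsigned Step) {
  assert(Step != 0 && "step must be non-zero");
  unsigned RoundedUp = ((TripCount + Step - 1) / Step) * Step;
  return static_cast<std::uint8_t>(RoundedUp);
}
```

With a step of 16 and a trip count of 250, the rounded-up count is 256, which wraps to exactly 0 because 256 is a multiple of 16. With a step of 12 and a trip count of 253, the rounded-up count is 264, which wraps to 8. Only the second case needed either a runtime check or an adjusted trip count.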
Delta  File
+174 -195  llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
+78 -90  llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
+61 -66  llvm/test/Transforms/LoopVectorize/AArch64/uniform-args-call-variants.ll
+13 -43  llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
+40 -2  llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll
+14 -20  llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll
+380 -416  22 files not shown
+438 -637  28 files

LLVM/project 6f27060 clang/lib/CIR/Dialect/IR CIRTypes.cpp

fix fmt
Delta  File
+4 -4  clang/lib/CIR/Dialect/IR/CIRTypes.cpp
+4 -4  1 file

LLVM/project 5f22dec clang/docs LanguageExtensions.rst ReleaseNotes.rst, clang/lib/Sema SemaChecking.cpp

Clang: Deprecate float support from __builtin_elementwise_max (#180885)

Now we have
  __builtin_elementwise_maxnum
  __builtin_elementwise_maximum
  __builtin_elementwise_maximumnum
Delta  File
+17 -0  clang/test/Sema/builtins-elementwise-math.c
+13 -1  clang/lib/Sema/SemaChecking.cpp
+6 -2  clang/test/SemaCXX/builtins-elementwise-math.cpp
+4 -4  clang/docs/LanguageExtensions.rst
+3 -0  clang/docs/ReleaseNotes.rst
+1 -1  libclc/clc/lib/generic/math/clc_fdim.inc
+44 -8  1 file not shown
+46 -8  7 files

LLVM/project cc9f25d clang/include/clang/CIR/Dialect/IR CIRTypes.td CIRTypes.h, clang/lib/CIR/Dialect/IR CIRTypes.cpp

rename normalize AS function
Delta  File
+3 -3  clang/lib/CIR/Dialect/IR/CIRTypes.cpp
+2 -2  clang/include/clang/CIR/Dialect/IR/CIRTypes.td
+1 -1  clang/include/clang/CIR/Dialect/IR/CIRTypes.h
+6 -6  3 files

LLVM/project 62cfe16 libc/src/__support/math acospif.h asinpif.h, libc/test/src/math acospif_test.cpp

[libc][math][c23] implement C23 `acospif` math function (#183661)

Implements the C23 `acospi` math function for single precision (`acospif`),
using the header-only approach that has been followed since #147386.
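
For orientation, `acospi(x)` is defined as `acos(x) / π`. The sketch below is a naive reference built on `std::acos`, useful only for sanity-checking values; it is not how the patch computes the result, and `acospif_reference` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Naive reference: evaluate in double precision, then round to float.
// Defined for x in [-1, 1]; the result lies in [0, 1].
inline float acospif_reference(float x) {
  const double Pi = 3.14159265358979323846;
  return static_cast<float>(std::acos(static_cast<double>(x)) / Pi);
}
```

The three easy checkpoints are `x = 1` (result 0), `x = 0` (result 0.5), and `x = -1` (result 1).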
Delta  File
+100 -0  libc/src/__support/math/acospif.h
+51 -0  libc/test/src/math/smoke/acospif_test.cpp
+8 -41  libc/src/__support/math/asinpif.h
+37 -0  libc/src/__support/math/inv_trigf_utils.h
+33 -0  libc/test/src/math/exhaustive/acospif_test.cpp
+29 -0  libc/test/src/math/acospif_test.cpp
+258 -41  25 files not shown
+429 -61  31 files

LLVM/project fb6b470 libc/src/__support/math CMakeLists.txt floorf16.h, utils/bazel/llvm-project-overlay/libc BUILD.bazel

[libc][math] Refactor floor family to header-only (#182194)

Refactors the floor math family to be header-only.

Closes https://github.com/llvm/llvm-project/issues/182193

Target Functions:
  - floor
  - floorbf16
  - floorf
  - floorf128
  - floorf16
  - floorl
Delta  File
+87 -5  utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+66 -0  libc/src/__support/math/CMakeLists.txt
+38 -0  libc/src/__support/math/floorf16.h
+29 -0  libc/src/__support/math/floor.h
+29 -0  libc/src/__support/math/floorf.h
+29 -0  libc/src/__support/math/floorf128.h
+278 -5  18 files not shown
+505 -73  24 files

LLVM/project 8bd8d8e llvm/test/CodeGen/AMDGPU load-saddr-offset-imm.ll

[AMDGPU] Remove extra pipes from load-saddr-offset-imm.ll (#183874)

This test uses opt to run instcombine and then pipes the result into llc,
whose output is piped into FileCheck. Before this patch, the test also
piped the source file into llc, which caused issues with a downstream test
executor that executes the lines in bash. These extra pipes don't make
sense anyway, so remove them.
Delta  File
+4 -4  llvm/test/CodeGen/AMDGPU/load-saddr-offset-imm.ll
+4 -4  1 file

LLVM/project c1f47d1 llvm/test/CodeGen/AMDGPU llvm.exp10.f64.ll llvm.exp.f64.ll, llvm/test/CodeGen/X86 funnel-shift-i512.ll zero_extend_vector_inreg.ll

Rebase

Created using spr 1.3.7
Delta  File
+11,178 -0  llvm/test/CodeGen/AMDGPU/llvm.exp10.f64.ll
+10,242 -0  llvm/test/CodeGen/AMDGPU/llvm.exp.f64.ll
+9,987 -0  llvm/test/CodeGen/AMDGPU/llvm.exp2.f64.ll
+5,445 -0  llvm/test/CodeGen/X86/funnel-shift-i512.ll
+1,389 -1,365  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+1,094 -1,106  llvm/test/CodeGen/X86/zero_extend_vector_inreg.ll
+39,335 -2,471  1,401 files not shown
+96,876 -32,175  1,407 files

LLVM/project 5395d26 llvm/lib/Target/WebAssembly WebAssemblyFixIrreducibleControlFlow.cpp

Revert "[WebAssembly] Incorporate SCCs into WebAssemblyFixIrreducibleControlFlow (#181755)" (#183872)

This reverts commit c05e323be7caaadff6fdd09a2336be60e3041af1.

The change caused Emscripten test failures.
Delta  File
+135 -150  llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
+135 -150  1 file

LLVM/project 567d035 clang/lib/Sema SemaTemplateDeduction.cpp

[clang] NFC: remove unused / untested workaround in pack deduction

This snippet was part of what was introduced in 130cc445e46836b28defdce03b1adfdb16ddcf41

However, none of the existing tests require it, including the tests added in
that commit.

One of those tests had a FIXME which was fixed when we switched
frelaxed-template-template-args on by default as well.
Delta  File
+0 -10  clang/lib/Sema/SemaTemplateDeduction.cpp
+0 -10  1 file

LLVM/project 1eb0496 clang/include/clang/Basic DiagnosticSemaKinds.td, clang/lib/Sema SemaDeclAttr.cpp

[CUDA] Allow `extern __shared__` on non-array types

NVCC allows `extern __shared__` on any type, not just incomplete arrays.
This is commonly used in CUDA libraries like NCCL to overlay a struct on
dynamically-allocated shared memory:

    extern __shared__ ncclShmemData ncclShmem;

Previously, Clang rejected this with a hard error and did not add
`CUDASharedAttr` to the VarDecl. This caused a cascade: `IdentifyTarget()`
classified the variable as host-side, and any device code referencing it
got a spurious "reference to __host__ variable in __device__ function"
error.

Downgrade the error to a default-ignored warning (`-Wcuda-extern-shared`)
and always add `CUDASharedAttr` so the variable is correctly classified as
device-side. The old `err_cuda_extern_shared` is preserved for potential
future use.
Delta  File
+40 -9  clang/test/SemaCUDA/extern-shared.cu
+4 -3  clang/lib/Sema/SemaDeclAttr.cpp
+4 -0  clang/include/clang/Basic/DiagnosticSemaKinds.td
+48 -12  3 files

LLVM/project 342e446 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

[AMDGPU][SIInsertWaitcnts] Move VCCZ workaround code out of the way (#182619)

This is a cleanup patch that moves the VCCZ specific workaround code
from `SIInsertWaitcnts::insertWaitcntInBlock()` to a separate class and
refactors it a bit to make it easier to read.
The end result is a simpler `insertWaitcntInBlock()`.

Should be NFC.
Delta  File
+107 -61  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+107 -61  1 file

LLVM/project 795cfae clang/test/CIR/CodeGen builtin-floating-point.c object-size.c, clang/test/CIR/CodeGenBuiltins builtins-floating-point.c builtin-object-size.c

[CIR][NFC] Move some builtin tests to the CodeGenBuiltins folder (#183607)

This moves a few tests that were created in the wrong location. Also
changes the names of some test files to maintain consistency.
Delta  File
+2,176 -54  clang/test/CIR/CodeGenBuiltins/builtins-floating-point.c
+0 -2,212  clang/test/CIR/CodeGen/builtin-floating-point.c
+877 -0  clang/test/CIR/CodeGenBuiltins/builtin-object-size.c
+0 -877  clang/test/CIR/CodeGen/object-size.c
+636 -0  clang/test/CIR/CodeGenBuiltins/builtin-bit.cpp
+0 -636  clang/test/CIR/CodeGenBuiltins/builtin_bit.cpp
+3,689 -3,779  24 files not shown
+5,226 -5,414  30 files

LLVM/project 085569b llvm/lib/Transforms/InstCombine InstCombineSelect.cpp, llvm/test/Transforms/InstCombine select-and-or.ll

Fix profile metadata propagation in InstCombine select folding

Propagate profile metadata when folding select instructions with logical AND/OR conditions and when canonicalizing SPF to intrinsics. This fixes profile verification failures in Transforms/InstCombine/select-and-or.ll.
Delta  File
+75 -25  llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+21 -20  llvm/test/Transforms/InstCombine/select-and-or.ll
+0 -1  llvm/utils/profcheck-xfail.txt
+96 -46  3 files

LLVM/project 12e1075 llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer extract-many-users-buildvector.ll

[SLP]Fix operand reordering when estimating profitability of operands

Need to swap operands for a single instruction, not for the same lane
of the first and second instructions in the list.
Delta  File
+10 -10  llvm/test/Transforms/SLPVectorizer/extract-many-users-buildvector.ll
+10 -4  llvm/test/Transforms/SLPVectorizer/X86/non-schedulable-node-with-non-schedulable-parent.ll
+5 -6  llvm/test/Transforms/SLPVectorizer/X86/bv-root-part-of-graph.ll
+6 -5  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+5 -5  llvm/test/Transforms/SLPVectorizer/X86/split-vectorize-gathered-def-after-use.ll
+36 -30  5 files

LLVM/project fd9421c lldb/examples/python formatter_bytecode.py

[lldb] Fix sys.path manipulation failure in formatter_bytecode.py (#183868)

Fix bug in #183804.
Delta  File
+2 -1  lldb/examples/python/formatter_bytecode.py
+2 -1  1 file