LLVM/project fac333ellvm/lib/Target/AArch64 AArch64A57FPLoadBalancing.cpp

[AArch64] Do not pass debug insn to liveness analysis (#198021)

Fix another stepBackward location.

Debug instructions must not affect liveness analysis. stepBackward has
an assertion failure on debug instructions after
https://github.com/llvm/llvm-project/pull/193104.

Signed-off-by: John Lu <John.Lu at amd.com>
DeltaFile
+2-1llvm/lib/Target/AArch64/AArch64A57FPLoadBalancing.cpp
+2-11 files

LLVM/project ae2d83bllvm/test/tools/llvm-mca/RISCV/SiFiveP800 vlseg-vsseg.s, llvm/test/tools/llvm-mca/RISCV/SiFiveP800/rvv arithmetic.test fp.test

[RISCV][MCA] Use the new infrastructure for SiFive P500 and P800's tests. NFC (#198016)

Some tests -- mostly vector crypto -- are kept for SiFive P800.

NFC.
DeltaFile
+0-4,752llvm/test/tools/llvm-mca/RISCV/SiFiveP800/vlseg-vsseg.s
+4,549-0llvm/test/tools/llvm-mca/RISCV/SiFiveP800/rvv/arithmetic.test
+3,729-0llvm/test/tools/llvm-mca/RISCV/SiFiveP800/rvv/fp.test
+3,149-0llvm/test/tools/llvm-mca/RISCV/SiFiveP800/rvv/vlseg-vsseg.test
+2,901-0llvm/test/tools/llvm-mca/RISCV/SiFiveP800/rvv/bitwise.test
+2,357-0llvm/test/tools/llvm-mca/RISCV/SiFiveP800/rvv/permutation.test
+16,685-4,75253 files not shown
+31,123-10,18259 files

LLVM/project b5406e4flang/test/Lower array-derived.f90 allocatable-runtime.f90

[flang][NFC] Finishing touches on legacy lowering conversion (#197973)

At the beginning of legacy lowering conversion, some tests were
initially converted to emit FIR. After some discussion, it was decided
to revisit those tests and convert them to emit HLFIR. This change
completes that step and should be the final change in removing vestiges
of legacy lowering.

Assisted-by: AI
DeltaFile
+42-66flang/test/Lower/array-derived.f90
+55-52flang/test/Lower/allocatable-runtime.f90
+42-56flang/test/Lower/array-constructor-index.f90
+47-47flang/test/Lower/allocate-source-allocatables.f90
+30-28flang/test/Lower/allocatable-return.f90
+26-26flang/test/Lower/arithmetic-goto.f90
+242-2755 files not shown
+294-31711 files

LLVM/project 7db1a2blldb/source/Utility ConstString.cpp

[lldb] Avoid unnecessary strlen of mangled names in ConstString (NFC) (#197995)

C++ mangled names are known to be quite long at times. This change makes
use of available length data, instead of using the `StringRef(const char
*)` constructor which calls `strlen`.

The main detail is to replace `selectPool(llvm::StringRef(raw))` with a
call to `selectPool` using a readily available StringRef.
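
As a rough illustration of why passing a known length helps, here is a minimal sketch that uses `std::string_view` in place of `llvm::StringRef`. The helper names (`selectPoolIndex`, `poolForCString`, `poolForKnownLength`) are hypothetical and are not lldb's actual functions; only the pointer-only versus pointer-plus-length distinction is taken from the message above.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string_view>

// Hypothetical stand-in for a pool selector: hashes the bytes to pick a pool.
// It needs only the bytes and their length, never a NUL terminator.
inline std::size_t selectPoolIndex(std::string_view s, std::size_t numPools) {
  return std::hash<std::string_view>{}(s) % numPools;
}

// Slow path: building a view from a bare pointer has to scan the string
// to find its length (the equivalent of strlen).
inline std::size_t poolForCString(const char *raw, std::size_t numPools) {
  return selectPoolIndex(std::string_view(raw), numPools);
}

// Fast path: the caller already knows the length, so nothing is rescanned.
inline std::size_t poolForKnownLength(const char *raw, std::size_t len,
                                      std::size_t numPools) {
  return selectPoolIndex(std::string_view(raw, len), numPools);
}
```

Both paths pick the same pool for the same bytes; the second simply skips the extra pass over a potentially long mangled name.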
DeltaFile
+11-9lldb/source/Utility/ConstString.cpp
+11-91 files

LLVM/project aaaae52libc/test/integration/src/__support/threads cndvar_test.cpp, libc/test/integration/src/pthread pthread_cond_test.cpp

[libc] Reduce number of iterations in threading tests. (#198030)

Previously the threading tests were running noticeably slowly and
causing flakey timeouts on some buildbots (e.g.
https://lab.llvm.org/buildbot/#/builders/71/builds/48420)
DeltaFile
+3-3libc/test/integration/src/pthread/pthread_cond_test.cpp
+2-2libc/test/integration/src/__support/threads/cndvar_test.cpp
+5-52 files

LLVM/project dd0bb3eclang/lib/CIR/CodeGen CIRGenCXX.cpp, clang/test/CIR/CodeGen global-dtor-union-narrowed.cpp

[CIR] Cast global var address to declared type at dtor call site

A C++ global with a constexpr default constructor that fixes the active member of a union — `std::basic_string`'s SSO `__short` variant is a common example — has a `cir.global` whose stored record type is the narrowed shape of that active variant.  Classic CodeGen does the same (`@g = global { { { [16 x i8] } } } zeroinitializer`) and accepts the resulting `__cxa_atexit(@D1, @g, ...)` because LLVM IR uses opaque pointers.  CIR has typed pointers, so the `cir.call` registering the destructor for `__cxa_atexit` carries an operand type that doesn't match the dtor's `this` parameter.  This trips 16 libcxx tests and 71 cases total across libcxx, MultiSource, SingleSource, and SPEC in our build.

`verifyPointerTypeArgs(oldF, newF, userMap)` in `CIRGenModule::applyReplacements` (`clang/lib/CIR/CodeGen/CIRGenModule.cpp:1700`) catches this when ctor-dtor aliases are enabled and D1 is RAUW'd by D2.  Without aliases, the `cir.call` op verifier rejects the same operand-type mismatch directly.

The fix mirrors the cast pattern `emitGlobalVarDeclLValue` (`clang/lib/CIR/CodeGen/CIRGenExpr.cpp:441-445`) already uses for every AST-level reference to a global: bitcast the result of `getAddrOfGlobalVar` to `convertTypeForMem(type)` before any typed-pointer op consumes it.  `getAddrOfGlobalVar` itself stays raw so callers that walk to the underlying `GetGlobalOp` via `getDefiningOp()` keep working.

`global-dtor-union-narrowed.cpp` pins the CIR bitcast, the lowered LLVM helper-wrapped `__cxa_atexit`, and the equivalent OGCG direct `__cxa_atexit`.
DeltaFile
+41-0clang/test/CIR/CodeGen/global-dtor-union-narrowed.cpp
+14-1clang/lib/CIR/CodeGen/CIRGenCXX.cpp
+55-12 files

LLVM/project 2825dfaclang/test/CodeGen scoped-atomic-ops.c, clang/test/CodeGenCUDA atomic-options.hip amdgpu-kernel-arg-pointer-type.cu

[clang] remove lots of "innocuous" addrspacecasts (#197745)

Many addrspacecasts were originally added early on, often where they
weren't needed or could have been added later. That makes them fairly
straightforward to remove (other than changing some tests). By swapping
all calls to this function (except the intended semantic ones for
parameters and variables) with the uncasted version, AMDGPU will
eventually not need to apply a fix-up afterwards by having
different addrspace maps. This PR does not yet fix all calls, but the
main ones that might have been missed are in the matrix/vector extensions
(which, oddly, seem to override the memory type for temporary values to
be different from the type of the object in all other uses).
DeltaFile
+568-852clang/test/CodeGen/scoped-atomic-ops.c
+144-216clang/test/CodeGenCUDA/atomic-options.hip
+95-103clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu
+60-41clang/test/CodeGenCXX/amdgcn-func-arg.cpp
+36-54clang/test/CodeGenCUDA/builtins-spirv-amdgcn.cu
+32-42clang/test/OpenMP/target_teams_generic_loop_codegen_as_parallel_for.cpp
+935-1,30834 files not shown
+1,156-1,56040 files

LLVM/project ad42ae2llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

Fix dynamic map iterator target data lowering

Hoist runtime-sized offload map array allocation for regional target data with
iterator modifiers so the dynamic count and arrays dominate both begin and end
runtime calls.
DeltaFile
+58-30llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+28-2offload/test/offloading/fortran/map-motion-iterator.f90
+28-0mlir/test/Target/LLVMIR/openmp-iterator.mlir
+6-0llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+2-2mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+122-345 files

LLVM/project 97ce93allvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/AMDGPU transform-node-gather-struct.ll

[SLP]Consider non-profitable trees with buildvector of struct-returning instructions

Drop the tree with the struct-returning instructions after
transformations to fix a compiler crash in
https://lab.llvm.org/buildbot/#/builders/10/builds/28684.

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/198024
DeltaFile
+49-0llvm/test/Transforms/SLPVectorizer/AMDGPU/transform-node-gather-struct.ll
+12-0llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+61-02 files

LLVM/project e7f80d6libc/src/stdio/printf_core float_dec_converter_limited.h

[libc] Fix shadowing in printf (#197985)

The 320 bit float converter defined StorageType and DECIMAL_POINT
outside of its functions. This caused issues with other definitions of
the same variables after #197516.
DeltaFile
+4-10libc/src/stdio/printf_core/float_dec_converter_limited.h
+4-101 files

LLVM/project e8daf91llvm/lib/DWARFLinker/Parallel DWARFLinkerCompileUnit.cpp TypePool.h, llvm/test/tools/dsymutil/X86/DWARFLinkerParallel odr-member-functions.cpp odr-fwd-declaration2.test

[DWARFLinker] Preserve source order of member subprograms (#196443)

Children of class/struct/union/interface DIEs in the parallel
DWARFLinker's artificial type unit are sorted lexicographically by the
TypePool synthetic-name key. Data members already get a positional slot
through the synthetic name, but subprograms don't: they collapse to
alphabetical-by-linkage-name order. That breaks LLDB's
SBType::GetMemberFunctionAtIndex(N), which contractually returns members
in DWARF order.

Add a uint32_t SortKey on TypeEntryBody, atomically min-merged across
CUs with the input DIE's ordinal in its parent's child list, and consult
it before the synthetic-name key in TypePool's comparator. The ordinal
is computed by cloneDIE's existing child walk and threaded into
createTypeDIEandCloneAttributes. Scoped to children of
class/struct/union/interface so top-level types in the artificial type
unit keep their existing sort order.
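
The min-merge and the two-level comparison described above could look roughly like the following sketch. `Entry`, `mergeMin`, and `lessEntry` are illustrative names assumed for this example, not the actual `TypeEntryBody` or TypePool code; the sentinel initial value is likewise an assumption.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Illustrative entry: a synthetic name plus an ordinal-based sort key.
struct Entry {
  std::string Name;
  // Assumed sentinel: "no ordinal seen yet" sorts after every real ordinal.
  std::atomic<std::uint32_t> SortKey{std::numeric_limits<std::uint32_t>::max()};
  explicit Entry(std::string N) : Name(std::move(N)) {}
};

// Atomically keep the smallest ordinal reported by any compile unit.
inline void mergeMin(std::atomic<std::uint32_t> &Key, std::uint32_t Ordinal) {
  std::uint32_t Cur = Key.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak reloads Cur, so the bound is rechecked.
  while (Ordinal < Cur &&
         !Key.compare_exchange_weak(Cur, Ordinal, std::memory_order_relaxed)) {
  }
}

// Consult the sort key first; fall back to the name only on ties.
inline bool lessEntry(const Entry *A, const Entry *B) {
  std::uint32_t KA = A->SortKey.load(std::memory_order_relaxed);
  std::uint32_t KB = B->SortKey.load(std::memory_order_relaxed);
  if (KA != KB)
    return KA < KB;
  return A->Name < B->Name;
}
```

With this ordering, a member function declared first in the source sorts first even when its linkage name is alphabetically later, and the result does not depend on which compile unit reported its ordinal first.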
DeltaFile
+17-17llvm/test/tools/dsymutil/X86/DWARFLinkerParallel/odr-member-functions.cpp
+26-5llvm/lib/DWARFLinker/Parallel/DWARFLinkerCompileUnit.cpp
+15-0llvm/lib/DWARFLinker/Parallel/TypePool.h
+6-6llvm/test/tools/dsymutil/X86/DWARFLinkerParallel/odr-fwd-declaration2.test
+7-3llvm/lib/DWARFLinker/Parallel/DWARFLinkerCompileUnit.h
+2-2llvm/test/tools/dsymutil/X86/DWARFLinkerParallel/odr-static-member-decl.test
+73-331 files not shown
+75-357 files

LLVM/project bd0c8fdllvm/test/Transforms/SLPVectorizer/X86 arith-mul-smulo.ll arith-sub-ssubo.ll

Revert "[SLP] Vectorize struct-returning intrinsics"

This reverts commit 1c5e395e234b5c4c6048a51842480c0c074f6ccf.
DeltaFile
+615-549llvm/test/Transforms/SLPVectorizer/X86/arith-mul-smulo.ll
+615-449llvm/test/Transforms/SLPVectorizer/X86/arith-sub-ssubo.ll
+615-449llvm/test/Transforms/SLPVectorizer/X86/arith-sub-usubo.ll
+615-449llvm/test/Transforms/SLPVectorizer/X86/arith-add-saddo.ll
+615-449llvm/test/Transforms/SLPVectorizer/X86/arith-add-uaddo.ll
+615-429llvm/test/Transforms/SLPVectorizer/X86/arith-mul-umulo.ll
+3,690-2,7745 files not shown
+3,913-3,29011 files

LLVM/project 4441ff0llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt gfx12_dasm_vop3.txt

[AMDGPU] Allow printing i16 imm as f16 inline constant

This allows disasm to look the same way as asm and codegen.
DeltaFile
+228-228llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+200-200llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3.txt
+200-200llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3.txt
+194-194llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3.txt
+144-144llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3c.txt
+128-128llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3cx.txt
+1,094-1,09474 files not shown
+3,445-3,58980 files

LLVM/project 4f9a7d0clang/include/clang/Lex PPCallbacks.h Preprocessor.h, clang/lib/Lex Preprocessor.cpp

[clang][DependencyScanning] Preserve Necessary Preprocessor Callbacks during By-name Lookup (#197731)

The by-name lookup logic uses new dependency collector callbacks per
lookup. The algorithm used to wipe out all callbacks for each query.
This turned out to be perilous. We have two raw pointers in the
preprocessor that point to the callbacks, and removing all callbacks per
query can lead to use-after-free situations through these dangling
pointers. Resetting the dangling pointers to null does not really work
either, since there may be dependencies between the callbacks and other
data structures. An example of this is the `PreprocessingRecord *Record`
callback and the `GlobalPreprocessedEntityMap` in ASTReader. Hence, to
fix the use-after-free issue, we preserve the callbacks that the
preprocessor may hold a raw pointer to.

This is not intended to indicate how we want to handle this in the long
run. We should avoid removing PP callbacks and reset their states across
by-name lookups.

rdar://175362366
DeltaFile
+41-0clang/test/ClangScanDeps/modules-by-name-detailed-preprocessing-record.c
+29-0clang/include/clang/Lex/PPCallbacks.h
+11-0clang/lib/Lex/Preprocessor.cpp
+1-1clang/include/clang/Lex/Preprocessor.h
+82-14 files

LLVM/project e92c0d2llvm/test/CodeGen/MIR/AMDGPU parse-cfi-unsigned-error.mir

[AMDGPU] Drop target requirements in test (#198015)

These were only necessary when the test was in the wrong folder. Now
that the test is in the right folder, it will only be marked as
supported when AMDGPU is enabled as a target, so the additional
requirement in the test is redundant.
DeltaFile
+0-2llvm/test/CodeGen/MIR/AMDGPU/parse-cfi-unsigned-error.mir
+0-21 files

LLVM/project f0adfabllvm/test/Transforms/SLPVectorizer/X86 scalarize-ctlz.ll arith-fp-inseltpoison.ll

[SLP] Preserve profitable trees when subtree trimming would reduce to buildvector-only

In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns
Invalid when trimming node Idx==1 under an InsertElement root would
leave only a buildvector, to avoid infinite vectorization attempts.
This is too aggressive when the original untrimmed tree is already
profitable (Cost < -SLPCostThreshold). In that case, undo any partial
trims and return the original cost instead of rejecting the tree.

Reviewers: RKSimon, hiraditya, bababuck

Pull Request: https://github.com/llvm/llvm-project/pull/197763
DeltaFile
+48-29llvm/test/Transforms/SLPVectorizer/X86/scalarize-ctlz.ll
+19-32llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
+19-32llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
+9-10llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
+7-10llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
+7-10llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
+109-1234 files not shown
+138-14010 files

LLVM/project c8dcb79clang/test/SemaCXX warn-unsafe-buffer-usage-pragma-issue-79379.cpp warn-unsafe-buffer-usage-pragma-diagnostic.cpp

Rename the test for a GitHub issue to include the issue number
DeltaFile
+25-0clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-issue-79379.cpp
+0-25clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-diagnostic.cpp
+25-252 files

LLVM/project bf7d6fellvm/utils/gn/secondary/llvm/lib/MC BUILD.gn

[gn build] Port ca6e386cbf5b (#198009)
DeltaFile
+1-0llvm/utils/gn/secondary/llvm/lib/MC/BUILD.gn
+1-01 files

LLVM/project e0b3a79llvm/utils/gn/secondary/libcxx/include BUILD.gn

[gn] port 597e4ac7fbdd, 2nd attempt (#198008)
DeltaFile
+1-2llvm/utils/gn/secondary/libcxx/include/BUILD.gn
+1-21 files

LLVM/project c5d66ebllvm/utils/gn/secondary/libcxx/include BUILD.gn

Revert "[gn] port 597e4ac7fbdd (#198002)" (#198007)

This reverts commit 845dd45d82dac9902bd5665f7ac8f276e218df20.

I merged this incorrectly. Let's revert and try again.
DeltaFile
+2-1llvm/utils/gn/secondary/libcxx/include/BUILD.gn
+2-11 files

LLVM/project cbc5657llvm/test/CodeGen/SPIRV/debug-info debug-type-function-pointer-param.ll debug-type-function-int-string-dedup.ll

[reviews] Simplify tests.
DeltaFile
+7-7llvm/test/CodeGen/SPIRV/debug-info/debug-type-function-pointer-param.ll
+7-7llvm/test/CodeGen/SPIRV/debug-info/debug-type-function-int-string-dedup.ll
+7-7llvm/test/CodeGen/SPIRV/debug-info/debug-type-function-scalar-returns.ll
+6-6llvm/test/CodeGen/SPIRV/debug-info/debug-type-function-void-prototypes.ll
+1-1llvm/test/CodeGen/SPIRV/debug-info/debug-type-function-multi-scalar-params.ll
+1-1llvm/test/CodeGen/SPIRV/debug-info/debug-type-function-subroutine-type-flags.ll
+29-292 files not shown
+31-318 files

LLVM/project 251eba7llvm/utils/gn/secondary/llvm/lib/Transforms/IPO BUILD.gn

[gn build] Port b225b33c925c (#198004)
DeltaFile
+1-0llvm/utils/gn/secondary/llvm/lib/Transforms/IPO/BUILD.gn
+1-01 files

LLVM/project 845dd45llvm/utils/gn/secondary/libcxx/include BUILD.gn

[gn] port 597e4ac7fbdd (#198002)
DeltaFile
+1-2llvm/utils/gn/secondary/libcxx/include/BUILD.gn
+1-21 files

LLVM/project 18c6e26clang/test/SemaCXX warn-unsafe-buffer-usage-template-instantiation-notes.cpp warn-unsafe-buffer-usage-pragma-diagnostic.cpp

- Add test for rdar://107480207
- Add test for https://github.com/llvm/llvm-project/issues/79379
DeltaFile
+47-0clang/test/SemaCXX/warn-unsafe-buffer-usage-template-instantiation-notes.cpp
+25-0clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-diagnostic.cpp
+72-02 files

LLVM/project 9244772llvm/test/MC/Disassembler/AMDGPU gfx11_dasm_vop3.txt gfx12_dasm_vop3.txt

[AMDGPU] Use shorter form for i16 operands

For 16-bit operands, an inline constant is zero-extended,
which in particular allows the use of FP constants. These
have 16 bits of zeroes in the high half and the FP16
value in the low 16 bits.
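
To make the bit layout concrete, here is a small sketch of zero-extending an FP16 bit pattern. The helper names are made up for illustration and are not part of the AMDGPU backend; the only outside fact used is that 1.0 in IEEE 754 half precision has the bit pattern 0x3C00.

```cpp
#include <cstdint>

// IEEE 754 binary16 bit pattern for 1.0.
constexpr std::uint16_t kHalfOne = 0x3C00;

// Zero-extend a 16-bit pattern: the high 16 bits of the result are zero.
constexpr std::uint32_t zeroExtend16(std::uint16_t Bits) { return Bits; }

// True when a 32-bit value has nothing but zeroes in its high half.
constexpr bool highHalfIsZero(std::uint32_t V) { return (V >> 16) == 0; }

// Recover the 16-bit payload from the low half.
constexpr std::uint16_t low16(std::uint32_t V) {
  return static_cast<std::uint16_t>(V & 0xFFFFu);
}

static_assert(zeroExtend16(kHalfOne) == 0x00003C00u, "high half is zero");
static_assert(low16(zeroExtend16(kHalfOne)) == kHalfOne, "payload survives");
```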
DeltaFile
+200-200llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3.txt
+200-200llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3.txt
+116-116llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+98-98llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3.txt
+96-96llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vopc.txt
+96-96llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3c.txt
+806-80655 files not shown
+2,169-2,16661 files

LLVM/project a96572aflang-rt/lib/runtime findloc.cpp

[flang-rt] Rework findloc.cpp to dispatch target at runtime (#197756)

Summary:
The previous code had a combinatorial explosion of functions by
templating on both the source and target types. This created around 170
instantiations. Instead we just template on the source type and then use
a simple runtime check. This should not affect performance in a
significant way; it introduces maybe a few branches in what is already a
non-trivial operation that I do not think justifies a two-minute compile
time.

The result is that compiling this file goes from 120 seconds to 12 on my
machine and the resulting file goes from 7.2 MiB to 757 KiB. Functionally
this makes us instantiate 1/10th the functions.
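
The shape of the change, templating on the source type only and branching on the target type at run time, can be sketched as follows. This is a simplified illustration, not the flang-rt code: `TargetKind`, `Target`, `equalsTarget`, and `findLoc` are assumed names, and only three target types are modeled.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tag for the target's type, inspected at run time.
enum class TargetKind { Int32, Int64, Float64 };

// Type-erased target value: a tag plus a pointer to the data.
struct Target {
  TargetKind Kind;
  const void *Data;
};

// One instantiation per source type; the target type is a runtime branch.
template <typename Source>
bool equalsTarget(Source X, const Target &T) {
  switch (T.Kind) {
  case TargetKind::Int32:
    return X == *static_cast<const std::int32_t *>(T.Data);
  case TargetKind::Int64:
    return X == *static_cast<const std::int64_t *>(T.Data);
  case TargetKind::Float64:
    return X == *static_cast<const double *>(T.Data);
  }
  return false;
}

// Returns the 1-based position of the first match, or 0 when absent,
// in the spirit of Fortran's FINDLOC.
template <typename Source>
std::size_t findLoc(const Source *Src, std::size_t N, const Target &T) {
  for (std::size_t I = 0; I < N; ++I)
    if (equalsTarget(Src[I], T))
      return I + 1;
  return 0;
}
```

With S source types and T target types, this layout instantiates S search loops instead of S×T, at the cost of one switch per comparison.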
DeltaFile
+110-68flang-rt/lib/runtime/findloc.cpp
+110-681 files

LLVM/project 62a0704llvm/lib/Target/SPIRV SPIRVNonSemanticDebugHandler.cpp, llvm/test/CodeGen/SPIRV/debug-info debug-type-pointer-composite-pointee.ll

[reviews] Simplify code and add missing test.
DeltaFile
+38-0llvm/test/CodeGen/SPIRV/debug-info/debug-type-pointer-composite-pointee.ll
+9-11llvm/lib/Target/SPIRV/SPIRVNonSemanticDebugHandler.cpp
+47-112 files

LLVM/project 13dc18allvm/test/tools/dsymutil/AArch64 cas-config.test, llvm/tools/dsymutil DwarfLinkerForBinary.cpp

[dsymutil] Collect .cas-config files in dSYM bundles (#197818)

When caching is enabled the Swift compiler might substitute CAS
identifiers for on-disk paths. In order to resolve them the build system
puts a .cas-config file in the build directory. Dsymutil needs to
collect the contents of these files so tools consuming the dSYM (which
do not have access to the original build directory) can resolve these
CAS identifiers, too.

Assisted-by: claude

rdar://169986664
DeltaFile
+50-2llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
+36-0llvm/test/tools/dsymutil/AArch64/cas-config.test
+86-22 files

LLVM/project e5932b7llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

[OpenMP][mlir] Support OpenMP map iterator LLVM lowering

Lower iterator modifiers on map and motion clauses by building
dynamic offload map arrays in OpenMPIRBuilder and populating them
from iterator-expanded map entries during MLIR OpenMP to LLVM IR
translation.

This enables target enter/exit data, target update, and target data
forms to carry iterator-generated map entries through runtime calls,
including mapper and debug-name arrays. Add MLIR translation coverage
and an AMDGPU Fortran offload runtime test that checks iterator
copy-back behavior for update, exit data, and target data.

This patch is part of feature work for #188061.

Assisted with copilot.
DeltaFile
+210-15llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+189-0mlir/test/Target/LLVMIR/openmp-iterator.mlir
+155-10mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+93-0offload/test/offloading/fortran/map-motion-iterator.f90
+18-64mlir/test/Target/LLVMIR/openmp-todo.mlir
+28-4llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+693-936 files

LLVM/project 91026ccllvm/lib/Target/AMDGPU SIISelLowering.cpp SIInstructions.td, llvm/test/CodeGen/AMDGPU fcanonicalize.bf16.ll packed-fneg-fsub-bf16.ll

[AMDGPU] Optimize fcanonicalize/fneg/fsub with packed bf16 math ops (#197318)

This work makes fcanonicalize v2bf16 'Legal' and implements the
selection pattern for it with v_pk_mul_bf16.

We also make fneg and fabs 'Legal' in this patch. With this change,
packed fadd can be selected for vector fsub with bf16. Also, the vector
fneg can be successfully folded into the operand in the packed bf16 math ops.
DeltaFile
+197-580llvm/test/CodeGen/AMDGPU/fcanonicalize.bf16.ll
+533-0llvm/test/CodeGen/AMDGPU/packed-fneg-fsub-bf16.ll
+9-30llvm/test/CodeGen/AMDGPU/bf16.ll
+6-7llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+6-0llvm/lib/Target/AMDGPU/SIInstructions.td
+751-6175 files