LLVM/project e13cb33 llvm/test/CodeGen/AMDGPU directive-amdgcn-target-legacy-triples.ll directive-amdgcn-target.ll

AMDGPU: Migrate target id tests to use new subarch triples
DeltaFile
+239 -0  llvm/test/CodeGen/AMDGPU/directive-amdgcn-target-legacy-triples.ll
+0 -239  llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll
+11 -11  llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll
+12 -10  llvm/test/CodeGen/AMDGPU/target-id-xnack-always-on.ll
+11 -11  llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll
+11 -11  llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll
+284 -282  9 files not shown
+380 -378  15 files

LLVM/project 3c0de55 llvm/unittests/CodeGen AMDGPUMetadataTest.cpp, llvm/unittests/CodeGen/GlobalISel GISelMITest.cpp

AMDGPU: Migrate unittests to subarch triples

Replace explicitly specified processor names with the triple's subarch.

The register-limit helpers in AMDGPUUnitTests.cpp that enumerate every
valid CPU via fillValidArchListAMDGCN still pass the CPU explicitly, as
does the MC Disassembler smoke test (its C disassembler API derives the
subtarget from the CPU, not the triple subarch).

Co-authored-by: Claude (Opus 4.8) <noreply at anthropic.com>
DeltaFile
+6 -6  llvm/unittests/Target/AMDGPU/DwarfRegMappings.cpp
+6 -6  llvm/unittests/MC/AMDGPU/DwarfRegMappings.cpp
+3 -3  llvm/unittests/CodeGen/AMDGPUMetadataTest.cpp
+2 -2  llvm/unittests/MIR/MachineMetadata.cpp
+2 -2  llvm/unittests/CodeGen/GlobalISel/GISelMITest.cpp
+2 -2  llvm/unittests/MC/AMDGPU/Disassembler.cpp
+21 -21  10 files not shown
+33 -33  16 files

LLVM/project 2938114 clang/lib/Driver/ToolChains CommonArgs.cpp, clang/test/Driver amdgpu-mcpu.cl hip-sanitize-options.hip

clang/AMDGPU: Stop passing redundant -target-cpu to cc1

Now that the exact target is encoded in the triple's subarch field,
-target-cpu is redundant. Dropping it avoids polluting the resulting IR
with unwanted "target-cpu" attributes. The net result is the desired
codegen when compiling libraries for a major subarch and linking them
into a program compiled for a specific arch. For example, compiling for
"gfx9-generic" would pollute the IR with "target-cpu"="gfx9-generic", so
codegen would ultimately be performed for the generic target even after
linking into a program for a concrete gfx9 CPU. The specialization is now
achieved by merging the triples, without the linker or optimization
passes needing to fix up function attributes.
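A minimal Python sketch of the linking behavior described above. It assumes a simplified model in which each module carries a single subarch and each major subarch lists the concrete ISAs it covers. The `MAJOR_COVERS` table (an illustrative subset of the gfx9-generic entry in AMDGPUUsage) and `merge_subarch` are hypothetical; this is not the actual linker or IR-merging code.

```python
# Hypothetical model: a major subarch (generic target) covers a set of
# concrete ISAs. Illustrative subset only; see AMDGPUUsage for the real table.
MAJOR_COVERS = {
    "gfx9-generic": {"gfx900", "gfx902", "gfx904", "gfx906", "gfx909", "gfx90c"},
}


def merge_subarch(a: str, b: str) -> str:
    """Return the subarch that a module linked from `a` and `b` should target."""
    if a == b:
        return a
    # A generic library linked into a concrete program specializes to the
    # program's ISA, so the more specific subarch wins.
    if b in MAJOR_COVERS.get(a, set()):
        return b
    if a in MAJOR_COVERS.get(b, set()):
        return a
    raise ValueError(f"incompatible subarches: {a!r} and {b!r}")


# A library built for the major subarch, linked into a gfx906 program.
print(merge_subarch("gfx9-generic", "gfx906"))  # gfx906
```

Because the specialization comes from the merged subarch, no per-function attribute has to be rewritten. A pinned `"target-cpu"="gfx9-generic"` attribute would defeat this model by forcing generic codegen regardless of the merge result.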
DeltaFile
+62 -62  clang/test/Driver/amdgpu-mcpu.cl
+26 -26  clang/test/Driver/hip-sanitize-options.hip
+20 -10  clang/lib/Driver/ToolChains/CommonArgs.cpp
+12 -16  clang/test/Driver/hip-rdc-device-only.hip
+24 -0  clang/test/Preprocessor/amdgpu-subarch-cc1-target-cpu.cl
+10 -10  clang/test/Driver/amdgpu-xnack-sramecc-flags.c
+154 -124  27 files not shown
+214 -211  33 files

LLVM/project 75afb66 clang/lib/Basic OffloadArch.cpp, clang/lib/Driver Driver.cpp

clang: Start using new amdgpu subarch triples

Fix up invocations that use --target=amdgcn with -mcpu so that the
subarch is encoded in the triple.

For offload toolchains, a single toolchain is constructed for the
top-level amdgpu architecture, and the effective triple is used for
target-specific tool invocations.
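The triple rewrite and the one-toolchain-per-architecture arrangement might be modeled as follows. This is a hypothetical Python sketch: `Triple` keeps the subarch as a separate field to avoid guessing the exact textual spelling of the new triples, and `effective_triple` and `plan_toolchains` are illustrative names, not clang driver APIs.

```python
from typing import Dict, List, NamedTuple, Optional


class Triple(NamedTuple):
    arch: str
    subarch: str  # "" when the triple does not name a specific target
    vendor: str
    os: str


def effective_triple(target: Triple, mcpu: Optional[str]) -> Triple:
    """Fold an -mcpu value into the triple's subarch field."""
    if target.subarch or not mcpu:
        return target
    return target._replace(subarch=mcpu)


def plan_toolchains(base: Triple, offload_archs: List[str]) -> Dict[Triple, List[Triple]]:
    """Group offload archs under one toolchain keyed by the top-level triple."""
    plan: Dict[Triple, List[Triple]] = {}
    top_level = base._replace(subarch="")
    for cpu in offload_archs:
        # Each target-specific tool invocation uses its own effective triple.
        plan.setdefault(top_level, []).append(effective_triple(top_level, cpu))
    return plan


base = Triple("amdgcn", "", "amd", "amdhsa")
for toolchain, triples in plan_toolchains(base, ["gfx906", "gfx1030"]).items():
    print(toolchain.arch, [t.subarch for t in triples])  # amdgcn ['gfx906', 'gfx1030']
```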

The specifics of the resource directory layout are TBD. This does
try to find resources in the directory named after the subarch, but the
paths are searched at toolchain creation time, so that lookup does not
work when there are multiple subarches.

Fixes #154925
DeltaFile
+230 -2  clang/lib/Basic/OffloadArch.cpp
+59 -59  clang/test/Driver/offload-arch-translation-amdgpu.cu
+43 -43  clang/test/Driver/hip-phases.hip
+33 -33  clang/test/Driver/hip-binding.hip
+49 -15  clang/lib/Driver/ToolChains/CommonArgs.cpp
+43 -12  clang/lib/Driver/Driver.cpp
+457 -164  102 files not shown
+1,246 -490  108 files

LLVM/project 0df7caa clang/lib/Basic/Targets AMDGPU.h AMDGPU.cpp, clang/test/Misc/target-invalid-cpu-note amdgcn.c

clang/AMDGPU: Validate -target-cpu in cc1 is valid for the subarch

Restrict the reported list of valid target CPUs based on the triple's
subarch. This is more consistent with how other targets validate the
target CPU name. Currently, validation of the target name for the triple
is split between the driver and here. The driver-based diagnostic seems
to be an AMDGPU-ism emitted in two different places (though one ARM
validation emits the same diagnostic). In the future we could probably
drop those.
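As a rough illustration of restricting the note to CPUs that are valid for the triple's subarch, here is a Python sketch. The `GENERIC_COVERS` table is an illustrative subset of the generic targets in AMDGPUUsage. The rule that a concrete subarch accepts only itself is an assumption, and the message text only approximates clang's wording.

```python
from typing import List, Optional

# Illustrative subset of generic targets and the ISAs they cover.
GENERIC_COVERS = {
    "gfx9-generic": ["gfx900", "gfx902", "gfx904", "gfx906", "gfx909", "gfx90c"],
    "gfx10-3-generic": ["gfx1030", "gfx1031", "gfx1032", "gfx1033",
                        "gfx1034", "gfx1035", "gfx1036"],
}


def valid_cpus_for_subarch(subarch: str) -> List[str]:
    """CPUs accepted for -target-cpu under a given triple subarch (assumed rule)."""
    return [subarch, *GENERIC_COVERS.get(subarch, [])]


def check_target_cpu(subarch: str, cpu: str) -> Optional[str]:
    """Return None if `cpu` is valid; otherwise an error plus a filtered note."""
    valid = valid_cpus_for_subarch(subarch)
    if cpu in valid:
        return None
    return (f"error: unknown target CPU '{cpu}'\n"
            f"note: valid target CPU values are: {', '.join(valid)}")


print(check_target_cpu("gfx9-generic", "gfx1030"))
```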
DeltaFile
+55 -0  clang/test/Misc/target-invalid-cpu-note/amdgcn.c
+6 -5  clang/lib/Basic/Targets/AMDGPU.h
+1 -1  clang/lib/Basic/Targets/AMDGPU.cpp
+62 -6  3 files

LLVM/project c5b65e4 llvm/test/CodeGen/AMDGPU target-cpu.ll

AMDGPU: Rewrite target-cpu test for new subarches

Each function's subtarget should now be valid for the top-level
subarch.
DeltaFile
+52 -74  llvm/test/CodeGen/AMDGPU/target-cpu.ll
+52 -74  1 file

LLVM/project 3839214 llvm/docs AMDGPUUsage.rst, llvm/lib/TargetParser AMDGPUTargetParser.cpp Triple.cpp

AMDGPU: Introduce amdgpu triple arch

Move towards using the triple to represent incompatible ISA changes.
Use the subarch field to represent the various incompatible cases.
Previously, we pretended a single triple arch was universally
compatible, with targets distinguished only by function-level
subtargets. Move towards using distinct triples to enable more
sophisticated toolchain handling in the future, such as proper runtime
library linking.

Introduce a new subarch per unique ISA, but also introduce "major
subarches", each compatible with a covered set of minor ISA versions.
These map to the existing generic targets. There are a few placeholder
subarch entries, which currently lack backing generic arches for
codegen.

This should be the preferred triple arch name going forward, but for
now it is treated as an alias of amdgcn. This does not yet change clang
to emit the new triples.

    [2 lines not shown]
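The relationship between concrete and major subarches can be modeled with a forward table and a reverse index, as in this Python sketch. The entries are an illustrative subset of the generic targets listed in AMDGPUUsage. `major_of` and `runs_on` are hypothetical helpers, and they assume that code built for a major subarch runs on every ISA it covers, while concrete code runs only on its own ISA.

```python
from typing import Dict, Optional, Tuple

# Major subarch -> concrete ISAs it covers (illustrative subset).
MAJOR_SUBARCHES: Dict[str, Tuple[str, ...]] = {
    "gfx9-generic": ("gfx900", "gfx902", "gfx904", "gfx906", "gfx909", "gfx90c"),
    "gfx10-1-generic": ("gfx1010", "gfx1011", "gfx1012", "gfx1013"),
    "gfx10-3-generic": ("gfx1030", "gfx1031", "gfx1032", "gfx1033",
                        "gfx1034", "gfx1035", "gfx1036"),
}

# Reverse index: concrete ISA -> its major subarch.
MAJOR_OF: Dict[str, str] = {
    isa: major for major, isas in MAJOR_SUBARCHES.items() for isa in isas
}


def major_of(subarch: str) -> Optional[str]:
    """Return the major subarch for a subarch (a major maps to itself)."""
    if subarch in MAJOR_SUBARCHES:
        return subarch
    return MAJOR_OF.get(subarch)


def runs_on(code_subarch: str, device_isa: str) -> bool:
    """Can code built for `code_subarch` run on a device with `device_isa`?"""
    if code_subarch == device_isa:
        return True
    return device_isa in MAJOR_SUBARCHES.get(code_subarch, ())
```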
DeltaFile
+548 -434  llvm/docs/AMDGPUUsage.rst
+417 -0  llvm/unittests/TargetParser/TargetParserTest.cpp
+241 -11  llvm/lib/TargetParser/AMDGPUTargetParser.cpp
+177 -0  llvm/test/CodeGen/AMDGPU/target-id-from-triple.ll
+149 -14  llvm/lib/TargetParser/Triple.cpp
+138 -12  llvm/unittests/TargetParser/TripleTest.cpp
+1,670 -471  72 files not shown
+2,500 -677  78 files

LLVM/project 08e60e2 llvm/include/llvm/IR Instructions.h, llvm/lib/CodeGen AtomicExpandPass.cpp

Update for comments
DeltaFile
+18 -18  llvm/include/llvm/IR/Instructions.h
+4 -4  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+2 -2  llvm/lib/CodeGen/AtomicExpandPass.cpp
+2 -2  llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+26 -26  4 files

LLVM/project 12b20b3 llvm/test/CodeGen/X86 phaddsub-extract.ll, llvm/test/Transforms/PhaseOrdering/X86 horizontal-reduce-add.ll

[X86] Move more vector.reduce.add subvector pattern tests to PhaseOrdering/X86/horizontal-reduce-add.ll (#206467)

CodeGen test coverage is already in vector-reduce-add-subvector.ll
DeltaFile
+0 -86  llvm/test/CodeGen/X86/phaddsub-extract.ll
+32 -0  llvm/test/Transforms/PhaseOrdering/X86/horizontal-reduce-add.ll
+32 -86  2 files

LLVM/project 195ca1f llvm/test/CodeGen/SystemZ vector-constrained-fp-intrinsics.ll knownbits-intrinsics-binop.ll

[SystemZ] Limit latency scheduling to SUs with latency of at least 5. (#206459)

The latency reduction heuristic is highly effective, but it seems
preferable not to "move everything around" and instead to focus on
instructions with somewhat longer latencies. The basic idea is that the
input order is fairly good to begin with, not just "any random order",
so it should not be disturbed unnecessarily.
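A simplified Python sketch of the thresholding idea follows. Only ready units whose latency reaches the cutoff of 5 from the title compete on latency, and everything else keeps the original input order. `SUnit` and `pick_next` are hypothetical; the actual SystemZ scheduling strategy weighs more factors than this.

```python
from dataclasses import dataclass
from typing import List

MIN_LATENCY = 5  # cutoff from the commit title


@dataclass
class SUnit:
    name: str
    latency: int
    order: int  # position in the original input order


def pick_next(ready: List[SUnit]) -> SUnit:
    """Pick the next unit to schedule from a non-empty ready list."""
    long_latency = [su for su in ready if su.latency >= MIN_LATENCY]
    if long_latency:
        # Prefer the longest latency; break ties by input order.
        return max(long_latency, key=lambda su: (su.latency, -su.order))
    # Nothing long enough to be worth moving: keep the input order.
    return min(ready, key=lambda su: su.order)
```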
DeltaFile
+225 -241  llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll
+120 -119  llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll
+99 -99  llvm/test/CodeGen/SystemZ/vec-cmp-cmp-logic-select.ll
+72 -72  llvm/test/CodeGen/SystemZ/shift-17.ll
+46 -46  llvm/test/CodeGen/SystemZ/shift-16.ll
+42 -42  llvm/test/CodeGen/SystemZ/store_nonbytesized_vecs.ll
+604 -619  41 files not shown
+1,114 -1,140  47 files

LLVM/project ad2fbea libcxx/include print, libcxx/include/__atomic atomic_sync_timed.h

[libc++] Consistently use version guards outside namespace macros (#181136)

This has the benefit of consistent placement for these macros (outside
the namespace, not inside it). This, in turn, makes it more obvious that
the condition applies to the entire namespace scope. Compile times are
also reduced a bit, since the compiler doesn't have to open a namespace
just to close it again.

This is enforced with a clang-tidy check.
As a drive-by, this also fixes a few macro comments.
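To make the targeted pattern concrete, here is a rough Python approximation. It flags a namespace-macro pair whose entire body is a single `#if` ... `#endif` block, which is a guard placed inside the namespace rather than outside it. This is a hypothetical text-based sketch, not the clang-tidy check added under `libcxx/test/tools/clang_tidy_checks`. It only recognizes `_LIBCPP_BEGIN_NAMESPACE_STD`/`_LIBCPP_END_NAMESPACE_STD` and libc++-style `#  if` indentation.

```python
import re
from typing import List

BEGIN_RE = re.compile(r"^\s*_LIBCPP_BEGIN_NAMESPACE_STD\b")
END_RE = re.compile(r"^\s*_LIBCPP_END_NAMESPACE_STD\b")
IF_RE = re.compile(r"^#\s*if")  # #if, #ifdef, #ifndef, libc++-style "#  if"
ELSE_RE = re.compile(r"^#\s*(else|elif)")
ENDIF_RE = re.compile(r"^#\s*endif\b")


def _single_guard_spans(body: List[str]) -> bool:
    """True if the #if on the first line closes on the last line."""
    depth = 0
    for k, line in enumerate(body):
        if IF_RE.match(line):
            depth += 1
        elif ELSE_RE.match(line) and depth == 1:
            return False  # conservatively skip guards with #else/#elif branches
        elif ENDIF_RE.match(line):
            depth -= 1
            if depth == 0:
                return k == len(body) - 1
    return False


def guards_inside_namespace(lines: List[str]) -> List[int]:
    """Return 1-based line numbers of namespace openers whose body is one guard."""
    findings: List[int] = []
    start = None
    for idx, line in enumerate(lines):
        if BEGIN_RE.match(line):
            start = idx
        elif END_RE.match(line) and start is not None:
            body = [ln.strip() for ln in lines[start + 1:idx] if ln.strip()]
            if body and IF_RE.match(body[0]) and _single_guard_spans(body):
                findings.append(start + 1)
            start = None
    return findings
```

When the guard's condition is false, a flagged namespace is opened and immediately closed, which is exactly the pattern that moving the guard outside the macros avoids.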
DeltaFile
+28 -0  libcxx/test/tools/clang_tidy_checks/empty_namespaces.cpp
+18 -0  libcxx/test/tools/clang_tidy_checks/empty_namespaces.hpp
+8 -8  libcxx/include/print
+7 -6  libcxx/include/__atomic/atomic_sync_timed.h
+6 -6  libcxx/include/__bit/bit_cast.h
+6 -6  libcxx/include/__memory/uses_allocator_construction.h
+73 -26  151 files not shown
+668 -603  157 files

LLVM/project 09d1b94 llvm/lib/Transforms/Scalar MemCpyOptimizer.cpp, llvm/test/Transforms/MemCpyOpt memset-memmove-redundant-memmove.ll

[MemCpyOpt] Fix incorrect size check in memmove of memset opt (#206451)

We were only checking that the memset is at least as large as the
memmove, without accounting for the offset at which the memmove
occurs.
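The corrected condition can be illustrated with a small Python model that measures everything in bytes relative to the memset destination. It assumes a memmove whose source and destination both point into the memset region, so the copy is a no-op only if both ranges lie entirely inside it. The function names are hypothetical, and this is not the MemCpyOptimizer code.

```python
def old_check(memset_size: int, move_size: int) -> bool:
    # Ignores where the memmove starts inside the memset region.
    return memset_size >= move_size


def memmove_is_redundant(memset_size: int, dst_off: int, src_off: int,
                         move_size: int) -> bool:
    """Both the read range and the written range must fit inside the memset."""
    def inside(off: int) -> bool:
        return off >= 0 and off + move_size <= memset_size
    return inside(dst_off) and inside(src_off)


# memset(p, 0, 16); memmove(p + 8, p, 12): bytes 16..19 are written but were
# never set, so the memmove is not redundant.
print(old_check(16, 12), memmove_is_redundant(16, 8, 0, 12))  # True False
```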
DeltaFile
+14 -0  llvm/test/Transforms/MemCpyOpt/memset-memmove-redundant-memmove.ll
+2 -1  llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+16 -1  2 files

LLVM/project c4dafb0 libcxx/docs/Status Cxx26Issues.csv, libcxx/test/std/localization/locale.categories/category.numeric/locale.nm.put/facet.num.put.members put_double.pass.cpp put_long_double.pass.cpp

[libc++] Mark LWG4084 as resolved (#206224)

Closes https://github.com/llvm/llvm-project/issues/118346
DeltaFile
+50 -0  libcxx/test/std/localization/locale.categories/category.numeric/locale.nm.put/facet.num.put.members/put_double.pass.cpp
+50 -0  libcxx/test/std/localization/locale.categories/category.numeric/locale.nm.put/facet.num.put.members/put_long_double.pass.cpp
+1 -1  libcxx/docs/Status/Cxx26Issues.csv
+101 -1  3 files

LLVM/project 2852b8b llvm/lib/Target/SPIRV SPIRVRegularizer.cpp, llvm/test/CodeGen/SPIRV/passes SPIRVRegularizer-i1-icmp.ll

[SPIR-V] Extend runLowerI1Comparisons to cover vector i1 types (#206409)
DeltaFile
+20 -0  llvm/test/CodeGen/SPIRV/passes/SPIRVRegularizer-i1-icmp.ll
+1 -1  llvm/lib/Target/SPIRV/SPIRVRegularizer.cpp
+21 -1  2 files

LLVM/project bb9f081 cross-project-tests/debuginfo-tests/dexter/dex/evaluation ExpectMatch.py RunMatch.py, cross-project-tests/debuginfo-tests/dexter/dex/test_script Nodes.py

[Dexter] Add !address node (#202801)

Adds a node type for Dexter that allows checking abstract labels instead
of concrete addresses. Each address node has a label and optional
offset, and the first time during evaluation that a given address label
is matched against a valid pointer value, the address label will be
assigned a value that matches the seen address (adjusting for any
offset). From that point, the resolved address value will be used for
the remainder of the test evaluation.
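The binding rule can be sketched in Python as follows. `AddressResolver` and `matches` are hypothetical names, not Dexter's classes, and treating `None` or `0` as "not a valid pointer" is an assumption made for the sketch.

```python
from typing import Dict, Optional


class AddressResolver:
    """Binds abstract address labels to concrete values on first match."""

    def __init__(self) -> None:
        self.resolved: Dict[str, int] = {}

    def matches(self, label: str, offset: int, seen: Optional[int]) -> bool:
        if not seen:  # assumption: None or 0 is not a valid pointer value
            return False
        if label not in self.resolved:
            # First valid pointer for this label: resolve it, adjusting for
            # the offset so that later checks can apply their own offsets.
            self.resolved[label] = seen - offset
            return True
        return self.resolved[label] + offset == seen


r = AddressResolver()
print(r.matches("buf", 0, 0x1000), r.matches("buf", 8, 0x1008),
      r.matches("buf", 0, 0x2000))  # True True False
```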
DeltaFile
+138 -50  cross-project-tests/debuginfo-tests/dexter/dex/evaluation/ExpectMatch.py
+66 -0  cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/evaluation/eval_address.cpp
+46 -0  cross-project-tests/debuginfo-tests/dexter/dex/test_script/Nodes.py
+26 -0  cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/parser/invalid-address.test
+15 -6  cross-project-tests/debuginfo-tests/dexter/dex/evaluation/RunMatch.py
+15 -0  cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/parser/parse-address.test
+306 -56  6 files

LLVM/project c06d047 clang/docs ReleaseNotes.rst, clang/include/clang/Basic DiagnosticSemaKinds.td

Diagnose noreturn calls from a const or pure function (#206134)

Functions marked const or pure get the willreturn LLVM IR attribute,
which requires the function to return, so calling a noreturn function
from them is UB. Such calls are now diagnosed unless the call is known
to be unevaluated.

This diagnostic is enabled by default.

Fixes #129022
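The rule reduces to a small predicate, sketched here in Python with a hypothetical `Func` record. The actual diagnostic is implemented in clang's Sema (this change touches `SemaChecking.cpp`) and decides "unevaluated" from the AST context, which this sketch simply takes as a flag.

```python
from dataclasses import dataclass


@dataclass
class Func:
    name: str
    const: bool = False     # __attribute__((const))
    pure: bool = False      # __attribute__((pure))
    noreturn: bool = False  # e.g. _Noreturn or __attribute__((noreturn))


def should_diagnose(caller: Func, callee: Func, unevaluated: bool) -> bool:
    # const/pure imply willreturn, which a call to a noreturn function breaks;
    # calls in unevaluated operands (e.g. sizeof) never execute.
    return (caller.const or caller.pure) and callee.noreturn and not unevaluated
```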
DeltaFile
+106 -1  clang/test/Sema/attr-const-pure.c
+15 -0  clang/lib/Sema/SemaChecking.cpp
+7 -0  clang/include/clang/Basic/DiagnosticSemaKinds.td
+3 -1  clang/docs/ReleaseNotes.rst
+131 -2  4 files

LLVM/project 066b689 llvm/include/llvm/IR Instructions.h, llvm/lib/CodeGen AtomicExpandPass.cpp

[IR][NFC] Add LoadStoreInstAttributes to copy load/store attrs
DeltaFile
+34-0llvm/include/llvm/IR/Instructions.h
+4-6llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+2-6llvm/lib/CodeGen/AtomicExpandPass.cpp
+2-2llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+42-144 files

LLVM/project 2fbe136 clang/docs ReleaseNotes.rst, clang/lib/Frontend FrontendAction.cpp

Line and digit directives, OriginalFileName, ModuleName should be unevaluated strings (#201413)

Based on
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2361r6.pdf,
line and digit directives should be unevaluated strings. This patch
changes HandleLineDirective and HandleDigitDirective to parse their
strings as unevaluated string literals and fixes the test case so that
it no longer contains escape sequences.
DeltaFile
+11 -0  clang/docs/ReleaseNotes.rst
+4 -2  clang/lib/Lex/PPDirectives.cpp
+5 -0  clang/test/Frontend/linemarker-invalid-escape.c
+5 -0  clang/test/Preprocessor/line-directive.c
+2 -1  clang/lib/Frontend/FrontendAction.cpp
+1 -1  clang/test/Preprocessor/line-directive-output.c
+28 -4  2 files not shown
+31 -5  8 files

LLVM/project f8132d8 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 combine-reductions.ll

[X86] Fold broadcast(truncate(extract_vector_elt(x, 0))) -> bitcast(broadcast(x)) (#206461)

Fixes regressions in #205098
DeltaFile
+16 -0  llvm/lib/Target/X86/X86ISelLowering.cpp
+0 -1  llvm/test/CodeGen/X86/combine-reductions.ll
+16 -1  2 files

LLVM/project 1636530 llvm/docs AMDGPUUsage.rst, llvm/include/llvm/BinaryFormat ELF.h

[AMDGPU] Add more generic targets (#205363)

gfx11-7-generic = 0x062
gfx13-generic = 0x063
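These values occupy the machine field of the ELF header's `e_flags`. The following is a minimal Python decoder. It assumes the `EF_AMDGPU_MACH` mask of `0x0ff` from `llvm/BinaryFormat/ELF.h` and lists only the two machines added here.

```python
EF_AMDGPU_MACH = 0x0FF  # machine field of e_flags

# Only the two values introduced by this change; all others omitted.
MACH_NAMES = {
    0x062: "gfx11-7-generic",
    0x063: "gfx13-generic",
}


def decode_mach(e_flags: int) -> str:
    mach = e_flags & EF_AMDGPU_MACH  # feature bits above the mask are ignored
    return MACH_NAMES.get(mach, f"unknown (0x{mach:03x})")


print(decode_mach(0x062))  # gfx11-7-generic
```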

Co-Authored-By: Claude noreply at anthropic.com
DeltaFile
+15-0llvm/docs/AMDGPUUsage.rst
+14-0llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
+10-0llvm/lib/Target/AMDGPU/GCNProcessors.td
+10-0llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+10-0llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll
+6-2llvm/include/llvm/BinaryFormat/ELF.h
+65-219 files not shown
+134-1225 files

LLVM/project 5c15801 lldb/source/Commands CommandCompletions.cpp, lldb/test/API/functionalities/completion TestCompletion.py

[lldb] Show descriptions for settings (#206044)

This patch shows a description next to each settings completion.
DeltaFile
+18-3lldb/source/Commands/CommandCompletions.cpp
+12-0lldb/test/API/functionalities/completion/TestCompletion.py
+30-32 files

LLVM/project f1caacf llvm/lib/Target/SPIRV SPIRVBuiltins.cpp

[SPIR-V] Reuse getIConstVal instead of custom one where applicable (NFC) (#206131)
DeltaFile
+10-21llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp
+10-211 files

LLVM/project e60b4b6 clang/lib/AST/ByteCode Interp.cpp, clang/test/AST/ByteCode dynamic-cast.cpp

[clang][bytecode] Fix an assertion failure in dynamic_cast handling (#206447)

If `Ptr` is already a root pointer, the `getBase()` call ran into an
assertion. Fix this by moving the check to the start of the loop.
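The control-flow change can be sketched in Python with a hypothetical `Ptr` type whose `get_base` refuses to step past the root, mirroring the assertion described above. These names are illustrative and do not reproduce the clang bytecode interpreter's `Pointer` API.

```python
from typing import Optional


class Ptr:
    def __init__(self, name: str, base: Optional["Ptr"] = None) -> None:
        self.name = name
        self._base = base

    def is_root(self) -> bool:
        return self._base is None

    def get_base(self) -> "Ptr":
        if self._base is None:
            raise RuntimeError("get_base() called on a root pointer")
        return self._base


def to_root_buggy(p: Ptr) -> Ptr:
    while True:
        p = p.get_base()  # fails immediately if p is already the root
        if p.is_root():
            return p


def to_root_fixed(p: Ptr) -> Ptr:
    while not p.is_root():  # check first, so a root pointer is returned as-is
        p = p.get_base()
    return p
```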
DeltaFile
+14-0clang/test/AST/ByteCode/dynamic-cast.cpp
+3-2clang/lib/AST/ByteCode/Interp.cpp
+17-22 files

LLVM/project 787619a llvm/lib/Target/AMDGPU AMDGPULateCodeGenPrepare.cpp, llvm/test/CodeGen/AMDGPU issue196582-late-codegenprepare-crash-non-po2.ll

[AMDGPU] Fix bit-packing condition in LiveRegOptimizer (#201520)

This commit changes the bit-packing eligibility condition in
LiveRegOptimizer: instead of requiring the scalar target type to be
larger than the source type, it now requires the target type's size
to be a multiple of the source type's size.

Fixes https://github.com/llvm/llvm-project/issues/196582.
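A Python sketch of the old and new eligibility checks, expressed in bit widths. The `src_bits` and `target_bits` parameters are simplifications (the pass works on LLVM types), and the 24-bit source is only an example of a non-power-of-two size in the spirit of the added test.

```python
def can_bitpack_old(src_bits: int, target_bits: int) -> bool:
    # Old rule: the target scalar only had to be larger than the source.
    return target_bits > src_bits


def can_bitpack_new(src_bits: int, target_bits: int) -> bool:
    # New rule: the target must hold a whole number of source values.
    return target_bits % src_bits == 0


# A 24-bit source passed the old check for a 32-bit target even though
# 32 bits cannot hold a whole number of 24-bit values.
print(can_bitpack_old(24, 32), can_bitpack_new(24, 32))  # True False
```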

---------

Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
DeltaFile
+25-0llvm/test/CodeGen/AMDGPU/issue196582-late-codegenprepare-crash-non-po2.ll
+3-3llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
+28-32 files

LLVM/project cf1f0e9 libcxx/include/__ranges adjacent_view.h, libcxx/test/libcxx/ranges/range.adaptors/range.adjacent nodiscard.verify.cpp

[libc++][ranges] Applied [[nodiscard]] to `adjacent_view` (#205206)

[[nodiscard]] should be applied to functions where discarding the return
value is most likely a correctness issue.

- https://libcxx.llvm.org/CodingGuidelines.html
- https://wg21.link/range.adjacent

Towards https://github.com/llvm/llvm-project/issues/172124

---------

Co-authored-by: A. Jiang <de34 at live.cn>
Co-authored-by: Hristo Hristov <zingam at outlook.com>
DeltaFile
+91 -0  libcxx/test/libcxx/ranges/range.adaptors/range.adjacent/nodiscard.verify.cpp
+19 -18  libcxx/include/__ranges/adjacent_view.h
+110 -18  2 files

LLVM/project acd0f6f compiler-rt/include/sanitizer safestack_interface.h, compiler-rt/lib/safestack safestack.cpp

[SafeStack] Allocate unsafe sigaltstack (#206463)

PR https://github.com/llvm/llvm-project/pull/196969 was approved and
merged, but `spr` had set the wrong base branch.

This merges the approved changes into `main`.

---------

Co-authored-by: Paul Walker <paul.walker at arm.com>
Co-authored-by: Arseniy Obolenskiy <arseniy.obolenskiy at amd.com>
DeltaFile
+70-0compiler-rt/lib/safestack/safestack.cpp
+20-0compiler-rt/test/safestack/sigaltstack.c
+15-0compiler-rt/include/sanitizer/safestack_interface.h
+105-03 files

LLVM/project 7d409e7 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

clang-format
DeltaFile
+7-3llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+7-31 files

LLVM/project daf2e3a llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp AMDGPUHWEvents.cpp

[RFC][SIInsertWaitCnts] Remove VMemTypes

This can be considered an RFC. I'd personally like to get rid of VMemTypes,
but I don't know whether anyone feels strongly that they should be kept.

My motivation for removing VMemTypes is simple: they merely repeat the
VMEM events under a different name, and more messily (defined as a basic
enum but later stored as a bitmask). It's just confusing.

This patch eliminates the need for them by:

- Adding a new entry point in AMDGPUHWEvents to get the basic set of
  VMEM events issued by a VMEM instruction.
- Setting BVH/SAMPLER events irrespective of whether the HW can track them.
  These events exist anyway, so it should be up to InsertWaitCnt to handle
  them properly (which is easy; only `counterOutOfOrder` needed work).
- Tracking an additional set of per-VGPR "PendingEvents", which is
  set using the basic set of VMEM events and cleared as needed.


    [3 lines not shown]
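The per-VGPR bookkeeping in the last bullet can be illustrated with plain bitmasks, sketched here in Python. The event names, `VgprTracker`, and the rule that a wait is needed when a register already has pending events of a different kind are assumptions modeled on mixed-VMEM-type handling, not the actual SIInsertWaitcnts implementation.

```python
from typing import Dict

# Hypothetical VMEM event bits.
VMEM_ACCESS = 1 << 0
VMEM_SAMPLER = 1 << 1
VMEM_BVH = 1 << 2


class VgprTracker:
    def __init__(self) -> None:
        self.pending: Dict[int, int] = {}  # VGPR number -> pending event mask

    def on_issue(self, vgpr: int, events: int) -> None:
        self.pending[vgpr] = self.pending.get(vgpr, 0) | events

    def needs_wait(self, vgpr: int, events: int) -> bool:
        # Different event kinds may complete out of order relative to each
        # other, so any pending bit outside `events` forces a wait.
        return (self.pending.get(vgpr, 0) & ~events) != 0

    def on_wait(self, vgpr: int) -> None:
        self.pending.pop(vgpr, None)
```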
DeltaFile
+33 -54  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+7 -6  llvm/lib/Target/AMDGPU/AMDGPUHWEvents.cpp
+11 -0  llvm/lib/Target/AMDGPU/AMDGPUHWEvents.h
+51 -60  3 files

LLVM/project 5d24d4f llvm/test/CodeGen/AMDGPU mixed-vmem-types.ll waitcnt-bvh.mir

[AMDGPU][InsertWaitCnts] Improve testing coverage for VMemTypes (#206439)

The next patch will get rid of VMemTypes, but some of its functionality
was not well tested. This patch fills those gaps in test coverage so
that the next patch in the stack can be shown to have no impact on
codegen.

The two gaps I noticed were that we had no test for mixed VMEM types
with BVH (only with sampler), and no test that was affected by
`clearVmemTypes`, so removing that call had no effect.

This patch adds tests for both cases.
DeltaFile
+241 -4  llvm/test/CodeGen/AMDGPU/mixed-vmem-types.ll
+8 -0  llvm/test/CodeGen/AMDGPU/waitcnt-bvh.mir
+249 -4  2 files

LLVM/project f97113a cross-project-tests/debuginfo-tests/dexter Script.md, cross-project-tests/debuginfo-tests/dexter-tests global-constant.cpp

review comments
DeltaFile
+2-1cross-project-tests/debuginfo-tests/dexter-tests/global-constant.cpp
+2-0cross-project-tests/debuginfo-tests/dexter/Script.md
+4-12 files