LLVM/project 6bc8b3cllvm/lib/Transforms/IPO ThinLTOBitcodeWriter.cpp WholeProgramDevirt.cpp, llvm/test/ThinLTO/X86 devirt_function_alias2.ll

[CFI] Create an external linkage alias instead of promoting internals
DeltaFile
+20-33llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
+20-5llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+10-7llvm/test/Transforms/ThinLTOBitcodeWriter/comdat.ll
+16-0llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+6-4llvm/test/ThinLTO/X86/devirt_function_alias2.ll
+4-2llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll
+76-513 files not shown
+83-569 files

LLVM/project 1817d11llvm/include/llvm/IR GlobalValue.h, llvm/include/llvm/Transforms/Utils AssignGUID.h

Reland #184065
DeltaFile
+61-17llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+45-30llvm/lib/LTO/LTO.cpp
+64-2llvm/lib/IR/Globals.cpp
+49-3llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+45-5llvm/include/llvm/IR/GlobalValue.h
+49-0llvm/include/llvm/Transforms/Utils/AssignGUID.h
+313-57116 files not shown
+853-400122 files

LLVM/project fecb127clang/test/Preprocessor init-datetime-macros.c

[clang-cl][test] Use /Zs to avoid writing unnecessary output files (#204501)

#194779 adds a test clang/test/Preprocessor/init-datetime-macros.c which
verifies some diagnostics. However, it does so with `/c`, which will
unnecessarily generate an output, and when run on a build system that
does not run tests in a writeable dir by default, will cause the test to
fail.

Since we don't care about the resulting object file, use `/Zs`
(equivalent of `-fsyntax-only`) to check the diagnostics but not produce
any output files.
DeltaFile
+1-1clang/test/Preprocessor/init-datetime-macros.c
+1-11 files

LLVM/project 0f56de1offload/libomptarget omptarget.cpp, offload/plugins-nextgen/common/src RecordReplay.cpp

[offload][OpenMP] Fix record replay when no memory is used

Programs that do not use any memory (e.g., no mappings) were failing because
we were trying to execute zero-size transfers.
DeltaFile
+18-12offload/libomptarget/omptarget.cpp
+26-0offload/test/tools/omp-kernel-replay/record-replay-empty-memory.cpp
+13-9offload/plugins-nextgen/common/src/RecordReplay.cpp
+2-1offload/tools/kernelreplay/llvm-omp-kernel-replay.cpp
+59-224 files

LLVM/project c02f7b1offload/libomptarget device.cpp, offload/plugins-nextgen/common/include RecordReplay.h PluginInterface.h

[offload] Improve report printing for kernel recording
DeltaFile
+35-15offload/plugins-nextgen/common/src/RecordReplay.cpp
+15-2openmp/docs/design/Runtimes.rst
+9-5offload/plugins-nextgen/common/include/RecordReplay.h
+8-2offload/libomptarget/device.cpp
+4-4offload/plugins-nextgen/common/src/PluginInterface.cpp
+3-2offload/plugins-nextgen/common/include/PluginInterface.h
+74-301 files not shown
+76-317 files

LLVM/project d3ac9b5bolt/include/bolt/Core DebugData.h DIEBuilder.h, bolt/include/bolt/Rewrite DWARFRewriter.h

[RFC][BOLT] Add a new parallel DWARF processing(2/2) (#197859)

This PR implements a new parallel DWARF debug info processing pipeline
for BOLT that significantly speeds up `--update-debug-sections` for
large binaries. It is the second part of the split from the overall RFC
changes: [[RFC][BOLT] A New Parallel DWARF Processing Approach in
BOLT](https://discourse.llvm.org/t/rfc-bolt-a-new-parallel-dwarf-processing-approach-in-bolt/90736).

This PR does the following:
1. **Equivalence-class CU partitioning:** Replaces batchsize grouping
with union-find over DW_FORM_ref_addr references. Connected CUs share a
bucket; isolated CUs become singletons.

> For the non-LTO case, CUs have no cross-CU dependencies, so each CU is
placed into its own singleton bucket and processed fully in parallel.
> For the LTO case, CUs with cross-CU dependencies are grouped into the
same bucket and processed sequentially within that bucket, while

    [7 lines not shown]
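
A minimal Python sketch of the equivalence-class partitioning described in item 1, not BOLT's actual C++ implementation. The function name `partition_cus` and the input shape (a CU count plus `(from_cu, to_cu)` pairs standing in for `DW_FORM_ref_addr` references) are hypothetical:

```python
def partition_cus(num_cus, cross_cu_refs):
    """Group CU indices into buckets; CUs linked by a cross-CU reference share a bucket."""
    parent = list(range(num_cus))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in cross_cu_refs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root to the smaller one for a deterministic result.
            parent[max(ra, rb)] = min(ra, rb)

    buckets = {}
    for cu in range(num_cus):
        buckets.setdefault(find(cu), []).append(cu)
    return sorted(buckets.values())
```

With no cross-CU references (the non-LTO case), every CU ends up in its own singleton bucket; connected CUs collapse into one bucket that would be processed sequentially.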
DeltaFile
+513-202bolt/lib/Rewrite/DWARFRewriter.cpp
+50-7bolt/include/bolt/Rewrite/DWARFRewriter.h
+55-0bolt/test/X86/dwarf4-cross-cu-ranges.test
+30-15bolt/lib/Core/DebugData.cpp
+16-12bolt/include/bolt/Core/DebugData.h
+7-2bolt/include/bolt/Core/DIEBuilder.h
+671-2382 files not shown
+673-2408 files

LLVM/project bc5c332llvm/lib/Target/AMDGPU AMDGPUISelDAGToDAG.cpp, llvm/test/CodeGen/AMDGPU i128-add-carry-chain.ll

[AMDGPU] Keep i64 carry chains on VCC when feeding VALU users

This PR fixes an issue where ISel could mix scalar and vector carry chains when
lowering widened integer add/sub operations. A scalar-looking i64 carry producer
may feed a divergent carry consumer, so ISel now keeps that carry chain on VCC
to avoid invalid MIR.
DeltaFile
+65-0llvm/test/CodeGen/AMDGPU/i128-add-carry-chain.ll
+36-2llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+101-22 files

LLVM/project 258b68fllvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchLSXInstrInfo.td, llvm/test/CodeGen/LoongArch/lasx/ir-instruction fptoui.ll fptosi.ll

[LoongArch] Combine FP_TO_UINT/FP_TO_SINT with [X]VFTINTRZ instruction (#201569)

Combine double conversion to signed 32-bit integer with
`[X]VFTINTRZ_W_D` instructions.

There are three cases:
1. For VT smaller than i32, we promote it to i32 then truncate to the
final result.
2. For `fptoui double to i32`, we convert it to `fptosi double to i64`
then truncate. We avoid doing so with LASX enabled because we already
have the corresponding pattern in TableGen.
3. Last, for `fptosi double to i32`, we split them into blocks
(128-bit or 256-bit depending on whether LASX is enabled) and
then feed them into `[X]VFTINTRZ_W_D` instructions. When using the XV
version, a shuffle is needed because the data layout is per 128-bit
lane.
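
The arithmetic behind case 2 can be checked with a small Python model. This is only a sketch of the scalar semantics, not the lowering code; the helper names are hypothetical, and it assumes in-range inputs (an out-of-range `fptoui`/`fptosi` yields poison in LLVM IR):

```python
import math

def fptosi_i64(x):
    """Model of `fptosi double to i64`: round toward zero (in-range inputs only)."""
    v = math.trunc(x)
    assert -(1 << 63) <= v < (1 << 63), "out of range: poison in LLVM IR"
    return v

def trunc_to_i32_bits(v):
    """Model of `trunc i64 to i32`: keep the low 32 bits."""
    return v & 0xFFFFFFFF

def fptoui_i32_via_i64(x):
    """`fptoui double to i32` expressed as `fptosi double to i64` plus truncate."""
    return trunc_to_i32_bits(fptosi_i64(x))
```

Every double in the unsigned 32-bit range fits in a signed 64-bit integer, so the signed conversion followed by truncation gives the same bits as the unsigned conversion.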
DeltaFile
+96-0llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+42-2llvm/test/CodeGen/LoongArch/lsx/ir-instruction/fptoui.ll
+39-2llvm/test/CodeGen/LoongArch/lsx/ir-instruction/fptosi.ll
+20-0llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptoui.ll
+17-3llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptosi.ll
+9-0llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+223-71 files not shown
+227-77 files

LLVM/project 84ebdccoffload/libomptarget device.cpp, offload/plugins-nextgen/common/include RecordReplay.h PluginInterface.h

[offload] Improve report printing for kernel recording
DeltaFile
+35-16offload/plugins-nextgen/common/src/RecordReplay.cpp
+13-2openmp/docs/design/Runtimes.rst
+9-5offload/plugins-nextgen/common/include/RecordReplay.h
+8-2offload/libomptarget/device.cpp
+4-4offload/plugins-nextgen/common/src/PluginInterface.cpp
+3-2offload/plugins-nextgen/common/include/PluginInterface.h
+72-311 files not shown
+74-327 files

LLVM/project b2ffc0dllvm/lib/Transforms/Scalar LoopInterchange.cpp, llvm/test/Transforms/LoopInterchange inner-header-has-duplicate-succs.ll

[LoopInterchange] Reject if inner loop header has duplicate successors (#204128)

Previously, loop interchange crashed in several cases where the inner
loop header had duplicate successors. In practice, the following was
happening:

- During the transformation phase, the inner loop header was not split
because its first non-PHI instruction was its terminator.
- `updateSuccessor` was called on the header with `MustUpdateOnce=true`,
which triggered an assertion failure.

This patch fixes the issue by rejecting such cases during the legality
check phase. I believe this situation is rare, so it should not
significantly affect real-world cases.

Fix #203887.
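
As a rough illustration of the new legality rule (a Python sketch, not the pass's C++ code; both function names are hypothetical), a header whose terminator lists the same block twice, such as `br i1 %c, label %body, label %body`, is rejected up front:

```python
def has_duplicate_successors(successors):
    """Return True if a terminator lists the same successor block more than once."""
    return len(set(successors)) != len(successors)

def inner_header_is_legal(header_successors):
    # Reject during the legality check instead of asserting later in the transform.
    return not has_duplicate_successors(header_successors)
```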
DeltaFile
+184-0llvm/test/Transforms/LoopInterchange/inner-header-has-duplicate-succs.ll
+7-0llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+191-02 files

LLVM/project d58c356llvm/lib/Target/AMDGPU AMDGPUISelDAGToDAG.cpp SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU packed-fp64.ll packed-u64.ll

[AMDGPU] Make v2x64 BUILD_VECTOR legal on gfx1251
DeltaFile
+120-174llvm/test/CodeGen/AMDGPU/packed-fp64.ll
+70-106llvm/test/CodeGen/AMDGPU/packed-u64.ll
+14-36llvm/test/CodeGen/AMDGPU/shl.v2i64.ll
+15-16llvm/test/CodeGen/AMDGPU/pk-lshl-add-u64.ll
+11-6llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+3-2llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+233-3406 files

LLVM/project a24e158llvm/lib/Target/AMDGPU SIFoldOperands.cpp, llvm/test/CodeGen/AMDGPU fold-imm-pk64.mir

[AMDGPU] Prevent folding of immediates larger than 64 bit
DeltaFile
+37-0llvm/test/CodeGen/AMDGPU/fold-imm-pk64.mir
+3-0llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+40-02 files

LLVM/project ced82d8clang/lib/Headers spirvintrin.h nvptxintrin.h

[Clang] Make the pointers to gpuintrin AS query const (#204492)

Summary:
Right now these force a const cast if the user is checking a read-only
pointer, not great.
DeltaFile
+2-2clang/lib/Headers/spirvintrin.h
+2-2clang/lib/Headers/nvptxintrin.h
+2-2clang/lib/Headers/amdgpuintrin.h
+6-63 files

LLVM/project d6d9346llvm/include/llvm/Transforms/IPO InstrumentorRuntimeHelper.h, llvm/lib/Transforms/IPO Instrumentor.cpp

[Instrumentor] Move NumericFlags into InstrumentorRuntimeHelper.h (#204068)

This patch makes the `NumericFlags` enum visible to the end user by
moving it into `InstrumentorRuntimeHelper.h`.
DeltaFile
+14-0llvm/test/Instrumentation/Instrumentor/default_rt.h
+14-0llvm/include/llvm/Transforms/IPO/InstrumentorRuntimeHelper.h
+1-12llvm/lib/Transforms/IPO/Instrumentor.cpp
+29-123 files

LLVM/project 3a6eb67llvm/lib/Target/AArch64 AArch64SystemOperands.td AArch64InstrFormats.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! Convert PSB to use PSBHint for consistency
DeltaFile
+7-23llvm/lib/Target/AArch64/AArch64SystemOperands.td
+18-5llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+5-6llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+4-4llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.cpp
+3-2llvm/lib/Target/AArch64/AArch64InstrFormats.td
+1-1llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+38-411 files not shown
+39-417 files

LLVM/project 0aea056llvm/lib/Target/AArch64 AArch64SystemOperands.td AArch64InstrInfo.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! Address PR comments
DeltaFile
+24-48llvm/lib/Target/AArch64/AArch64SystemOperands.td
+25-23llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+16-22llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+15-8llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.cpp
+5-13llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+9-8llvm/lib/Target/AArch64/AArch64InstrInfo.td
+94-1224 files not shown
+108-13510 files

LLVM/project a40055cllvm/include/llvm/Support CHERICapabilityFormat.h, llvm/lib/Support CHERICapabilityFormat.cpp

[CHERI] Fix incorrect MAX_E for RV64Y capabilities. (#204487)

Add tests for all capability formats at the upper end of their ranges, which would have caught this oversight.
DeltaFile
+13-0llvm/unittests/Support/CHERICapabilityFormatTest.cpp
+2-2llvm/lib/Support/CHERICapabilityFormat.cpp
+1-1llvm/include/llvm/Support/CHERICapabilityFormat.h
+16-33 files

LLVM/project 5dc8ac2llvm/lib/Target/DirectX DXILRemoveUnusedResources.cpp DXILRemoveUnusedResources.h, llvm/test/CodeGen/DirectX unused-resources-impl-binding.ll resources-in-unused-function.ll

[DirectX] Add DXILRemoveUnusedResources pass (#200965)

Adds `DXILRemoveUnusedResources` pass that scans the module and removes
any resource that is not used. It means that it removes calls to
`dx_resource_handlefrom{implicit}binding` whose return value is either
not used at all, or it is saved to a global variable that does not have
external linkage and is not used anywhere else in the module.

This pass needs to run before implicit resource binding assignment pass.
The test `unused-resources-impl-binding.ll` makes sure the implicit
binding assignments are not affected by the unused resources.

Since we have many tests that are initializing resources without
actually using them, an internal option
`-disable-dxil-remove-unused-resource` has been added to `llc` so we can
keep these tests simple without adding extra code to artificially use
each resource.

Depends on #200312

Fixes #192524
DeltaFile
+140-0llvm/lib/Target/DirectX/DXILRemoveUnusedResources.cpp
+109-0llvm/test/CodeGen/DirectX/unused-resources-impl-binding.ll
+81-0llvm/test/CodeGen/DirectX/resources-in-unused-function.ll
+68-0llvm/test/CodeGen/DirectX/unused-resources.ll
+29-0llvm/lib/Target/DirectX/DXILRemoveUnusedResources.h
+6-0llvm/lib/Target/DirectX/DirectX.h
+433-018 files not shown
+453-1424 files

LLVM/project e4a0e1dlibc/benchmarks LibcRsqrtf16GoogleBenchmarkMain.cpp CMakeLists.txt, libc/src/__support/math rsqrtf16.h CMakeLists.txt

[libc][math][c23] Improve rsqrtf16 function for targets without fp32 FPUs. (#160639)

Closes #159378 

#### Changes
- This PR adds math approximation for targets that don't have hardware
for floats - in other words, targets that don't have
`LIBC_TARGET_CPU_HAS_FPU_FLOAT`
- This PR also introduces Google Benchmark for rsqrtf16
- Fixed typo in `+inf` case. Should return +0 according to
[F.10.4.9](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf)
DeltaFile
+241-5libc/src/__support/math/rsqrtf16.h
+105-0libc/benchmarks/LibcRsqrtf16GoogleBenchmarkMain.cpp
+21-0libc/benchmarks/CMakeLists.txt
+6-0libc/src/__support/math/CMakeLists.txt
+1-1utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+374-65 files

LLVM/project 14d163elldb/source/Plugins/ObjectFile/wasm ObjectFileWasm.cpp, lldb/source/Plugins/Platform/WebAssembly PlatformWasm.cpp

[lldb] Report a generic wasm32 architecture for Wasm object files (#204496)

ObjectFileWasm hardcoded the architecture of every Wasm module as
"wasm32-unknown-unknown-wasm". A Wasm binary does not actually encode a
vendor or OS; those are properties of the runtime executing it.

When debugging via a runtime whose gdb stub reports a more specific
triple (e.g. WAMR reports "wasm32-wamr-wasi-wasm"), lldb adopts that
triple and clears the module list. The dynamic loader then tries to
reload the main executable, but GetOrCreateModule rejects the on-disk
file because the triples are incompatible. This causes lldb to fall back to
reading from memory.

Fix all this by reporting a bare "wasm32"/"wasm64" architecture instead.
DeltaFile
+3-3lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
+1-2lldb/source/Plugins/Platform/WebAssembly/PlatformWasm.cpp
+1-1lldb/test/Shell/ObjectFile/wasm/embedded-debug-sections.yaml
+1-1lldb/test/Shell/ObjectFile/wasm/unified-debug-sections.yaml
+1-1lldb/test/Shell/ObjectFile/wasm/basic.yaml
+1-1lldb/test/Shell/ObjectFile/wasm/stripped-debug-sections.yaml
+8-96 files

LLVM/project bcca9afllvm/lib/CodeGen AtomicExpandPass.cpp, llvm/test/CodeGen/ARM atomic-load-store.ll

[AtomicExpand] Add bitcasts when expanding store atomic vector

AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
DeltaFile
+99-6llvm/test/CodeGen/X86/atomic-load-store.ll
+98-0llvm/test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
+49-0llvm/test/CodeGen/ARM/atomic-load-store.ll
+4-2llvm/lib/CodeGen/AtomicExpandPass.cpp
+250-84 files

LLVM/project 8f2ef03llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp, llvm/test/CodeGen/X86 atomic-load-store.ll

[SelectionDAG] Keep split vector atomic store value in a vector register

When the value of an ATOMIC_STORE has a vector type whose legalization
action is split (e.g. <4 x half>/<4 x bfloat> on X86 without F16C),
SplitVecOp_ATOMIC_STORE bitcast the value straight to a scalar integer
spanning the memory width. For a split vector that bitcast is expanded
element by element, reassembling the value in GPRs (a long pextrw/shl/or
sequence) before the store.

Instead, keep the value in a vector register when a legal vector form
exists: reinterpret it as a same-shaped integer-element vector (an FP
element type may have no legal vector form, e.g. bfloat on SSE2, while
the integer-of-element-size form does), widen that to a legal vector,
and extract the low integer element of the memory width. This issues the
store directly from a vector register (a single MOVQ/MOVD on X86),
matching the widen-path codegen already produced on AVX targets. Falls
back to the scalar bitcast when no suitable legal vector type exists.
DeltaFile
+203-329llvm/test/CodeGen/X86/atomic-load-store.ll
+33-6llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+236-3352 files

LLVM/project 8b0361bmlir/lib/Dialect/XeGPU/IR XeGPUDialect.cpp, mlir/test/Dialect/XeGPU sg-to-lane-distribute-unit.mlir

[MLIR][XeGPU] Treat lane_data repacks as compatible layouts (#204016)

A subgroup-level convert_layout that only repacks lane_data while keeping
lane_layout unchanged (e.g. [N, 1] to [1, 1] with order = [1, 0]) is a no-op
after lane distribution: each lane owns the same elements in the same order.
Previously isCompatibleWith compared per-distribution-unit block starts, which
encode the lane_data blocking, so such layouts looked incompatible.

Handle this at the Lane level in isCompatibleWith by expanding the block
starts into per-element coordinates before comparing. The expansion only runs
when lane_data differ; otherwise the cheaper block-start comparison is exact.
The shared logic lives in a compareDistributedCoords helper used by both
LayoutAttr and SliceAttr. The Subgroup level is left for a follow-up (TODO).
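
A one-dimensional toy model of the comparison, in Python. It is only a sketch of the idea under simplifying assumptions (a single lane, scalar coordinates, hypothetical function names); the real helper works on MLIR layout attributes and multi-dimensional coordinates:

```python
def expand_block_starts(block_starts, lane_data):
    """Expand each block start into the coordinates of every element in that block."""
    return [start + i for start in block_starts for i in range(lane_data)]

def compatible(starts_a, lane_data_a, starts_b, lane_data_b):
    if lane_data_a == lane_data_b:
        # Same blocking: comparing block starts is exact and cheaper.
        return starts_a == starts_b
    # Different blocking: compare the elements each lane actually owns, in order.
    return (expand_block_starts(starts_a, lane_data_a)
            == expand_block_starts(starts_b, lane_data_b))
```

For example, block starts `[0, 8]` with a block size of 4 and block starts `[0, 1, 2, 3, 8, 9, 10, 11]` with a block size of 1 differ as lists but expand to the same element coordinates, so the two layouts are treated as compatible.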

Add a lit test covering the fold in sg-to-lane-distribute-unit.mlir.
DeltaFile
+59-14mlir/lib/Dialect/XeGPU/IR/XeGPUDialect.cpp
+55-0mlir/test/Dialect/XeGPU/sg-to-lane-distribute-unit.mlir
+114-142 files

LLVM/project dd069b6lldb/source/Symbol Function.cpp, lldb/test/Shell/SymbolFile/DWARF/x86 prologue-entry-not-covered.s

[lldb] Skip the prologue when a function's entry has no line row (#204480)

Function::GetPrologueByteSize computed the prologue only when a line
table row contained the function's entry address (low_pc). When no row
covers low_pc it returned 0, leaving a name breakpoint sitting on the
function's entry address. For WebAssembly the entry address is the
function's locals-declaration byte rather than an instruction, so the
line table has no row there and the breakpoint is never hit.

When low_pc has no covering row, fall back to the first line row that
begins within the function's range and run the existing prologue logic
on it. For functions whose entry is already covered (all normally
compiled native code) this branch is not taken, so behavior remains
unchanged.
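
The row selection can be sketched in Python as follows. This models line-table rows as half-open `(start, end)` address ranges, which is a simplification, and `pick_prologue_row` is a hypothetical name rather than an lldb API:

```python
def pick_prologue_row(rows, low_pc, high_pc):
    """rows: list of (start, end) half-open address ranges, sorted by start.

    Returns the index of the row to run the prologue logic on, or None.
    """
    for idx, (start, end) in enumerate(rows):
        if start <= low_pc < end:
            return idx  # entry address is covered: the pre-existing path
    for idx, (start, _end) in enumerate(rows):
        if low_pc <= start < high_pc:
            return idx  # fallback: first row that begins inside the function
    return None
```

With a function spanning `[0x100, 0x120)` and the first row starting at `0x103`, no row covers `0x100`, so the fallback picks the row at `0x103`.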

This PR adds a hand (Claude) crafted regression test with a function
whose entry address is not covered by a line row.
DeltaFile
+101-0lldb/test/Shell/SymbolFile/DWARF/x86/prologue-entry-not-covered.s
+33-2lldb/source/Symbol/Function.cpp
+134-22 files

LLVM/project 0c6222dllvm/lib/Target/AArch64 AArch64SystemOperands.td AArch64InstrFormats.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! Address PR comments
DeltaFile
+22-50llvm/lib/Target/AArch64/AArch64SystemOperands.td
+21-23llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+14-20llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+15-8llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.cpp
+5-13llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+6-5llvm/lib/Target/AArch64/AArch64InstrFormats.td
+83-1192 files not shown
+89-1228 files

LLVM/project aedf92cllvm/lib/Target/AMDGPU AMDGPU.td GCNHazardRecognizer.cpp, llvm/test/CodeGen/AMDGPU trans-coexecution-hazard.mir

[AMDGPU] Introduce TransCoexecutionHazard target feature (#204412)

TransCoexecutionHazard implies there is a data hazard between a TRANS
instruction and the following VALU instruction when they are co-executed.
Currently gfx1250 and gfx1251 have this target feature.
DeltaFile
+44-19llvm/test/CodeGen/AMDGPU/trans-coexecution-hazard.mir
+12-3llvm/lib/Target/AMDGPU/AMDGPU.td
+1-1llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+57-233 files

LLVM/project 0a92811utils/bazel/llvm-project-overlay/clang BUILD.bazel

[Bazel] Fixes 53dabae (#204494)

This fixes 53dabae40fb3a85148f1bb72e885e32081482dbe.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
DeltaFile
+20-0utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+20-01 files

LLVM/project cc60ab0llvm/test/MC/AArch64 armv9.7a-memsys.s basic-a64-instructions.s

fixup! Add testcases for all missing HINTs
DeltaFile
+13-1llvm/test/MC/AArch64/armv9.7a-memsys.s
+12-0llvm/test/MC/AArch64/basic-a64-instructions.s
+9-1llvm/test/MC/AArch64/armv9.6a-pcdphint.s
+8-0llvm/test/MC/AArch64/armv8.4a-trace.s
+6-0llvm/test/MC/AArch64/armv9.5a-pauthlr.s
+3-0llvm/test/MC/AArch64/armv8.2a-statistical-profiling.s
+51-22 files not shown
+56-28 files

LLVM/project 484fc36llvm/lib/Target/AArch64 AArch64InstrFormats.td AArch64InstrInfo.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Some instructions should be `HINT` aliases (NFC)

Implement the following instructions as a `HINT` alias instead of a
dedicated instruction in separate classes:
  * `stshh`
  * `stcph`
  * `shuh`
  * `tsb`

Updated all their helper methods too, and updated the `stshh` pseudo
expansion for the intrinsic to emit `HINT #0x30 | policy`.

Code in AArch64AsmPrinter::emitInstruction identified an initial BTI using a
broad bitmask on the HINT immediate, which also matched shuh/stcph (50..52).
This could move the patchable entry label after a non-BTI instruction.
Replaced it with an exact BTI check using the BTI HINT range (32..63) and
AArch64BTIHint::lookupBTIByEncoding(Imm ^ 32).
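
The difference between the two checks can be illustrated with a Python sketch. Both the shape of the old mask and the contents of the lookup table are assumptions made for illustration (the table lists only the `bti c`, `bti j` and `bti jc` encodings, HINT #34, #36 and #38), and the function names are hypothetical:

```python
# Assumed lookup table, keyed by (imm ^ 32) as in the description above.
BTI_BY_ENCODING = {2: "c", 4: "j", 6: "jc"}

def is_bti_broad_mask(imm):
    """Assumed shape of the old check: bit 5 set plus either target bit."""
    return bool(imm & 32) and bool(imm & 6)

def is_bti_exact(imm):
    """Exact check: immediate in the BTI HINT range and a known BTI encoding."""
    return 32 <= imm <= 63 and (imm ^ 32) in BTI_BY_ENCODING
```

Under these assumptions the broad mask accepts the immediates 50 to 52, while the exact check accepts only 34, 36 and 38.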

A following change will remove duplicated code and simplify.

    [2 lines not shown]
DeltaFile
+86-0llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+41-40llvm/lib/Target/AArch64/AArch64InstrFormats.td
+22-3llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+5-14llvm/lib/Target/AArch64/AArch64InstrInfo.td
+5-10llvm/lib/Target/AArch64/AArch64SystemOperands.td
+4-2llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+163-692 files not shown
+170-708 files

LLVM/project 5796aa9clang-tools-extra/docs ReleaseNotes.rst, clang-tools-extra/test/clang-tidy/checkers/bugprone branch-clone-inline-asm.cpp

[clang][AST] Fix StmtProfile handling of GCCAsmStmt asm strings and clobbers (#201481)

`VisitGCCAsmStmt` did not profile asm strings and clobbers because they
are not child statements.
As a result, different inline asm statements could produce the same
profile.
This fixes a false positive in `bugprone-branch-clone` where branches
containing inline asm were incorrectly reported as identical.
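
The collision can be reproduced with a toy profile function in Python. This is a sketch of the idea only, not Clang's `StmtProfile` code; the `AsmStmt` class, the `profile` function and its `include_asm_text` switch are hypothetical:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class AsmStmt:
    asm_string: str
    clobbers: tuple = ()
    operands: tuple = ()  # stand-in for child expressions

def profile(stmt, include_asm_text):
    """Hash a statement; optionally fold in data that is not a child statement."""
    h = hashlib.sha256()
    h.update(b"GCCAsmStmt\0")
    for op in stmt.operands:
        h.update(repr(op).encode() + b"\0")
    if include_asm_text:
        h.update(stmt.asm_string.encode() + b"\0")
        for clobber in stmt.clobbers:
            h.update(clobber.encode() + b"\0")
    return h.hexdigest()
```

Two statements such as `AsmStmt("nop")` and `AsmStmt("hlt", ("memory",))` get the same profile when only children are hashed and distinct profiles once the asm string and clobbers are included.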

I used AI assistance when writing the test code, but I personally
reviewed it. 🤖

Fixes https://github.com/llvm/llvm-project/issues/198616
DeltaFile
+67-0clang/test/Modules/asm-stmt-odr.cppm
+43-0clang-tools-extra/test/clang-tidy/checkers/bugprone/branch-clone-inline-asm.cpp
+5-0clang-tools-extra/docs/ReleaseNotes.rst
+2-2clang/lib/AST/StmtProfile.cpp
+1-0clang/docs/ReleaseNotes.rst
+118-25 files