[AMDGPU][true16] extract 16bit for scratch_load_ubyte_st when spilling (#203589)
In sramecc mode, scratch_load_ubyte_st is selected for 16-bit spilling.
This needs a temporary vgpr32, from which the lo16 is extracted.
audio/libresidfp: New port: Software emulation of MOS6581/8580 SID chip
Fork of Dag Lem's reSID 0.16 which is a reverse engineered software
emulation meant to replicate the SID as faithfully as possible while
keeping good performance for realtime use.
Merge branch 'main' of github.com:llvm/llvm-project into users/ziqingluo/PR-179173940
Conflicts:
clang/unittests/ScalableStaticAnalysisFramework/Analyses/PointerFlow/PointerFlowTest.cpp
[AArch64][PAuth] Fix return-address auth for swifttailcc with FPDiff > 0 (#203340)
When a swifttailcc tail call has FPDiff > 0 (the caller received more
stack argument space than the callee pops), the epilogue contains an SP
adjustment to discard the leftover argument space. The existing code
treated both FPDiff < 0 and FPDiff > 0 uniformly in a single 'FPDiff !=
0' block, using AUTI[AB]1716 with a reconstructed entry-SP in x16 for
both cases.
For FPDiff < 0 (callee pops more) that reconstruction is necessary and
correct. For FPDiff > 0 it is wrong: by the time we enter the block the
post-index LDP has already adjusted SP back to the frame base, but the
'add sp, sp, #N' argument pop has not yet run. Entry SP equals the
current SP at that point, so AUTI[AB]SP would work directly, but instead
the combined block bumped SP via StackOffset::getFixed(-FPDiff), which
overshot, and then emitted AUTIA1716 with a wrong discriminator. Worse
yet, the SP restore had already been emitted *before* the auth, leaving
the live argument stack below SP and outside the red-zone during the
authentication window.
[9 lines not shown]
[SSAF][PointerFlow] Recognize reference-to-pointer/array Decls
Decls of reference-to-pointer/array types are now treated the same as
those of pointer/array type.
rdar://179173940
Merge tag 'pci-v7.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Add Frank Li as PCI endpoint reviewer (Frank Li)
* tag 'pci-v7.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Frank Li as PCI endpoint reviewer
[BOLT] Fix perf data return identification (#203628)
If perf data doesn't have the branch type recorded, the missing value would
incorrectly be interpreted as not-a-return. Only populate the Returns map if
the branch type is available.
Fixes bug introduced in #202813.
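The idea of the fix can be illustrated with a minimal, self-contained
sketch. It is not BOLT's actual code: BranchSample, BranchType and
collectReturns are hypothetical names, and std::optional stands in for
"branch type may be missing from the perf data".

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins; these are not BOLT's real types.
enum class BranchType { Call, Return, Other };

struct BranchSample {
  std::uint64_t From;             // source address of the branch
  std::optional<BranchType> Type; // empty when perf recorded no branch type
};

// Records "is a return" only for samples whose branch type is known.
std::map<std::uint64_t, bool>
collectReturns(const std::vector<BranchSample> &Samples) {
  std::map<std::uint64_t, bool> Returns;
  for (const BranchSample &S : Samples) {
    if (!S.Type)
      continue; // unknown type: add no entry instead of claiming not-a-return
    Returns[S.From] = (*S.Type == BranchType::Return);
  }
  return Returns;
}
```

With this shape, a consumer can tell "known not to be a return" (an entry
mapped to false) apart from "no information" (no entry at all).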
[LoopInterchange] Mark getAddRecCoefficient with static (#203624)
As this function is a file-scope non-member function, it's better to
mark it with static.
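As a small illustration of the rationale (a sketch with hypothetical
names, not the LoopInterchange code): marking a file-scope helper static
gives it internal linkage, so it cannot collide with a same-named function
in another translation unit, and the compiler can warn if it becomes unused.

```cpp
// Hypothetical helper, visible only within this translation unit because
// 'static' gives it internal linkage.
static int scaleByTwo(int X) { return 2 * X; }

// Externally visible function that uses the file-local helper.
int publicEntry(int X) { return scaleByTwo(X) + 1; }
```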
[LoopInterchange] Fix crash when followLCSSA returns constant (#203515)
Similar to the case in #201069, `followLCSSA` may return a constant
value, but it was cast to Instruction unconditionally. We need to
explicitly check whether the returned value is an Instruction or not.
Fix #203375.
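The pattern behind this fix can be sketched without LLVM headers. In LLVM,
cast<Instruction>(V) asserts that V is an Instruction, while
dyn_cast<Instruction>(V) returns null when it is not. The sketch below uses
standard dynamic_cast as a stand-in for dyn_cast; the Value, Constant and
Instruction types and opcodeOrMinusOne are simplified, hypothetical mocks,
not the LLVM classes or the actual patch.

```cpp
// Simplified mock hierarchy; these are not the LLVM classes.
struct Value {
  virtual ~Value() = default;
};
struct Constant : Value {};
struct Instruction : Value {
  int Opcode = 0;
};

// Returns the opcode if V is an Instruction, or -1 otherwise.
// The explicit check replaces an unconditional cast, which would be
// invalid when V is actually a Constant.
int opcodeOrMinusOne(const Value *V) {
  if (const auto *I = dynamic_cast<const Instruction *>(V))
    return I->Opcode;
  return -1;
}
```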
[SLP] Vectorize full insertvalue buildvector sequences
Treat a complete chain of insertvalue instructions building a homogeneous
literal struct from scalars as a buildvector, like insertelement sequences.
The scalars are vectorized into one vector; the aggregate is rebuilt from it
via a stack store + load, or stored directly when its only user is a store.
insertvalue is routed through the existing insertelement buildvector paths
(type/index helpers, reordering, tree build, cost model, min-bitwidth, and
codegen). Only single-index, non-vector inserts building from an undef
aggregate are handled.
Fixes #43353
Reviewers: hiraditya, bababuck
Pull Request: https://github.com/llvm/llvm-project/pull/200274
[MLIR][XeVM] Add xevm.extf op as the inverse of xevm.truncf (#203124)
Add a new xevm.extf operation that extends f8/bf8/f4 values to f16/bf16,
mirroring the existing xevm.truncf op, together with its lowering in
XeVMToLLVM.
Lowering details (XeVMToLLVM):
- bf8/f8 -> f16 via __builtin_IB_bf8tohf_16 / __builtin_IB_hf8tohf_16.
- bf8/f8 -> bf16 via f16 -> f32 (convert_float16) -> bf16
(__builtin_IB_ftobf_16).
- e2m1 (fp4) -> f16/bf16 via __builtin_IB_shfl_idx4_lut and
__builtin_IB_shfl_idx4_to_fp16_8_packed (LUT 7 for f16, 5 for bf16).
Adds the op definition and verifier, conversion/roundtrip/invalid unit
tests, and f8 and fp4 GPU round-trip integration tests.
Adds arith.extf to xevm.extf lowering and arith.truncf to xevm.truncf
lowering in XeGPU to XeVM conversion and unit tests.
NAS-141383 / 27.0.0-BETA.1 / Fix a few API tests (#19133)
* Reporting realtime shows stats on the boot pool, so we should expect
them.
* pam / auth stack now properly reports in *audit* messages why the
authentication failed (minimally including PAM error code).
* harden our webshell tests
(cherry picked from commit 7546c612061dc7ccbca9751a1d5103f943de033d)
MAINTAINERS: Add Frank Li as PCI endpoint reviewer
I have volunteered to review PCI endpoint-related changes. Add myself as a
reviewer to be notified when related patches are posted.
Signed-off-by: Frank Li <Frank.Li at nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
Reviewed-by: Krzysztof Wilczyński <kwilczynski at kernel.org>
Link: https://patch.msgid.link/20260611210007.529205-1-Frank.Li@oss.nxp.com
[AMDGPU] NFC: Drop constexpr from getFlavor*Name functions (#203603)
It seems specifying these as constexpr was causing some buildbot
failures due to llvm_unreachable --
```
[1/123] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCoExecSchedStrategy.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCoExecSchedStrategy.cpp.o
/usr/bin/c++ -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_USE_CXX11_ABI=1 -D_GNU_SOURCE -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_EXTENSIVE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/path/to/build.AArch64.Release.main/lib/Target/AMDGPU -I/path/to/llvm-project/llvm/lib/Target/AMDGPU -I/path/to/build.AArch64.Release.main/include -I/path/to/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-array-bounds -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCoExecSchedStrategy.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCoExecSchedStrategy.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUCoExecSchedStrategy.cpp.o -c /path/to/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
In file included from /path/to/llvm-project/llvm/include/llvm/ADT/Hashing.h:49,
from /path/to/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:12,
from /path/to/llvm-project/llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h:17,
from /path/to/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.h:17,
from /path/to/llvm-project/llvm/lib/Target/AMDGPU/GCNSubtarget.h:17,
from /path/to/llvm-project/llvm/lib/Target/AMDGPU/GCNRegPressure.h:20,
from /path/to/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h:16,
from /path/to/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h:17,
from /path/to/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp:14:
/path/to/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h: In function 'constexpr llvm::StringRef llvm::AMDGPU::getFlavorName(llvm::AMDGPU::InstructionFlavor)':
[56 lines not shown]