clang/AMDGPU: Stop adding -m32/-m64 for OpenCL (#199005)
The pointer size is not configurable; you get what you get
based on the triple. I don't know what the point of this was,
I don't even see the argument in the final cc1 invocation.
[CostModel] Move the arbitrary load latency into getMemoryOpCost (#198790)
Currently TargetTransformInfoImpl returns an arbitrary cost of 4 for the
latency of loads in getInstructionCost. This means even if a target
correctly reports the latency for loads in getMemoryOpCost we still get
this arbitrary cost in getInstructionCost. It also means the latency
cost is inconsistent depending on whether you go through
getInstructionCost or getMemoryOpCost.
Solve this by moving the current arbitrary cost into getMemoryOpCost.
This has the side-effect of affecting the cost of masked loads if they
aren't handled by the target, as in BasicTTIImpl the cost for these is
calculated using getMemoryOpCost. The resulting cost should be more
accurate, and the change is unlikely to matter in practice: in any
transformation that could introduce masked loads (e.g. vectorization),
the current cost is probably already high enough that they are not
worth using.
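The inconsistency described above can be illustrated with a toy model. This is only a sketch in Python, not LLVM's API: the class names `BaseTTI` and `TargetTTI`, the string opcodes, and the target latency of 2 are all hypothetical. It shows the post-change arrangement, where the placeholder latency lives in the memory-op hook and the generic instruction-cost query forwards to it.

```python
ARBITRARY_LOAD_LATENCY = 4  # the placeholder value mentioned in the commit message


class BaseTTI:
    """Toy stand-in for a default cost model (hypothetical, not LLVM's API)."""

    def get_memory_op_cost(self, opcode, kind):
        # After the change, the placeholder load latency is reported here.
        if opcode == "load" and kind == "latency":
            return ARBITRARY_LOAD_LATENCY
        return 1

    def get_instruction_cost(self, opcode, kind):
        # Memory operations are forwarded, so both entry points agree.
        if opcode in ("load", "store"):
            return self.get_memory_op_cost(opcode, kind)
        return 1


class TargetTTI(BaseTTI):
    """A hypothetical target that reports its own load latency."""

    def get_memory_op_cost(self, opcode, kind):
        if opcode == "load" and kind == "latency":
            return 2  # assumed target-specific value, for illustration only
        return super().get_memory_op_cost(opcode, kind)


default_cost = BaseTTI().get_instruction_cost("load", "latency")
target_cost = TargetTTI().get_instruction_cost("load", "latency")
```

In this sketch a target override is honoured by both queries, while a target without an override still sees the placeholder value through either entry point.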
[AArch64] Delete llvm/test/CodeGen/AArch64/GlobalISel/postselectopt-dead-cc-defs-in-fcmp.mir (NFC) (#198974)
It's bit-identical to
llvm/test/CodeGen/AArch64/GlobalISel/postselectopt-dead-cc-defs.mir. The
-in-fcmp test is older (0f0fd383b487), but 84a6a057e60b later expanded
op coverage and left both files with the exact same contents.
[AArch64] ConditionOptimizer: replace per-block DenseMap with ScopedHashTable traversal (#196746)
The intra-block path used a DenseMap cleared at each block boundary, so
pairs from dominating blocks were never visible to descendants. Replace
it and the separate cross-block path with a unified recursive domtree
walk using a ScopedHashTable. Any dominating block's pair is now a
candidate, not just pairs within the same block.
Rename optimizeIntraBlock to optimizeBlock and remove dead code
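The scoped traversal can be sketched as follows. This is a minimal Python illustration of the general technique, not the pass itself: `ScopedTable`, `walk`, the block names, and the string keys standing in for "pairs" are all hypothetical.

```python
class ScopedTable:
    """Minimal scoped map: entries inserted in a scope vanish when it is popped."""

    def __init__(self):
        self._scopes = [{}]

    def push(self):
        self._scopes.append({})

    def pop(self):
        self._scopes.pop()

    def insert(self, key, value):
        self._scopes[-1][key] = value

    def lookup(self, key):
        # Search innermost scope first, then the enclosing (dominating) ones.
        for scope in reversed(self._scopes):
            if key in scope:
                return scope[key]
        return None


def walk(block, children, pairs, table, found):
    """Recursive dominator-tree walk; records (key, dominating block, block)."""
    table.push()
    for key in pairs.get(block, []):
        earlier = table.lookup(key)
        if earlier is not None:
            found.append((key, earlier, block))
        else:
            table.insert(key, block)
    for child in children.get(block, []):
        walk(child, children, pairs, table, found)
    table.pop()  # siblings must not see this block's entries


# Hypothetical dominator tree: entry dominates left and right; left dominates leaf.
children = {"entry": ["left", "right"], "left": ["leaf"]}
pairs = {"entry": ["a"], "left": ["b"], "leaf": ["a", "b"], "right": ["b", "a"]}
found = []
walk("entry", children, pairs, ScopedTable(), found)
```

Here `leaf` matches both the entry from `entry` and the one from `left`, while `right` only matches the entry from `entry`, because the scope opened for `left` has already been popped.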
Set ACL restricting container root fs
libvirt has a procedure whereby it changes to the root UID of the
container (non-zero) before executing pivot_root to its new filesystem.
This commit ensures that UID 0 of our container idmap ranges has
execute on the root directory (but no one else does). This allows
pivot_root to succeed, but prevents access by non-privileged users.
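As a rough sketch of the idea, the ACL entry could be derived from the idmap like this. The tuple layout `(container_start, host_start, length)`, the function names, and the example range are assumptions made for illustration; the commit does not show how the entry is actually built or applied.

```python
def host_uid_for_container_root(idmap):
    """Return the host UID that container UID 0 maps to.

    idmap is assumed to be a list of (container_start, host_start, length)
    UID ranges; this layout is hypothetical.
    """
    for container_start, host_start, length in idmap:
        if container_start <= 0 < container_start + length:
            return host_start + (0 - container_start)
    raise ValueError("container UID 0 is not mapped")


def root_dir_acl_entry(idmap):
    """Build an execute-only ACL entry for the container's root UID."""
    return f"user:{host_uid_for_container_root(idmap)}:--x"


# Example range chosen for illustration only.
entry = root_dir_acl_entry([(0, 1000000, 65536)])
```

An entry in this form could, for example, be passed to `setfacl -m` on the container's root directory, with the directory mode itself denying access to everyone else.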
update to got 0.126
- remove dependency on xxd from the test suite (dep was added in got-0.125)
- really add all relevant parent commits to newly created pack files
- fix serving a branch of entirely unrelated history with gotd
- fix test failures when tests are run with Git 2.54
- initial sha256 support in the network protocol for got clone/fetch/send
risc-v: fix ref/mod emulation PTE handling.
The previous code had zero chance of working; now that pmap_test_mod_ref
exists, the test can prove this code is correct.
[AMDGPU][Clang] use ScopeModel to translate integer scopes [NFC] (#198250)
The assumption here is that AMDGPU builtins (typically prefixed with
`__builtin_amdgcn`) use the `__MEMORY_SCOPE_*` enumeration rather than
the `__HIP_MEMORY_SCOPE_*` enumeration, which is how it should be.
Assisted-By: Claude Opus 4.6
[LoopInterchange] Add test for multiple accesses to same base ptr (NFC) (#193476)
Currently `getInstrOrderCost` doesn't check the base pointers of the
accesses, which can lead to undesirable profitability decisions. This
patch adds a test that demonstrates such a case.
[GlobalISel][KnownBits] Port SREM to GlobalISel (#198956)
This PR also moves the case statement for `G_UREM`, which was
introduced by https://github.com/llvm/llvm-project/pull/193455, so that
the `G_[U|S][DIV|REM]` cases are grouped together, just like in
`SelectionDAG.cpp`.
Related: https://github.com/llvm/llvm-project/issues/150515
---------
Signed-off-by: ZakyHermawan <zaky.hermawan9615 at gmail.com>
[lldb] Add missing includes. (#198996)
The CI build failed because
ScriptedPythonInterface::CreatePluginObject is a template function and
its arguments have incomplete types that come from `lldb-forward.h`
(typedef of lldb_private::XXXX = XXXXSP).
Introduced in commit 1b4a578a9f7760a00bf26525a603be1ec6e7d862
[AMDGPU] Coverity fixes - check ret val and init class members (#198570)
Coverity fixes:
* calling getIntrinsicSignature without checking its return value (it
is checked in 4 of the 5 other call sites) in
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
* non-static class members MaxSGPRs, MaxVGPRs and MaxUnifiedVGPRs are
not initialized in the constructor or in any function it calls, in
llvm/lib/Target/AMDGPU/GCNRegPressure.h