[NFC][GlobalISel] Refactor ownership of InstructionMatchers (#200798)
- Clarify that the array of InstructionMatchers in the RuleMatcher is
for the roots only.
- Let RuleMatcher own all of the InstructionMatchers used for/by
predicates.
They are all kept in an array in which the index of the
InstructionMatcher is equal to its
InsnID, which eliminates some redundant tracking.
- Remove duplicate tracking of InsnID from RuleMatcher;
InstructionMatcher does it on its own already.
Co-authored-by: Pierre-vh <29600849+Pierre-vh at users.noreply.github.com>
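The "index equals InsnID" ownership scheme described above can be sketched roughly as follows. This is a toy model, not the actual LLVM classes: `ToyInstructionMatcher`, `ToyRuleMatcher`, and their member names are hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for InstructionMatcher: it records its own ID.
class ToyInstructionMatcher {
public:
  ToyInstructionMatcher(std::size_t InsnID, std::string Name)
      : InsnID(InsnID), Name(std::move(Name)) {}
  std::size_t getInsnID() const { return InsnID; }
  const std::string &getName() const { return Name; }

private:
  std::size_t InsnID;
  std::string Name;
};

// Hypothetical stand-in for RuleMatcher: owns every matcher, indexed by ID.
class ToyRuleMatcher {
public:
  ToyInstructionMatcher &addInstructionMatcher(std::string Name) {
    // The next free index becomes the ID, so no separate counter is needed.
    std::size_t ID = Matchers.size();
    Matchers.push_back(
        std::make_unique<ToyInstructionMatcher>(ID, std::move(Name)));
    return *Matchers.back();
  }

  ToyInstructionMatcher &getInstructionMatcher(std::size_t InsnID) {
    assert(InsnID < Matchers.size() && "unknown InsnID");
    return *Matchers[InsnID];
  }

private:
  // unique_ptr keeps references stable when the vector grows.
  std::vector<std::unique_ptr<ToyInstructionMatcher>> Matchers;
};
```

Because the ID is derived from the vector size at insertion time, the owner and the matcher cannot disagree about it, which is the redundancy the commit says it removes.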
[GlobalISel] Do not depend on the RuleMatcher at MatchTable emission (#200799)
Some PredicateMatchers/MatchAction/OperandRenderers relied on accessing
RuleMatcher at emission as a crutch.
Instead, make these classes collect all necessary information in the
constructor so the `emit` methods don't depend on RuleMatcher anymore.
The primary motivation for this is that I've been looking at ways to optimize the MatchTable better,
and the fact that Predicates/Actions/Renderers are not "pure" objects, in the sense that they keep
accessing a bunch of data all over the place even as late as emission, was a consistent pain.
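A minimal sketch of the "collect in the constructor, stay pure at emission" idea, under stated assumptions: `ToyRule`, `ToySameOperandPredicate`, and the emitted strings are hypothetical and do not reflect the real MatchTable opcodes or class interfaces.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical rule-level context, standing in for RuleMatcher.
struct ToyRule {
  std::unordered_map<std::string, unsigned> OperandToInsnID;
};

// Hypothetical predicate that resolves everything it needs up front.
class ToySameOperandPredicate {
public:
  ToySameOperandPredicate(const ToyRule &Rule, const std::string &OperandName)
      : OtherInsnID(Rule.OperandToInsnID.at(OperandName)) {}

  // Emission takes only the output table; it never looks at the rule.
  void emit(std::vector<std::string> &Table) const {
    Table.push_back("CheckSameOperand"); // made-up opcode name
    Table.push_back("OtherMI=" + std::to_string(OtherInsnID));
  }

private:
  unsigned OtherInsnID;
};
```

With this shape, the rule can be mutated or discarded after construction without changing what `emit` produces, which is what makes later table optimizations easier to reason about.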
This is NFCI. There are no changes to any of the match tables for AMDGPU/AArch64 in this patch.
This patch has a bunch of noise due to function signature changes so I'll highlight the following interesting changes:
- `SameOperandMatcher` needed a bit of an update in its `canHoistOutsideOf` function. I had to rewrite it
but I think the end result is the same.
- `EraseInstAction` has been updated as well, and its users in both Combiner/ISel backends have been updated too.
Instead of ignoring this action if the Inst was already erased, it's now the responsibility of the
builder to never insert it in the first place. `BuildMIAction` had a small update because of that too.
[4 lines not shown]
[AMDGPU][SIMemoryLegalizer] Consider scratch operations as NV=1 if GAS is disabled
- Clarify that `thread-private` MMO flag is still useful.
- If GAS is not enabled (which is the default as of last patch), consider an op as `NV=1` if it's a `scratch_` opcode, or if the MMO is in the private AS.
- Add tests for the new cases.
- Update AMDGPUUsage GFX12.5 memory model
[AMDGPU] Make globally-addressable-scratch opt-in
This feature is meant to be opt-in for more advanced users, not default-enabled.
It may reduce performance otherwise as we can't assume private AS is thread-local
when it is enabled.
- Add `HasGloballyAddressableScratchSupport` feature to check if a target's scratch
addressing is changed due to support for globally addressable scratch.
- Use `EnableGloballyAddressableScratch` to check whether the user opted into
globally addressable scratch. This affects whether to lower scratch atomics as flat,
and in the future will affect whether NV=1 can be set on scratch accesses.
[CaptureTracking] Volatile operations only capture address (#201316)
The fact that a volatile access was performed on a certain address is an
observable effect in the abstract machine, so volatile operations
capture the address of the accessed pointer. However, they do not
capture the provenance. This matches what we document in LangRef.
While I'm pretty sure that this models the semantics correctly, I'm
slightly concerned that we might be using the provenance capture here to
paper over some other issue, though nothing specific comes to mind (and
the test changes don't show anything problematic).
[Loads] Use willNotFreeBetween() for dereferenceable-at-point (#201353)
If dereferenceable-at-point semantics are enabled, use
willNotFreeBetween() to check whether frees are known to not occur
between the definition point of the value and the context instruction.
We already use this logic for dereferenceable assumptions; this enables
it for other dereferenceability facts (under deref-at-point).
[lldb] Use MemoryCache in Process::ReadRangesFromMemory (#201166)
There are scenarios (especially in the ObjectiveC metadata reading) in
which multiple strings are read over and over again, but through
different code paths. In order to make that part of the code use
MultiMemRead effectively, the memory cache must be integrated into
ReadRangesFromMemory before we can migrate the string reading to a
vectorized version.
rss: add sysctl enable toggle
This commit also includes the original refactoring changes.
This change allows the kernel to operate with the default netisr cpu-affinity settings while having RSS compiled in. Normally, RSS changes quite a bit of the behaviour of the kernel dispatch service - this change allows for reducing impact on incompatible hardware while preserving the option to boost throughput speeds based on packet flow CPU affinity.
Make sure to compile the following options in the kernel:
options RSS
As well as setting the following sysctls:
net.inet.rss.enabled: 1
net.isr.bindthreads: 1
net.isr.maxthreads: -1 (automatically sets it to the number of CPUs)
And optionally (to force a 1:1 mapping between CPUs and buckets):
net.inet.rss.bits: 3 (for 8 CPUs)
[5 lines not shown]
[Flang][Semantics] Throw diagnostics for identical entry and result names. (#198500)
Fixes https://github.com/llvm/llvm-project/issues/198499
In Flang, the ENTRY name and the RESULT name cannot be the same, as per
[C1583](https://j3-fortran.org/doc/year/25/25-007r1.pdf).
Previously, Flang did not diagnose cases where the ENTRY name and RESULT
name were identical, e.g.
```
function m1f1()
  integer :: m1f1
  real :: m1f1e1
  m1f1 = 0
  entry m1f1e1() result(m1f1e1)
  m1f1e1 = 0.1
end function
```
[Flang] Fix -frelaxed-c-loc-checks being ignored when using the driver (#200733)
`-frelaxed-c-loc-checks` worked correctly when passed directly to `-fc1`,
but was silently ignored when using the driver (e.g., `flang -c
-frelaxed-c-loc-checks`). This patch fixes
it by adding `OPT_relaxed_c_loc` to the `addAllArgs` call in Flang.cpp.
Also extend the existing test with a driver-mode RUN line to cover this
path.
[mlir][spirv] Add TOSA graph constant marking (#201095)
Add a TOSA to SPIR-V TOSA preprocessing pass that marks large tosa.const
and tosa.const_shape operations for lowering to spirv.ARM.GraphConstant.
Keep small constants inline as spirv.Constant, assign graph constant IDs
with a grapharm-prefixed marker attribute, and teach the existing
constant conversion to use the marker when present.
Expose the grapharm source-side attribute names used for interface ABI
annotations and graph constant IDs.
Add tests for marking large constants, leaving small constants unmarked,
increasing graph constant IDs across mixed constants, and lowering
pre-marked constants to spirv.ARM.GraphConstant.
Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
[AArch64] Fix DUP-of-extload combine to ignore chain uses (#201351)
The original combine bailed when the load had more than one use, but
counted chain uses too.
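The difference between counting all uses and counting only value uses can be illustrated with a toy model. This is a sketch under assumptions: it does not use the SelectionDAG API, `ToyUse` and both helper names are hypothetical, and treating the fix as "count only uses of the loaded value" is an inference from the commit title.

```cpp
#include <cassert>
#include <vector>

// Toy model: a load produces result 0 (the loaded value) and result 1 (the
// chain). Each use records which result it consumes.
struct ToyUse {
  unsigned ResultNo;
};

constexpr unsigned ValueResult = 0;
constexpr unsigned ChainResult = 1;

// Counts uses of every result; a chain user alone makes this check fail.
inline bool hasOneUseOfAnyResult(const std::vector<ToyUse> &Uses) {
  return Uses.size() == 1;
}

// Counts only uses of the loaded value, ignoring chain users.
inline bool hasOneValueUse(const std::vector<ToyUse> &Uses) {
  unsigned N = 0;
  for (const ToyUse &U : Uses)
    if (U.ResultNo == ValueResult)
      ++N;
  return N == 1;
}
```

A load whose value feeds one DUP and whose chain feeds some later memory operation has two uses in total but only one value use, so only the second check would let the combine proceed.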
autogen: unbreak build with llvm22, ok jca
The {,sig}setjmp() detection was broken. They want a sigjmp_buf, not a
sigjmp_buf *, so change from &bf to bf twice to avoid a configure time
error due to a -Wincompatible-pointer-types error.
As naddy points out, this port could be only one decade outdated rather
than almost two. I may deal with this when I find myself very bored.
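For reference, the `bf` versus `&bf` distinction looks roughly like this in a standalone probe. This is a sketch for POSIX systems, not the actual configure test from the port (which is not shown here); `probeSigsetjmp` is a hypothetical name.

```cpp
#include <cassert>
#include <setjmp.h>

// Returns 0 if a sigsetjmp/siglongjmp round trip works, roughly what a
// configure-time probe might check.
inline int probeSigsetjmp() {
  static sigjmp_buf bf;
  volatile int jumped = 0;
  // Pass bf itself. &bf has a different pointer type than the one
  // sigsetjmp expects, which newer compilers reject as an
  // incompatible-pointer-types error.
  if (sigsetjmp(bf, 1) != 0)
    return jumped ? 0 : 1;
  jumped = 1;
  siglongjmp(bf, 1); // jumps back into the sigsetjmp call above
  return 1;          // not reached
}
```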
[orc-rt] Make NativeDylibManager::lookup return optional addresses. (#201519)
NativeDylibManager::lookup used to return (asynchronously) a vector of
void *s where null represented not-present. This commit updates it to
return a vector of std::optional<void *>s where std::nullopt represents
not-present and an address of zero indicates that the symbol is present
with an address of zero.
This matches the resolve semantics of SimpleExecutorDylibManager,
completing the alignment of the two implementations after the earlier
additions of the Mode argument to load() and the
required/weakly-referenced flag on lookup symbols.
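The distinction between "not present" and "present at address zero" can be sketched as below. This is a simplified synchronous model, not the orc-rt interface: `ToySymbolTable` and `toyLookup` are hypothetical, and the real lookup is asynchronous.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical symbol table mapping names to addresses (which may be null).
using ToySymbolTable = std::unordered_map<std::string, void *>;

// Returns one entry per requested name: std::nullopt if the symbol is
// absent, otherwise its address, which may legitimately be zero.
inline std::vector<std::optional<void *>>
toyLookup(const ToySymbolTable &Table, const std::vector<std::string> &Names) {
  std::vector<std::optional<void *>> Result;
  Result.reserve(Names.size());
  for (const std::string &Name : Names) {
    auto It = Table.find(Name);
    if (It == Table.end())
      Result.push_back(std::nullopt);
    else
      Result.push_back(It->second);
  }
  return Result;
}
```

With a plain `void *` result, the "zero" and "missing" cases would be indistinguishable; the optional wrapper keeps them apart.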