[SCEV] Canonicalise round-up idiom when some bits known (#197126)
Since #174380, instcombine can clear some set bits in the added constant
in expressions like this, when A has some known-clear low order bits.
(A + 15) & ~15
This transformation is valid, but can make it harder for later passes to
recognise this idiom for rounding up to a power of 2. This is causing
the ARM MVE tail predication pass to fail on loops with a trip count
which is a multiple of a small power of 2.
The fix is to reverse the transformation when building SCEV expressions,
canonicalising to always use the largest valid value for the added
constant.
Alive proofs:
https://alive2.llvm.org/ce/z/hhndoW
https://alive2.llvm.org/ce/z/_JYVat
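The equivalence behind this canonicalisation can be checked with a small standalone sketch. It assumes, purely for illustration, that A is known to be a multiple of 4, so the added constant may have been reduced from 15 to 12. The function names are hypothetical and are not part of SCEV or instcombine.

```cpp
#include <cassert>
#include <cstdint>

// Canonical round-up to a multiple of 16: the added constant is the
// largest valid value, 15.
inline uint32_t roundUpCanonical(uint32_t A) { return (A + 15u) & ~15u; }

// Reduced form, assuming the low two bits of A are known to be zero:
// bits 0 and 1 of the constant are cleared, leaving 12.
inline uint32_t roundUpReduced(uint32_t A) { return (A + 12u) & ~15u; }

// Compare both forms for every multiple of 4 below Limit.
inline void checkFormsAgree(uint32_t Limit) {
  for (uint32_t A = 0; A < Limit; A += 4)
    assert(roundUpCanonical(A) == roundUpReduced(A));
}
```

The two forms agree only under the known-bits assumption. For A = 1 the reduced form yields 0 while the canonical form yields 16, which is why the reduced constant is harder to recognise as a round-up.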
[AArch64][llvm] Deprecate FEAT_MPAMv2_VID
`FEAT_MPAMv2_VID` instructions and system registers, as introduced
in change d30f18d2c, are being deprecated at this time, as they've been
removed from the latest Arm ARM, which doesn't preclude them returning
in some form in future.
Other system registers introduced with `FEAT_MPAMv2` are unaffected,
and these continue to be ungated. `+mpamv2` gating is now renamed to
`+mpamv2-deprecated`, to avoid an ABI break. This makes it obvious that
it shouldn't be used.
Emit newline after IR-dump banner in PrintCallGraphPass (#199410)
Required for Compiler Explorer's opt-pipeline viewer: the tool parses
pass output by splitting on the IR-dump banner line, so the banner must
end with a newline. Without it, targets that exercise this pass cannot
be inspected through the opt-pipeline feature.
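A rough sketch of why the trailing newline matters to a banner-splitting parser. The banner prefix `; *** IR Dump` and the `Section`/`splitOnBanners` names are assumptions made for illustration; this is not Compiler Explorer's actual parser.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Section {
  std::string Banner; // the banner line that opened the section
  std::string Body;   // every following line up to the next banner
};

// Split pass output into sections, treating any line that starts with the
// assumed banner prefix as a section header.
inline std::vector<Section> splitOnBanners(const std::string &Text) {
  std::vector<Section> Sections;
  std::istringstream In(Text);
  std::string Line;
  while (std::getline(In, Line)) {
    if (Line.rfind("; *** IR Dump", 0) == 0)
      Sections.push_back({Line, ""});
    else if (!Sections.empty())
      Sections.back().Body += Line + "\n";
  }
  return Sections;
}
```

If the banner is not terminated by a newline, the first line of the dump is fused into the banner line and is lost from the section body.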
Assisted by Claude.
[flang][OpenMP] Event handles are not predetermined shared (#200055)
An event-handle variable that appears in a DETACH has its data-sharing
attributes determined according to the usual rules in the constructs
enclosing the clause.
[AArch64] Do not generate indexed addressing mode for volatile accesses (#196305)
Instructions performing register writeback do not set a valid
instruction syndrome, making it impossible to handle MMIO in protected
hypervisors. Suppress the use of postinc/preinc addressing modes for
volatile accesses, which may be used to interact with MMIO.
There are three different places that can form indexed addressing modes:
* GISel via isIndexingLegal()
* SDAG via getPreIndexedAddressParts() and getPostIndexedAddressParts()
* AArch64LoadStoreOptimizer
In the latter case, exclude volatile accesses on SP (which are relevant
for stack probing) and MTE tag stores, as both cannot be MMIO.
Fixes https://github.com/llvm/llvm-project/issues/173014.
[llubi] Add basic support for provenance modeling (#185977)
There are four solutions to model the provenance in the memory:
1. `(allocid, bitindex)` for each bit: It follows the definition of byte
type.
2. `(allocid, bitindex)` for each byte: This assumes the pointer/byte
types are always byte-sized, and requires bitextract/bitinsert to shift
by multiples of 8, as posted in
https://discourse.llvm.org/t/rfc-add-a-new-byte-type-to-llvm-ir/89522/53.
I believe this is true in most real-world cases.
3. Assign a random tag for each memory object: The tag has the same
width as the address. It is stored in the memory like addresses. Thus,
each logical byte only occupies 4 bytes. When loading a pointer, the tag
is loaded and used to recover the provenance. Incorrect bit ordering
will result in nullary provenance (with a negligible rate of false
negatives). I think it is feasible because we can always turn a false
negative into a positive with a different seed. It is also compatible
with captured components
(https://github.com/dtcxzyw/llvm-ub-aware-interpreter/blob/d15dfef5bc0c1b30b05512bbc28fddb2b50cc0b1/ubi.h#L187)
[8 lines not shown]
[GitHub] Add InstCombine Contributor Guide to new contributor greeting comment (#199730)
I have always manually replied to new contributors, reminding them to
follow the InstCombine contributor guide. Let’s automate this process.
Now it will append the link to the guide when the PR changes the
InstCombine (and highly related components) files.
[CIR] Implement 'coroutine' exception handling lowering (#200045)
This patch implements the lowering to CIR for exception handling.
Unfortunately the missing components of Flatten-CFG don't work here, so
we only test that we get successfully to CIR, not to LLVM-IR.
This patch runs the 'await-resume' in a try/catch, and only if that
succeeds, does it run the coroutine body (also in a try/catch if there
is an exception handler).
This is nearly identical to the implementation in classic-codegen,
except we invert the resume-eh variable's value, so we can just use a
simple `if` op for the branch.
[libc] Move fixed buffer GPU test to an integration test (#200042)
Move the `fixedbuffer` GPU test to an integration test.
libc tests are intended to be GTest style tests written with the normal
`TEST(Suite, Test)` GTest macros. Example
[here](https://github.com/llvm/llvm-project/blob/main/libc/test/include/SignbitTest.h#L32).
This test has its own `main` which ends up causing a `main multiple
definitions` linker error when compiling for SPIR-V (work in progress).
I'm not sure why this error doesn't occur for AMDGPU; probably it is
because we have to compile with far fewer compile/linker flags for
SPIR-V, and one of the omitted flags hides the issue.
Specifically the fix is that we don't link against
`libc/test/UnitTest/CMakeFiles/LibcTest.hermetic.dir/LibcTestMain.cpp.o`
which has its own main which conflicts with the one defined in the test.
All other tests in this directory are integration tests too.
[4 lines not shown]
llvm: Fix most LLVM_ABI annotations in Analysis (#199019)
This updates most LLVM_ABI annotations in the Analysis headers to match
expected usage:
* All public APIs should be properly annotated.
* Inlined functions should not be annotated.
These changes were done by a script fixing annotations on LLVM public
headers and manually checked.
This effort is tracked in #109483.
[InstCombine] Drop the correct assume when working on assume bundles (#198404)
Currently, all assumes of the same kind in an assume bundle are dropped,
even though only a single one is actually checked to be redundant and
should be dropped. This introduces a new `removeOperandFromBundleAt`,
which instead drops a bundle at a specific position. This should also be
faster, since copying the bundles can now be done into an already
correctly allocated vector.
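The difference between the two behaviours can be sketched on a simplified model. The `Bundle` struct and both helpers below are hypothetical stand-ins rather than the LLVM API, and the real `removeOperandFromBundleAt` signature is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of one operand bundle on an assume.
struct Bundle {
  std::string Tag;
  std::string Operand;
};

// Behaviour described as the bug: every bundle of the given kind is dropped.
inline std::vector<Bundle> dropAllOfKind(const std::vector<Bundle> &Bundles,
                                         const std::string &Tag) {
  std::vector<Bundle> Kept;
  for (const Bundle &B : Bundles)
    if (B.Tag != Tag)
      Kept.push_back(B);
  return Kept;
}

// Position-based removal: only the bundle at Idx is dropped, and the result
// size is known up front, so the vector can be allocated once.
inline std::vector<Bundle> dropAt(const std::vector<Bundle> &Bundles,
                                  std::size_t Idx) {
  assert(Idx < Bundles.size() && "index out of range");
  std::vector<Bundle> Kept;
  Kept.reserve(Bundles.size() - 1);
  for (std::size_t I = 0; I < Bundles.size(); ++I)
    if (I != Idx)
      Kept.push_back(Bundles[I]);
  return Kept;
}
```

With two `nonnull` bundles on different pointers, removing by kind loses both, whereas removing by position keeps the one that was never checked for redundancy.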
[LoongArch] Add `-fstack-clash-protection` support (#195595)
This PR adds stack probing and `-fstack-clash-protection` support to the
LoongArch backend and Clang driver.
The implementation is largely borrowed from the RISCV backend (cf.
#117612, #139731), with the same allocation-unrolling strategy for
const-sized allocations.
[flang][OpenMP] Lower target in_reduction for host fallback
Teach Flang lowering and MLIR OpenMP translation to carry
in_reduction through omp.target for the host-fallback path.
The translation looks up task reduction-private storage with
__kmpc_task_reduction_get_th_data and binds the target region's
in_reduction block argument to that private pointer, so uses inside the
region do not keep referring to the original variable.
The patch also preserves in_reduction operands in the TargetOp builder
path and ensures target in_reduction list items are mapped into the
target region when needed.
The device/offload-entry path remains diagnosed as not yet implemented.
[flang] Fix -E -dM macro dumping for stdin and .f90 inputs (#200144)
Issue:
flang -E -dM does not consistently print predefined macros for stdin and
.f90 inputs, unlike expected behavior.
Root cause:
Flang only initialized predefined macros when preprocessing was implied
by -cpp or suffix-based inference (mustBePreprocessed), but not when -dM
alone requested macro dumping.
Fix:
Treat -dM as an explicit trigger to initialize macro predefinitions,
and add a stdin regression test for flang -E -dM - < /dev/null.
Fixes #198234
[SelectionDAG] Remove redundant asserts in WidenVecRes_ATOMIC_LOAD
These asserts duplicate guarantees already provided elsewhere:
- isVector() checks are redundant because findMemType() calls
WidenVT.getVectorElementType() and WidenVT.isScalableVector()
internally, and WidenVecRes_ATOMIC_LOAD is only reached from the
ATOMIC_LOAD case in WidenVectorResult, which is the vector path.
- The element-type and scalability consistency between LdVT and
WidenVT is a property of GetWidenedVector / getTypeToTransformTo.
llvm: Fix most LLVM_ABI annotations in CodeGen (#199921)
This updates most LLVM_ABI annotations in the CodeGen headers to match
expected usage:
* All public APIs should be properly annotated.
* Inlined functions should not be annotated.
These changes were done by a script fixing annotations on LLVM public
headers and manually checked.
This effort is tracked in #109483.
[libc++] Simplify the implementation of conditional a bit (#199916)
We can use our internal `_If` instead of specializing `conditional` for
selecting the appropriate type.
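A minimal sketch of the technique, using hypothetical names (`IfImpl`, `If`, `Conditional`) in place of libc++'s reserved identifiers: the selection is done by a nested alias template, so the trait itself needs no partial specialization.

```cpp
#include <type_traits>

// Primary template handles the true case and picks the first type.
template <bool>
struct IfImpl {
  template <class Then, class Else>
  using Select = Then;
};

// Only the bool parameter is specialized; the false case picks the second.
template <>
struct IfImpl<false> {
  template <class Then, class Else>
  using Select = Else;
};

template <bool Cond, class Then, class Else>
using If = typename IfImpl<Cond>::template Select<Then, Else>;

// A conditional-like trait written on top of the alias, with no
// specialization of its own.
template <bool Cond, class Then, class Else>
struct Conditional {
  using type = If<Cond, Then, Else>;
};

static_assert(std::is_same<Conditional<true, int, long>::type, int>::value, "");
static_assert(std::is_same<Conditional<false, int, long>::type, long>::value, "");
```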
[LSR][AArch64] Precommit tests showing lack of `mul vl` addressing (NFC) (#200149)
These loops could be using `mul vl` addressing in the loop and use fewer
base registers and have a smaller loop setup.
[mlir][spirv] Remove unnecessary assertion (#200137)
The variable was used only in the assertion, so it became unused when
compiling with assertions off, which caused a build failure.
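The general pattern can be illustrated as follows. This is a hypothetical example, not the SPIR-V code touched by the patch, and it presumes the failure comes from an unused-variable warning being promoted to an error.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Problematic shape (kept as a comment): with NDEBUG defined the assert
// expands to nothing, so Count is never read and the compiler may warn.
//
//   std::size_t Count = Values.size();
//   assert(Count > 0);

// Option A: drop the assertion together with the variable.
inline int firstElement(const std::vector<int> &Values) {
  return Values.front();
}

// Option B: keep the check but fold the expression into the assert, so no
// separate variable exists in release builds.
inline int firstElementChecked(const std::vector<int> &Values) {
  assert(!Values.empty() && "expected at least one value");
  return Values.front();
}
```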
Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
[NFC][TableGen] Reorganize GlobalISelMatchTable.h
This file was a bit of a kitchen sink, and the implementation of the
match table is sufficiently difficult to get comfortable with already.
I spent the past few weeks looking at it, finding improvements, etc. and
I think a nice way to make it a bit easier to approach is to split up
the file a bit so that the main implementation (Matchers.h/.cpp) only
contains the code pertaining to the Matchers (RuleMatchers, Preds, etc.).
We now have 3 files:
- One for type (LLT) related utilities.
- One for the MatchTable emission logic, which is generic and should not
be tied to any specific implementation. It just has the tools to emit
the opcodes for the table.
- One for the entire Matcher system, including PredicateMatchers and so on.
[LangRef] Specify that syncscopes can affect the monotonic modification order (#189017)
If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.
So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
Related RFC: https://discourse.llvm.org/t/rfc-clarifying-llvm-irs-concurrent-memory-model/90480
[DirectX] Generate shader debug file name part in llc (#199555)
This change modifies the DXContainerGlobals pass to generate the debug
name (ILDN) part in DXContainer. The ILDN part allows consumers to find
the PDB file containing shader debug info.
As the ILDB emission PR is not merged yet, and PDB file creation is not
upstreamed yet, the debug name is generated from the MD5 hash of the
bitcode module in the DXIL part.
This corresponds to DXC behavior when a shader is compiled with `/Zi
/Qembed_debug /Zsb` flags (with `/Qembed_debug`, DXC does not produce an
actual PDB file, but still emits ILDN, `/Zsb` tells DXC to use bitcode
from DXIL to compute hash).
However, here ILDN is emitted for any debug info flag configuration,
assuming that it won't break debug info consumers, and that PDB creation
will be added later.