[AArch64] Do not generate indexed addressing mode for volatile accesses (#196305)
Instructions performing register writeback do not set a valid
instruction syndrome, making it impossible to handle MMIO in protected
hypervisors. Suppress the use of postinc/preinc addressing modes for
volatile accesses, which may be used to interact with MMIO.
There are three different places that can form indexed addressing modes:
* GISel via isIndexingLegal()
* SDAG via getPreIndexedAddressParts() and getPostIndexedAddressParts()
* AArch64LoadStoreOptimizer
In the latter case, exclude volatile accesses on SP (which are relevant
for stack probing) and MTE tag stores, as both cannot be MMIO.
Fixes https://github.com/llvm/llvm-project/issues/173014.
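To illustrate the kind of source pattern affected, here is a minimal sketch of a volatile read loop. The helper name `sumRegisters` is hypothetical, and whether a given compiler version would actually have selected a post-indexed load for it is an assumption, not something the commit states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads `count` consecutive 32-bit device registers.
// `volatile` marks every access as observable, which is how MMIO is
// usually expressed in C and C++.
inline std::uint32_t sumRegisters(const volatile std::uint32_t *regs,
                                  std::size_t count) {
  assert(regs != nullptr || count == 0);
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    // A load followed by a pointer bump is a natural candidate for a
    // post-indexed form such as `ldr w8, [x0], #4`, which writes the
    // incremented address back to x0. With the change described above,
    // a plain load plus a separate add would be expected instead.
    sum += *regs++;
  }
  return sum;
}
```

Compiling such a function for AArch64 and inspecting the assembly is one way to check which addressing mode was chosen.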
[llubi] Add basic support for provenance modeling (#185977)
There are four solutions to model the provenance in the memory:
1. `(allocid, bitindex)` for each bit: It follows the definition of byte
type.
2. `(allocid, bitindex)` for each byte: This assumes the pointer/byte
types are always byte-sized, and requires bitextract/bitinsert to shift
by multiples of 8, as posted in
https://discourse.llvm.org/t/rfc-add-a-new-byte-type-to-llvm-ir/89522/53.
I believe this is true in most real-world cases.
3. Assign a random tag for each memory object: The tag has the same
width as the address. It is stored in the memory like addresses. Thus,
each logical byte only occupies 4 bytes. When loading a pointer, the tag
is loaded and used to recover the provenance. Incorrect bit ordering
will result in nullary provenance (with a negligible rate of false
negatives). I think it is feasible because we can always turn a false
negative into a positive with a different seed. It is also compatible
with captured components
(https://github.com/dtcxzyw/llvm-ub-aware-interpreter/blob/d15dfef5bc0c1b30b05512bbc28fddb2b50cc0b1/ubi.h#L187)
[8 lines not shown]
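As a rough illustration of option 2 above (provenance recorded per byte), the following sketch keeps an `(allocid, bitindex)` pair next to each stored byte and recovers provenance only when all bytes of a loaded pointer agree. All names, the 64-bit pointer width, and the use of `allocId == 0` for "no provenance" are assumptions made for this example; this is not the llubi implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical shadow metadata stored alongside each byte in memory.
struct ByteProvenance {
  std::uint32_t allocId;  // allocation the pointer was derived from (0 = none)
  std::uint32_t bitIndex; // position of this byte's first bit in the pointer
};

constexpr std::size_t PtrBytes = 8; // assumption: 64-bit pointers

// Recover provenance for a pointer load: every byte must come from the same
// allocation and sit at its original position, otherwise nothing is returned.
inline std::optional<std::uint32_t>
recoverProvenance(const std::array<ByteProvenance, PtrBytes> &bytes) {
  const std::uint32_t id = bytes[0].allocId;
  if (id == 0)
    return std::nullopt;
  for (std::size_t i = 0; i < PtrBytes; ++i)
    if (bytes[i].allocId != id || bytes[i].bitIndex != 8 * i)
      return std::nullopt;
  return id;
}
```

Because the index is tracked per byte, shifts that are not multiples of 8 cannot be represented, which matches the restriction noted for this option.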
[GitHub] Add InstCombine Contributor Guide to new contributor greeting comment (#199730)
I have always manually replied to new contributors, reminding them to
follow the InstCombine contributor guide. Let’s automate this process.
Now it will append the link to the guide when the PR changes the
InstCombine (and highly related components) files.
[CIR] Implement 'coroutine' exception handling lowering (#200045)
This patch implements the lowering to CIR for exception handling.
Unfortunately the missing components of Flatten-CFG don't work here, so
we only test that we get successfully to CIR, not to LLVM-IR.
This patch runs the 'await-resume' in a try/catch, and only if that
succeeds, does it run the coroutine body (also in a try/catch if there
is an exception handler).
This is nearly identical to the implementation in classic-codegen,
except we invert the resume-eh variable's value, so we can just use a
simple `if` op for the branch.
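The control flow described above can be modelled in plain C++ roughly as follows. This is only a sketch of the shape, not CIR output: the function name, the callbacks, and the choice to route exceptions to a single `onException` handler are assumptions for illustration.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of the lowering's control flow.
// Returns true if the coroutine body was entered.
inline bool runCoroutine(const std::function<void()> &awaitResume,
                         const std::function<void()> &body,
                         const std::function<void()> &onException) {
  // Inverted flag: true means await-resume completed without throwing,
  // so the branch below is a plain `if` on the flag.
  bool resumeSucceeded = false;
  try {
    awaitResume();
    resumeSucceeded = true;
  } catch (...) {
    onException();
  }
  if (resumeSucceeded) {
    try {
      body();
    } catch (...) {
      onException();
    }
  }
  return resumeSucceeded;
}
```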
[libc] Move fixed buffer GPU test to an integration test (#200042)
Move the `fixedbuffer` GPU test to an integration test.
libc tests are intended to be GTest style tests written with the normal
`TEST(Suite, Test)` GTest macros. Example
[here](https://github.com/llvm/llvm-project/blob/main/libc/test/include/SignbitTest.h#L32).
This test has its own `main` which ends up causing a `main multiple
definitions` linker error when compiling for SPIR-V (work in progress).
I'm not sure why this error doesn't occur for AMDGPU; probably because
SPIR-V has to be compiled with far fewer compile/linker flags, and one
of the missing ones hides the issue.
Specifically the fix is that we don't link against
`libc/test/UnitTest/CMakeFiles/LibcTest.hermetic.dir/LibcTestMain.cpp.o`
which has its own main which conflicts with the one defined in the test.
All other tests in this directory are integration tests too.
[4 lines not shown]
llvm: Fix most LLVM_ABI annotations in Analysis (#199019)
This updates most LLVM_ABI annotations in the Analysis headers to match
expected usage:
* All public APIs should be properly annotated.
* Inlined functions should not be annotated.
These changes were done by a script fixing annotations on LLVM public
headers and manually checked.
This effort is tracked in #109483.
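A small sketch of the two rules, using a made-up class. The fallback `#define` stands in for the real macro (provided by `llvm/Support/Compiler.h`), and `LoopCounter` is hypothetical, not an actual Analysis API.

```cpp
#include <cassert>

// Assumption: stand-in so the sketch compiles outside the LLVM tree. In LLVM
// the macro expands to an export/visibility attribute where needed.
#ifndef LLVM_ABI
#define LLVM_ABI
#endif

namespace sketch {
// Hypothetical analysis helper used only for illustration.
class LoopCounter {
  unsigned Count = 0;

public:
  // Defined out of line (normally in a .cpp file): part of the exported
  // interface, so it carries the annotation.
  LLVM_ABI void record(unsigned Loops);

  // Defined inline in the header: compiled into each user, so no annotation.
  unsigned total() const { return Count; }
};

void LoopCounter::record(unsigned Loops) { Count += Loops; }
} // namespace sketch
```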
[InstCombine] Drop the correct assume when working on assume bundles (#198404)
Currently, all assumes of the same kind in an assume bundle are dropped,
even though only a single one is actually checked to be redundant and
should be dropped. This introduces a new `removeOperandFromBundleAt`,
which instead drops a bundle at a specific position. This should also be
faster, since copying the bundles can now be done into an already
correctly allocated vector.
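The difference between the two behaviours can be sketched on a plain vector. The `Bundle` struct and both function names are hypothetical stand-ins, not the LLVM API; only the idea of removing by position and reserving the final size up front is taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one operand bundle attached to an assume.
struct Bundle {
  std::string tag;
  std::string value;
};

// Old behaviour as described: every bundle with a matching tag is dropped.
inline std::vector<Bundle> removeAllWithTag(const std::vector<Bundle> &in,
                                            const std::string &tag) {
  std::vector<Bundle> out;
  for (const Bundle &b : in)
    if (b.tag != tag)
      out.push_back(b);
  return out;
}

// New behaviour as described: only the bundle at `index` is dropped.
inline std::vector<Bundle> removeBundleAt(const std::vector<Bundle> &in,
                                          std::size_t index) {
  assert(index < in.size());
  std::vector<Bundle> out;
  out.reserve(in.size() - 1); // the final size is known in advance
  for (std::size_t i = 0; i < in.size(); ++i)
    if (i != index)
      out.push_back(in[i]);
  return out;
}
```

With bundles `nonnull(a)`, `align(b)`, `nonnull(c)`, removing by tag loses both `nonnull` entries, while removing at position 0 keeps `nonnull(c)`.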
[LoongArch] Add `-fstack-clash-protection` support (#195595)
This PR adds stack probing and `-fstack-clash-protection` support to the
LoongArch backend and Clang driver.
The implementation is largely borrowed from the RISCV backend (cf.
#117612, #139731), with the same allocation-unrolling strategy for
const-sized allocations.
[flang][OpenMP] Lower target in_reduction for host fallback
Teach Flang lowering and MLIR OpenMP translation to carry
in_reduction through omp.target for the host-fallback path.
The translation looks up task reduction-private storage with
__kmpc_task_reduction_get_th_data and binds the target region's
in_reduction block argument to that private pointer, so uses inside the
region do not keep referring to the original variable.
The patch also preserves in_reduction operands in the TargetOp builder
path and ensures target in_reduction list items are mapped into the
target region when needed.
The device/offload-entry path remains diagnosed as not yet implemented.
[flang] Fix -E -dM macro dumping for stdin and .f90 inputs (#200144)
Issue:
flang -E -dM does not consistently print predefined macros for stdin and
.f90 inputs, unlike expected behavior.
Root cause:
Flang only initialized predefined macros when preprocessing was implied
by -cpp or suffix-based inference (mustBePreprocessed), but not when -dM
alone requested macro dumping.
Fix:
Treat -dM as an explicit trigger to initialize macro predefinitions,
and add a stdin regression test for flang -E -dM - < /dev/null.
Fixes #198234
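The condition change amounts to one extra term in a boolean check. A minimal sketch, assuming a simplified options struct: `mustBePreprocessed` is the name used above, while the struct, the `dumpMacros` field, and the function name are invented for this example.

```cpp
#include <cassert>

// Hypothetical, simplified view of the relevant driver state.
struct PreprocessRequest {
  bool mustBePreprocessed; // implied by -cpp or by the file suffix
  bool dumpMacros;         // -dM was passed
};

inline bool shouldInitPredefinedMacros(const PreprocessRequest &req) {
  // Before the fix (as described), only mustBePreprocessed was consulted,
  // so `-E -dM` on stdin or a .f90 file skipped the predefinitions.
  return req.mustBePreprocessed || req.dumpMacros;
}
```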
[SelectionDAG] Remove redundant asserts in WidenVecRes_ATOMIC_LOAD
These asserts duplicate guarantees already provided elsewhere:
- isVector() checks are redundant because findMemType() calls
WidenVT.getVectorElementType() and WidenVT.isScalableVector()
internally, and WidenVecRes_ATOMIC_LOAD is only reached from the
ATOMIC_LOAD case in WidenVectorResult, which is the vector path.
- The element-type and scalability consistency between LdVT and
WidenVT is a property of GetWidenedVector / getTypeToTransformTo.
llvm: Fix most LLVM_ABI annotations in CodeGen (#199921)
This updates most LLVM_ABI annotations in the CodeGen headers to match
expected usage:
* All public APIs should be properly annotated.
* Inlined functions should not be annotated.
These changes were done by a script fixing annotations on LLVM public
headers and manually checked.
This effort is tracked in #109483.
[libc++] Simplify the implementation of conditional a bit (#199916)
We can use our internal `_If` instead of specializing `conditional` for
selecting the appropriate type.
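A sketch of the idea with non-reserved names: a single specialization on the boolean inside an `If` helper replaces a partial specialization of `conditional` itself. The names `IfImpl`, `Select`, `If` and the `sketch` namespace are chosen for this example and only approximate libc++'s internal spelling.

```cpp
#include <type_traits>

namespace sketch {
// Primary template handles `true`; only the bool is specialized on.
template <bool>
struct IfImpl {
  template <class Then, class Else>
  using Select = Then;
};

template <>
struct IfImpl<false> {
  template <class Then, class Else>
  using Select = Else;
};

template <bool Cond, class Then, class Else>
using If = typename IfImpl<Cond>::template Select<Then, Else>;

// `conditional` no longer needs its own specialization.
template <bool Cond, class Then, class Else>
struct conditional {
  using type = If<Cond, Then, Else>;
};
} // namespace sketch

static_assert(
    std::is_same<sketch::conditional<true, int, long>::type, int>::value,
    "true selects the first type");
```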
[LSR][AArch64] Precommit tests showing lack of `mul vl` addressing (NFC) (#200149)
These loops could be using `mul vl` addressing in the loop and use fewer
base registers and have a smaller loop setup.
[mlir][spirv] Remove unnecessary assertion (#200137)
The variable was used only in the assertion, so it became unused when
compiling with assertions off, which caused a build failure.
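For context, this is the general pattern behind such failures, shown on a made-up function rather than the MLIR code in question. The commit removes the assertion; the sketch shows two other common ways to keep an assertion without tripping unused-variable warnings under `-DNDEBUG`.

```cpp
#include <cassert>
#include <vector>

// Problematic shape (left as a comment): `size` is only read inside assert(),
// which expands to nothing when NDEBUG is defined, so builds that treat
// unused-variable warnings as errors fail.
//   std::size_t size = values.size();
//   assert(size > 0);

inline int firstElement(const std::vector<int> &values) {
  // Option 1: assert on the expression directly, leaving no variable behind.
  assert(!values.empty());
  return values.front();
}

inline int lastElement(const std::vector<int> &values) {
  // Option 2: keep the variable but mark it as possibly unused (C++17).
  [[maybe_unused]] const bool nonEmpty = !values.empty();
  assert(nonEmpty);
  return values.back();
}
```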
Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
[NFC][TableGen] Reorganize GlobalISelMatchTable.h
This file was a bit of a kitchen sink, and the implementation of the
match table is sufficiently difficult to get comfortable with already.
I spent the past few weeks looking at it, finding improvements, etc. and
I think a nice way to make it a bit easier to approach is to split up
the file a bit so that the main implementation (Matchers.h/.cpp) only
contains the code pertaining to the Matchers (RuleMatchers, Preds, etc.).
We now have 3 files:
- One for type (LLT) related utilities.
- One for the MatchTable emission logic, which is generic and should not
be tied to any specific implementation. It just has the tools to emit
the opcodes for the table.
- One for the entire Matcher system, including PredicateMatchers and so on.
[LangRef] Specify that syncscopes can affect the monotonic modification order (#189017)
If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.
So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
Related RFC: https://discourse.llvm.org/t/rfc-clarifying-llvm-irs-concurrent-memory-model/90480
[DirectX] Generate shader debug file name part in llc (#199555)
This change modifies the DXContainerGlobals pass to generate the debug
name (ILDN) part in the DXContainer. The ILDN part allows consumers to
find the PDB file containing shader debug info.
As the ILDB emission PR is not merged yet, and PDB file creation is not
upstreamed yet, the debug name is generated from the MD5 hash of the
bitcode module in the DXIL part.
This corresponds to DXC behavior when a shader is compiled with `/Zi
/Qembed_debug /Zsb` flags (with `/Qembed_debug`, DXC does not produce an
actual PDB file, but still emits ILDN, `/Zsb` tells DXC to use bitcode
from DXIL to compute hash).
However, here ILDN is emitted for any debug info flag configuration,
assuming that it won't break debug info consumers, and that PDB creation
will be added later.
[AArch64] Fix definition of system register move instructions (#185709)
The current implementation of these instructions makes bit 20 of the
encoding part of the system register operand, which is incorrect because
the [specification](https://developer.arm.com/documentation/ddi0602/latest)
requires that bit to be set to 1. This patch removes bit 20 from the
encoding of the operand and makes it a fixed field for these
instructions. It also fixes the parser and codegen by checking that Op0
in the system register name/encoding is correctly constrained to 2 or 3.
Depends on #185970
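To make the bit layout concrete, here is a sketch of encoding `MRS Xt, <sysreg>` by hand. It assumes the instructions in question follow the usual MRS/MSR (register) layout from the A64 ISA, where bit 20 is fixed to 1 and only the low bit of op0 is stored at bit 19. The struct and function names are invented, and this is not the TableGen definition from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a system register by its encoding fields.
struct SysReg {
  unsigned op0, op1, crn, crm, op2;
};

// Encodes `MRS Xt, <sysreg>`. Bit 20 belongs to the fixed opcode bits, so
// op0 can only be 2 or 3, with (op0 - 2) stored in bit 19.
inline std::uint32_t encodeMRS(const SysReg &r, unsigned rt) {
  assert((r.op0 == 2 || r.op0 == 3) && "op0 must be 2 or 3");
  assert(r.op1 < 8 && r.crn < 16 && r.crm < 16 && r.op2 < 8 && rt < 32);
  const std::uint32_t fixed = 0xD5300000u; // includes L = 1 and bit 20 = 1
  return fixed | ((r.op0 - 2u) << 19) | (r.op1 << 16) | (r.crn << 12) |
         (r.crm << 8) | (r.op2 << 5) | rt;
}
```

For example, MIDR_EL1 has the fields op0=3, op1=0, CRn=0, CRm=0, op2=0, so `encodeMRS({3, 0, 0, 0, 0}, 0)` yields `0xD5380000`.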
[TySan] Expose __tysan_set_type_unknown interface (#198800)
This can help work around issues like
[#143587](https://github.com/llvm/llvm-project/issues/143587)
The function is renamed with two leading underscores to match the
naming scheme of the other sanitizers.
[LifetimeSafety] Improve diagnostics for use-after-scope (#200031)
Reuses the function for getting object information that was added in
#199432
Comes as part of completing #186002
Co-authored-by: Utkarsh Saxena <usx at google.com>