Revert "Add clang warning if fp exception functions are called without appropriate flags/pragmas" (#198341)
Reverts llvm/llvm-project#187860
Reason: this breaks compiling several different versions of libc, and is
also issuing diagnostics for platforms that are incompatible (see
https://github.com/llvm/llvm-project/pull/187860 for details).
Revert for now until we resolve how to move forward and reland.
[mlir][AMDGPU] Move memory access op folding to memref interfaces (#197310)
This PR implements IndexedAccessOpInterface and
IndexedMemCopyOpInterface for relevant ops in the AMDGPU dialect,
removing the custom folding pass we used to have now that there are
interfaces for this sort of thing.
As a result:
- The in-bounds semantics of various AMDGPU ops have been clarified
- Interface methods to enable oob checks on DMA operations have been
added (to prevent accidental `disjoint`ing and the like)
- Said memref rewrite patterns have been hardened to allow for mixed
tensor/memref semantics.
- Helpers for detecting memory spaces were factored out of
`AMDGPUOps.cpp` so that they could be re-used in the interface
implementations.
# Breaking changes / migration
[4 lines not shown]
[mlir][GPU] Extend gpu.barrier with scope and named-barrier support (#195692)
This commit adds two features to gpu.barrier that are supported on
targets like recent AMDGPU chips, Nvidia's hardware, and SPIR-V.
The first of these is named barriers, which allow creating a barrier
object that is initialized with the number of subgroups that must arrive
at it before those subgroups are released. These are represented in MLIR
with a new `!gpu.named_barrier` type and created by the
`gpu.initialized_named_barrier` operation. These named barriers then
become arguments to `gpu.barrier`.
The other change is adding a "scope" enum and using it to specify the
execution scope of barriers. This allows for representing cluster- and
subgroup-wide barriers (the latter exists on AMDGPU and Nvidia, and
while I suspect Nvidia has cluster-scope barriers, I didn't go looking)
and allows us to fully lower to SPIR-V's OpControlBarrier.
While these are two different features, I figured I'd land them in one
[4 lines not shown]
[FIRToMemRef] Fix fir.convert insertion inside omp.wsloop (#197653)
When replaceFIRMemrefs inserted a fir.convert before an op inside a
LoopWrapperInterface region (e.g. omp.simd inside omp.wsloop), it
violated the single-nested-op invariant, producing a verifier error. Fix
by walking up the LoopWrapperInterface parent chain and inserting before
the outermost wrapper instead.
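
The parent-chain walk described above can be sketched in isolation. This is a minimal model, not the actual patch: `ToyOp`, `isLoopWrapper`, and `findInsertionAnchor` are hypothetical stand-ins for `mlir::Operation`, the `LoopWrapperInterface` check, and whatever helper the change really uses.

```cpp
#include <cassert>

// Toy stand-in for an MLIR operation; NOT the real mlir::Operation API.
struct ToyOp {
  ToyOp *parent = nullptr;     // enclosing op, or nullptr at the top level
  bool isLoopWrapper = false;  // models "implements LoopWrapperInterface"
};

// Returns the op before which a new op can be inserted without adding a
// second op to a loop-wrapper region: climbs while the enclosing op is a
// wrapper, so the result is the outermost wrapper (or `op` itself if it is
// not nested directly inside a wrapper).
ToyOp *findInsertionAnchor(ToyOp *op) {
  assert(op != nullptr && "expected a valid op");
  ToyOp *anchor = op;
  while (anchor->parent && anchor->parent->isLoopWrapper)
    anchor = anchor->parent;
  return anchor;
}
```

With a toy chain modelling omp.simd nested in omp.wsloop nested in a non-wrapper op, the anchor for the omp.simd stand-in is the omp.wsloop stand-in, so the inserted op lands outside both wrapper regions.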
Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
[Driver] Uniform handling of invalid rtlib across drivers (#198219)
This is mostly an NFC except for a different diagnostic being emitted.
The goal is to unify validation and handling of invalid rtlib values
across different drivers to simplify supporting more -rtlib= values in
the future.
Add noreturn call count to FunctionPropertiesAnalysis pass (#198322)
Add this metric to visualize how many noreturn functions there are,
with the idea of analyzing their relationship with unreachable
instructions.
[clang][deps] Move `ModuleDepCollectorPP` to .cpp file (#197964)
This PR moves the `ModuleDepCollectorPP` type into the .cpp file. It's
an implementation detail that the header doesn't need to expose.
[AMDGPU][GlobalISel] Remove dependency on legal ruleset (#197371)
This fills in always-legal rules, to remove the dependency on the legacy
ruleset. This is not guaranteed to be all the rules, just the ones that
appear in tests.
[llvm] Re-format aarch64-apple-tuning-features.td. NFC (#197777)
It's much easier to review diffs with each feature on its own line. Also
add an -implicit-check-not so we don't miss any CPUs going forward.
[SPIRV] Allow casting between CodeSectionINTEL and Generic storage classes (#197556)
In the previous versions of the SPV_INTEL_function_pointers
[spec](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc),
casts between the CodeSectionINTEL storage class (used for function
pointers) and the Generic storage class were illegal.
The spec was updated a few months ago, and the new version allows the
cast, specifying `CodeSectionINTEL` as one of the overloaded storage
classes that can be represented by Generic, alongside `Workgroup`, etc.
I also confirmed with a spec author that one of the intentions of the
spec updates was to allow the cast.
Update the SPIR-V backend to allow the cast. This is basically required
to use function pointers in real world use cases.
---------
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
[libc][NFC] Fix #endif comments in hdr/ proxy headers (#198313)
The #endif closing the LIBC_FULL_BUILD guard used the CMake variable
name LLVM_LIBC_FULL_BUILD in its comment rather than the preprocessor
macro LIBC_FULL_BUILD that the #ifdef above references. These are
distinct: LLVM_LIBC_FULL_BUILD is the CMake option; LIBC_FULL_BUILD is
the C macro defined via -DLIBC_FULL_BUILD when that option is ON.
Fixed 113 files under libc/hdr/ with a mechanical substitution.
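
As an illustration only, the substitution could look like the sketch below. It assumes the comment has the form `#endif // LLVM_LIBC_FULL_BUILD`, which the message does not spell out, and `fixEndifComment` is a hypothetical helper, not the tooling that was actually used.

```cpp
#include <cassert>
#include <string>

// Rewrites the CMake option name to the preprocessor macro name, but only on
// lines that start with "#endif". Every other line is returned unchanged.
std::string fixEndifComment(const std::string &line) {
  const std::string wrong = "LLVM_LIBC_FULL_BUILD";  // CMake option
  const std::string right = "LIBC_FULL_BUILD";       // C macro
  if (line.rfind("#endif", 0) != 0)  // rfind at position 0 == "starts with"
    return line;
  const std::string::size_type pos = line.find(wrong);
  if (pos == std::string::npos)
    return line;
  std::string fixed = line;
  fixed.replace(pos, wrong.size(), right);
  return fixed;
}
```

Restricting the rewrite to `#endif` lines keeps any legitimate mention of the CMake option elsewhere in a header intact.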
Assisted-by: Automated tooling, human reviewed.
[SLP] Preserve profitable trees when subtree trimming would reduce to buildvector-only
In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns
Invalid when trimming node Idx==1 under an InsertElement root would
leave only a buildvector, to avoid infinite vectorization attempts.
This is too aggressive when the original untrimmed tree is already
profitable (Cost < -SLPCostThreshold). In that case, undo any partial
trims and return the original cost instead of rejecting the tree.
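
The decision described above can be modelled roughly as follows. This is a simplified sketch under stated assumptions: `TrimResult` and `resolveTrim` are hypothetical names, costs are plain integers, and the real code in calculateTreeCostAndTrimNonProfitable also has to undo the partial trims, which is not modelled here.

```cpp
#include <cassert>

// Hypothetical result type: `valid == false` models the "Invalid" cost.
struct TrimResult {
  bool valid;
  int cost;
};

// originalCost: cost of the untrimmed tree.
// trimmedCost: cost after trimming, used when trimming is acceptable.
// costThreshold: stands in for SLPCostThreshold.
// trimLeavesOnlyBuildVector: trimming would leave only a buildvector.
TrimResult resolveTrim(int originalCost, int trimmedCost, int costThreshold,
                       bool trimLeavesOnlyBuildVector) {
  if (!trimLeavesOnlyBuildVector)
    return {true, trimmedCost};
  // New behavior: keep an already-profitable tree instead of rejecting it.
  if (originalCost < -costThreshold)
    return {true, originalCost};
  // Old behavior, still used for unprofitable trees: reject.
  return {false, 0};
}
```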
Original Pull Request: https://github.com/llvm/llvm-project/pull/197763
Recommit after unrelated revert in https://github.com/llvm/llvm-project/pull/198265
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/198336
[CI] Run libc tests on clang changes (#198295)
The libc tests are relatively lightweight, and given we build libc with
a just built clang, it's very easy for clang changes to cause issues in
libc, especially with -Werror. For example, #187860 broke libc due to
adding a new warning that libc was not clean on.
[SLP] Prefer VF-matching scalar-set match in gather-shuffle lookup
In isGatherShuffledSingleRegisterEntry, the perfect-match search accepted
an entry that isSame(TE->Scalars) regardless of the entry's vector factor.
isSame can succeed via ReuseShuffleIndices on an entry whose actual VF is
smaller than TE->Scalars.size(); the subsequent mask construction then
copies TE->getCommonMask() indices that overrun the chosen source's lanes,
producing wrong shufflevector masks and a more-poisonous result than the
scalar code.
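
To make the failure mode concrete, here is a small standalone check of the property the mask must satisfy. It is a sketch, not SLP vectorizer code: `maskFitsSource` is a hypothetical helper, and it assumes a single-source mask in which -1 marks a poison lane.

```cpp
#include <cassert>
#include <vector>

// Returns true if every non-poison mask index selects a lane that exists in
// a source vector with `sourceVF` lanes. An index >= sourceVF is the overrun
// described above.
bool maskFitsSource(const std::vector<int> &mask, int sourceVF) {
  assert(sourceVF > 0 && "source must have at least one lane");
  for (int idx : mask) {
    if (idx == -1)  // poison lane, always acceptable
      continue;
    if (idx < 0 || idx >= sourceVF)
      return false;
  }
  return true;
}
```

For example, a four-element mask `{0, 1, 2, 3}` fits a source with four lanes but not one with two, which is the situation that arises when the matched entry's VF is smaller than TE->Scalars.size().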
Fixes #197765
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/198334
[LLVM] Precise error message for intrinsic signature verification (1/n) (#196802)
Generate a more precise error message when intrinsic signature
verification fails. Keep track of the current position/component of the
intrinsic signature being checked and print a more descriptive error
message which includes the position/element of the signature that failed
and the reason it failed.
Note that not all cases in `matchIntrinsicType` generate errors, so keep
a temporary fallback that still generates a generic error message in
those cases. This fallback will eventually be removed.
Added a C++ unit test for testing intrinsic struct return type that is
either an identified struct or a packed struct, as these cases cannot be
created from a .ll file directly (since autoupgrade in the parser fixes
them up).
[SelectionDAG] Fix miscompile in known-0/1 setcc fold with XOR (#196804) (#197767)
When simplifySetCC folds `(xor X, C) != 0` (where the XOR result is
known 0/1) into `TRUNCATE(XOR X, C)`, later DAG combines can incorrectly
fold the XOR back into its source operand, losing the NOT semantics.
This causes the x86 backend to test the original value instead of the
XOR result, inverting the condition and producing wrong code.
Fix by folding `(xor X, C) ==/!= N1` directly into `setcc(X, N1^C,
cond)` instead of returning TRUNCATE(XOR). The SETCC form is canonical
and immune to the problematic DAG combine.
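
The algebra behind the new fold can be checked exhaustively on small integers. The sketch below only verifies the identity on 8-bit values, it is not DAG combiner code, and `xorSetccFoldHolds` is a hypothetical name.

```cpp
#include <cassert>

// Checks that comparing (x ^ c) against n gives the same answer as comparing
// x against (n ^ c), for both != and ==, over all 8-bit x, c, and n.
bool xorSetccFoldHolds() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c = 0; c < 256; ++c)
      for (unsigned n = 0; n < 256; ++n) {
        const bool originalNe = (x ^ c) != n;
        const bool foldedNe = x != (n ^ c);
        const bool originalEq = (x ^ c) == n;
        const bool foldedEq = x == (n ^ c);
        if (originalNe != foldedNe || originalEq != foldedEq)
          return false;
      }
  return true;
}
```

The identity holds because XOR with a constant is its own inverse: applying `^ c` to both sides of the comparison cancels it on the left. A caller can confirm it with `assert(xorSetccFoldHolds());`.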
Fixes #196804.