[mlir][NVVM] Add InferTypeOpInterface to sync and ldmatrix ops (#188238)
Add InferTypeOpAdaptor to 5 NVVM ops with deterministic result types:
- VoteSyncOp: ballot -> i32, any/all/uni -> i1
- MatchSyncOp: any -> i32, all -> struct<(i32, i1)>
- ShflOp: result matches val type, or struct<(val_type, i1)> with
return_value_and_is_valid
- LdMatrixOp: i32 or struct of i32s based on num and shape
- ClusterLaunchControlQueryCancelOp: is_canceled -> i1, others -> i32
Note: this is a source-breaking change for Python callers that pass
result types positionally.
Co-authored-by: Claude <noreply at anthropic.com>
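For readers unfamiliar with result-type inference, the rules listed for three of the ops can be modelled as plain functions. This is only a toy sketch in Python: the function names and the string spelling of the types are hypothetical and are not the MLIR API or the NVVM Python bindings.

```python
def infer_vote_sync_type(kind: str) -> str:
    """Toy model of the VoteSyncOp rule: ballot -> i32, any/all/uni -> i1."""
    if kind == "ballot":
        return "i32"
    if kind in ("any", "all", "uni"):
        return "i1"
    raise ValueError(f"unknown vote kind: {kind}")


def infer_match_sync_type(kind: str) -> str:
    """Toy model of the MatchSyncOp rule: any -> i32, all -> struct<(i32, i1)>."""
    if kind == "any":
        return "i32"
    if kind == "all":
        return "struct<(i32, i1)>"
    raise ValueError(f"unknown match kind: {kind}")


def infer_shfl_type(val_type: str, return_value_and_is_valid: bool) -> str:
    """Toy model of the ShflOp rule: the value type, optionally paired with an i1."""
    if return_value_and_is_valid:
        return f"struct<({val_type}, i1)>"
    return val_type
```

Because the result type follows from the attributes and operands alone, a builder no longer needs it as an argument, which is why positional result types in existing Python callers stop lining up.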
[PTX][Debug] Add .loc directives to inlined PTX. (#177718)
This PR adds .loc directives to inlined PTX as it is emitted into the
PTX file.
This allows PTXAS to create .debug_line entries for those
instructions, and helps the profiler attribute perf counters to source code.
[LifetimeSafety] Detect use of a reference type as a use of underlying origin (#184295)
Writing through a reference (e.g., `ref = 10`) does not rebind the
reference, so it should not kill the liveness of its underlying origin.
Fixes #180187
[InstCombine] Restore narrowing of double to float for integer casts (#190550)
Resolves #190503
This patch modifies `visitFPTrunc` to simplify the following expression:
```llvm
fptrunc(OpI (sitofp/uitofp x), (sitofp/uitofp y))
```
to
```llvm
OpI (sitofp/uitofp x), (sitofp/uitofp y)
```
`getMinimumFPType` now calls `canBeCastedExactlyIntToFP` on `x` and `y`.
This allows a double to be narrowed to a float if the source operands originate from sitofp/uitofp and can be represented exactly in the target float type.
This fixes a regression pointed out in the issue, where `visitFPExt` began folding `fpext(sitofp)` into `uitofp nneg i64 %x to double`, causing `visitFPTrunc` to lose the `fpext` it relied on to recognize the narrowing opportunity. On certain targets, this caused more expensive operations to be emitted (e.g., division in f64 instead of f32).
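The "can be represented exactly" condition can be illustrated numerically. The sketch below is not the logic of `canBeCastedExactlyIntToFP`; it only assumes the IEEE-754 binary32 format, whose significand holds 24 bits, and all function names are hypothetical.

```python
import struct

F32_PRECISION = 24  # significand bits of IEEE-754 binary32, counting the implicit bit


def int_type_fits_f32(bit_width: int, signed: bool) -> bool:
    """True if every value of the integer type converts to float exactly."""
    value_bits = bit_width - 1 if signed else bit_width
    return value_bits <= F32_PRECISION


def value_fits_f32(x: int) -> bool:
    """True if this particular integer converts to float exactly.

    Only meant for magnitudes far below the float maximum (e.g. 64-bit ints).
    """
    magnitude = abs(x)
    if magnitude == 0:
        return True
    # Trailing zero bits are absorbed by the exponent, so strip them first.
    magnitude >>= (magnitude & -magnitude).bit_length() - 1
    return magnitude.bit_length() <= F32_PRECISION


def roundtrips_through_f32(x: int) -> bool:
    """Cross-check by actually rounding the value to binary32 and back."""
    as_f32 = struct.unpack("<f", struct.pack("<f", float(x)))[0]
    return int(as_f32) == x
```

When both operands of the floating-point operation come from integer types that pass such a check, the int-to-float conversions themselves lose nothing whether they target double or float.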
[mlir][spirv] Allow redefinition in OpName instructions (#191223)
The SPIR-V specification allows multiple conflicting OpName instructions
to redefine the name associated with a given `<id>`. Update the
deserializer to handle this case by using the last declared name.
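The "last declared name wins" behaviour amounts to a simple overwrite in a map keyed by `<id>`. A minimal Python sketch, with a hypothetical `resolve_names` helper that is not part of the MLIR deserializer:

```python
def resolve_names(op_names):
    """Map each <id> to the name from its last OpName-like declaration.

    `op_names` is a sequence of (target_id, name) pairs in declaration order.
    """
    names = {}
    for target_id, name in op_names:
        # A later declaration for the same <id> silently replaces the earlier one.
        names[target_id] = name
    return names
```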
[CIR] Add canonicalizer for CleanupScopeOp (#191084)
This change adds a canonicalizer for CleanupScopeOp that erases any
cleanup scope with a trivial cleanup region, inlining the contents of
the body region into the block in place of the cleanup scope op. It also
erases any EH-only cleanup scope whose body region contains only a yield
operation, dropping the cleanup region contents even if they were not
trivial because the EH cleanup is not reachable in this case.
Assisted-by: Cursor / claude-4.6-opus-high
[RISCV] Improve lowering of llvm.vector.reduce.mul (#190628)
RVV doesn't have a vredprod instruction, so we're forced to emulate
these.
The current ExpandReductions lowering which gets used trips an
unfortunate behavior with exact VLEN - we end up with a bunch of vsetvli
toggles caused by an interaction with lowerShuffleViaVRegSplitting. We
end up doing all the sub-m1 shuffles at m1, but don't recognize that we
could do the multiplies in m1 as well. As a result, we end up toggling
back and forth between LMULs.
This change is somewhat of a blunt hammer; it adjusts the lowering to
do the greater than m1 lowering via the current strategy, then do
a vector.extract, then do the same shuffler reduction lowering on
the smaller type. This has the effect of fixing the exact VLEN case,
and improves register pressure for all cases. The cost is that we end
up with an exact vsetvli to toggle to the m1 type before the sub-m1
[12 lines not shown]
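As a rough illustration of the two-phase strategy, a shuffle-style product reduction can be modelled as repeated halving: first down to an m1-sized chunk, then from that chunk to a scalar. This Python sketch works on plain lists; `m1_elems` (the number of elements that fit in one m1 register) and both function names are assumptions made for the example, not the RISC-V lowering code.

```python
def halve_until(vec, width):
    """Multiply the high half into the low half until len(vec) <= width."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("vector length must be a non-zero power of two")
    while len(vec) > width:
        half = len(vec) // 2
        vec = [vec[i] * vec[i + half] for i in range(half)]
    return vec


def reduce_mul(vec, m1_elems):
    """Two-phase product reduction: wide phase first, then the sub-m1 phase."""
    narrowed = halve_until(list(vec), m1_elems)  # greater-than-m1 steps
    return halve_until(narrowed, 1)[0]           # remaining steps on the small type
```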
[DeveloperPolicy] Update information on commit authorship (#191220)
The sections on commit authorship have not been updated since the
transition to GitHub for PRs. This patch attempts to update them.
This is aimed at addressing questions recently raised in
https://discourse.llvm.org/t/attributing-a-commit-to-a-different-author/90490.
I guess this is technically a policy change, but does not change
anything *de facto* as far as I can tell.
---------
Co-authored-by: Petr Hosek <phosek at google.com>
[libc] Add missing errno_macros.h include to pthread schedparam (#191235)
Both pthread_attr_getschedparam and pthread_attr_setschedparam use
ENOTSUP but relied on getting it transitively through <pthread.h>. Added
the explicit include so these files compile in standalone builds with
-nostdinc.
[libc] Add LIBC_FULL_BUILD guard to stdint_proxy.h (#191234)
In full-build mode with -nostdinc, the system <stdint.h> is unavailable.
Use the internal stdint-macros.h header instead, falling back to the
system header in overlay mode.
[flang] preserve logical operations in single FIR operation (#190771)
This patch adds new operations to represent the AND/OR/EQV/NEQV logical
operations, with the main goal of preserving them at a higher level in the
IR to make it easier to match them and to dispatch them to atomic
implementations when working on reductions.
They are only generated when one of the arguments is actually a logical.
Otherwise, when dealing with AND/OR... where both operands are
comparisons, the i1 arith operations are still generated, since using the
new operations would make the IR more complex and preserving the logical
operation is only valuable when one of the operands is a logical
variable.
[clang][Driver] Ensure intermediate bitcode files are written according to `/Fo` (#189977)
With the following compilation process:
```
$ mkdir -p src/ tmp/
$ cat << 'EOF' > src/main.c
int main() { return 0; }
EOF
$ clang-cl /c /Fo:tmp/ /clang:-fembed-bitcode src/main.c
```
the object file `main.obj` is generated in the `tmp/` directory but the
intermediate `main.bc` is placed in the current working directory.
This PR ensures that intermediate `.bc` files are written to the same
directory specified by `/Fo`.
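The intended placement can be sketched as a path computation. The helper below is hypothetical and is not the driver's code; it only covers the case shown above, where `/Fo` names a directory.

```python
import posixpath


def intermediate_path(source: str, fo_dir: str, ext: str = ".bc") -> str:
    """Place the intermediate file next to the object file, inside fo_dir."""
    stem = posixpath.splitext(posixpath.basename(source))[0]
    return posixpath.join(fo_dir, stem + ext)
```

With the example above, `intermediate_path("src/main.c", "tmp/")` yields `tmp/main.bc`, alongside `tmp/main.obj`.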
[AMDGPU] A Vulkan-style memory model weaker than the LLVM model
Add a new AMDGPU memory model specification that is weaker than the LLVM memory
model using Vulkan-style availability/visibility semantics and scoped
operations. The model allows more efficient implementations while maintaining a
safe-by-default mapping to the standard LLVM model.
[libc] Fix 'finish()' being called by the client instead of the server (#191226)
Summary:
This function is supposed to manage the doorbell interrupts. The flow
that was intended was that the client notified work to wake the server
and the server finished the work so it didn't go back to sleep until
everything was done. This was reversed and we had the client finishing
work and then stalling on it.
[libc] Add a redirecting <syscall.h> header. (#191069)
A large amount of legacy code includes the `<syscall.h>` header instead of
`<sys/syscall.h>` (which is the regular header location on Linux
systems).
Add a simple one-liner redirecting header to fix this compatibility
issue. In this PR I omit the regular licensing blurb at the top, given
the transient nature of this file, but I'm happy to add this if needed.
Also, given that it's effectively a compatibility shim, YAML generation
is not used.
[clang][modules] Close module file descriptors (#191227)
This was missed in the original PR and was causing "too many files open"
errors on real workloads.
[DA] Fix overflow of findBoundsALL in BanerjeeTest
Fix signed overflow handling in `findBoundsALL` for the Banerjee test.
The previous implementation computed bounds using `getMinusSCEV` and
`getMulExpr` without checking for signed overflow, which could produce
incorrect bounds when coefficients have extreme values.
- Add `mulSCEVNoSignedOverflow` helper function that checks for
multiplication overflow before computing the result
- Use `minusSCEVNoSignedOverflow` and `mulSCEVNoSignedOverflow` in
`findBoundsALL` to safely compute bounds, returning `nullptr`
when overflow would occur
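The idea behind the overflow-checked helpers can be shown on plain integers. The sketch below is an assumption-laden model: the real helpers operate on SCEV expressions and return `nullptr`, whereas these hypothetical Python functions take ints, assume a 64-bit signed range by default, and return `None`.

```python
def _fits_signed(value: int, bits: int) -> bool:
    """True if value lies in the two's-complement range for the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def mul_no_signed_overflow(a: int, b: int, bits: int = 64):
    """Return a * b, or None if the product does not fit in a signed `bits` int."""
    product = a * b  # Python ints are unbounded, so this is the exact product
    return product if _fits_signed(product, bits) else None


def minus_no_signed_overflow(a: int, b: int, bits: int = 64):
    """Return a - b, or None if the difference does not fit in a signed `bits` int."""
    difference = a - b
    return difference if _fits_signed(difference, bits) else None
```

A caller computing bounds would give up, rather than continue with a wrapped value, as soon as either helper reports `None`.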
[clang][ssaf] Fix CLANG_PLUGIN_SUPPORT=OFF SSAFExamplePlugin cmake errors (#191229)
Such builds would fail with:
```
...
CMake Error at cmake/modules/AddLLVM.cmake:2245 (add_dependencies):
The dependency target "SSAFExamplePlugin" of target
"check-clang-utils-update_cc_test_checks" does not exist.
...
```
This fixes it by using the same condition for the test dependency as is
used for deciding to build the plugin in
clang/lib/ScalableStaticAnalysisFramework/Plugins/CMakeLists.txt
Reland [MC] Fuse relaxation and layout into a single forward pass (#190318)
This relands debb2514ea7f, which was reverted by #189548 due to ARM
spurious `cbz` out of range error (Chromium, Android).
---
Replace the two-pass inner loop in relaxOnce (relaxFragment +
layoutSection) with a single forward pass that sets each fragment's
offset before processing it.
- Extract relaxAlign from layoutSection's FT_Align handling and call
it from relaxFragment. FT_Align padding is computed inline with the
tracked Offset, so alignment fragments always see fresh upstream
offsets. This structurally eliminates the O(N) convergence pitfall
where stale offsets caused each iteration to fix only one more
alignment fragment.
- The new MCAssembler::Stretch field tracks the cumulative upstream size
[55 lines not shown]
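The core of the single forward pass, assigning each fragment its offset before processing it so that alignment padding always sees an up-to-date offset, can be sketched as follows. This is a simplified Python model with a hypothetical fragment encoding; it leaves out instruction relaxation and the Stretch tracking described in the commit.

```python
def layout_forward(fragments):
    """Lay out fragments in one pass.

    `fragments` is a list of ("data", size) or ("align", alignment) pairs.
    Returns a list of (offset, size) per fragment and the total section size.
    """
    offset = 0
    placed = []
    for kind, value in fragments:
        if kind == "align":
            # Padding is computed from the offset tracked so far, never a stale one.
            size = -offset % value
        else:
            size = value
        placed.append((offset, size))
        offset += size
    return placed, offset
```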
[LLVM][AArch64] Remove addrspace(0) restriction from all SVE/SME memory intrinsics. (#189992)
This requirement was not intentional, just the result of convenience.
Fixes: https://github.com/llvm/llvm-project/issues/183265
---------
Co-authored-by: nikhil-m-k <nikhil_mk at yahoo.com>
[compiler-rt] [Darwin] Move macOS ASAN reservation above 512G (#191039)
On macOS, the first 512G may contain platform-specific reservations. To
ensure compatibility with these reservations, this changes ASAN to
always map shadow memory above 512G on macOS.
rdar://174252720
[InstCombine][ProfCheck] Mark unknown select profiles in sub xor fold (#191192)
Mark the weights as explicitly unknown given we cannot statically infer
the weights without value profiling due to the select being synthesized
from a binary operation.
[flang][OpenMP] Add optional SemanticsContext parameter to loop utilities
Some of the utilities may be used in symbol resolution, which runs before
expression analysis is done. In such situations, the typedExprs
normally stored in parser::Expr may not be available.
Obtaining numeric values of expressions may then require using the analyzer
directly, which requires a SemanticsContext to be provided.
[BasicAA] Fix assertion failure in alias() caused by non-pointer base in DecomposeGEPExpression (#191180)
When stripping a bitcast in DecomposeGEPExpression, the resulting
operand may have a non-scalar-pointer type (e.g. <1 x ptr>). Proceeding
with such a type as the decomposition base violates the AA assumption
that all pointers are scalar pointer types, triggering an assertion
failure on alias() call.
Add a type check in the bitcast/addrspacecast handling path to return
the unstripped V as the base when the stripped operand is not a scalar
pointer type.
Add a lit test verifying no crash on valid IR containing such a bitcast,
and checking that the alias query conservatively returns MayAlias.
Fixes #191157
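The guarded stripping can be modelled on a toy value graph. Everything in this Python sketch is hypothetical (the `Value` class, the string type spelling, and `find_base`); it only illustrates stopping at the cast when the operand underneath is not a scalar pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Value:
    name: str
    type: str                               # e.g. "ptr" or "<1 x ptr>"
    cast_operand: Optional["Value"] = None  # set if this value is a bitcast/addrspacecast


def is_scalar_pointer(type_str: str) -> bool:
    """Toy type test: plain `ptr`, optionally with an address space."""
    return type_str == "ptr" or type_str.startswith("ptr addrspace(")


def find_base(v: Value) -> Value:
    """Strip casts, but keep the cast itself if its operand is not a scalar pointer."""
    while v.cast_operand is not None:
        stripped = v.cast_operand
        if not is_scalar_pointer(stripped.type):
            return v  # return the unstripped value as the base
        v = stripped
    return v
```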