[flang][OpenACC] Diagnose illegal routine calls in parallel loops (#190068)
Add routine call checking to `AccStructureChecker` to reject OpenACC
routine calls whose parallel level is incompatible with the enclosing
loop directive (e.g., calling a worker-level routine from a
vector-parallel loop), as required by the OpenACC specification.
[Inliner] Fix dangling pointer in OriginallyIndirectCalls. (#191242)
changeToInvokeAndSplitBasicBlock replaces an existing call instruction
with an invoke instruction. This leaves a dangling pointer in
OriginallyIndirectCalls, so we miss !inline_history metadata on
the invokes replacing the direct calls.
It also causes non-determinism: if a new call happens to be allocated at
the same address as a dangling pointer in the set, the inliner adds
!inline_history entries to that unrelated call instruction.
PR: https://github.com/llvm/llvm-project/pull/191242
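One way to avoid this kind of dangling entry is to move set membership from the old instruction to its replacement before the old object is freed. The sketch below models that with a hypothetical `CallLike` type and `replaceWithInvoke` helper (neither is an LLVM API); the actual patch may keep the set consistent differently.

```cpp
#include <cassert>
#include <memory>
#include <set>

// Hypothetical stand-in for an IR call-like instruction; not LLVM's CallBase.
struct CallLike {
  bool IsInvoke = false;
};

// Models replacing a call with an invoke. The old object is destroyed when
// this function returns, so a raw pointer to it left in OriginallyIndirect
// would dangle. Keeping the set in sync means swapping old for new.
std::unique_ptr<CallLike> replaceWithInvoke(std::unique_ptr<CallLike> Old,
                                            std::set<CallLike *> &OriginallyIndirect) {
  auto New = std::make_unique<CallLike>();
  New->IsInvoke = true;
  // Transfer membership while Old is still alive.
  if (OriginallyIndirect.erase(Old.get()))
    OriginallyIndirect.insert(New.get());
  return New;
}
```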
[ADT] Fix SmallVector append with input iterators (#191030)
append()/assign()/insert() iterate over the iterator twice -- once to
get the length and once to actually append the content. This is only
permitted with forward iterators, not input iterators.
For append()/assign(), implement a version that uses emplace_back() in
a loop. insert() at the end is append(); for insert() in the middle,
append() elements first and then rotate them into their correct
position.
Originally introduced in https://reviews.llvm.org/D33919.
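The single-pass strategy can be sketched with `std::vector` standing in for `SmallVector` (the helper names are hypothetical, and `std::vector::insert` already handles input iterators itself; this only models the technique): append reads each element exactly once via `emplace_back()`, and a middle insert appends first and then uses `std::rotate` to move the new tail into position.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <vector>

// Single-pass append: valid for input iterators because each element is read
// exactly once and no up-front std::distance() call is needed.
template <typename T, typename InputIt>
void appendSinglePass(std::vector<T> &V, InputIt First, InputIt Last) {
  for (; First != Last; ++First)
    V.emplace_back(*First);
}

// Insert in the middle: append at the end, then rotate the appended tail to
// Pos. An index is used because appending may invalidate iterators.
template <typename T, typename InputIt>
typename std::vector<T>::iterator
insertSinglePass(std::vector<T> &V, typename std::vector<T>::size_type Pos,
                 InputIt First, InputIt Last) {
  auto OldSize = V.size();
  appendSinglePass(V, First, Last);
  // [begin+OldSize, end) moves to begin+Pos; old [Pos, OldSize) shifts right.
  std::rotate(V.begin() + Pos, V.begin() + OldSize, V.end());
  return V.begin() + Pos;
}
```

For example, inserting the values read from a `std::istringstream` containing `"3 4"` at index 2 of `{1, 2, 5}` yields `{1, 2, 3, 4, 5}`.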
[clang-doc][NFC] Delete redundant lines in JSONGenerator (#191011)
While merging the Mustache MD backend, I forgot to delete the
earlier, obsolete serialization for namespaces, whose output was being
overwritten by the correct call later. This also deletes a duplicate typedef.
[mlir][NVVM] Add InferTypeOpInterface to sync and ldmatrix ops (#188238)
Add InferTypeOpAdaptor to 5 NVVM ops with deterministic result types:
- VoteSyncOp: ballot -> i32, any/all/uni -> i1
- MatchSyncOp: any -> i32, all -> struct<(i32, i1)>
- ShflOp: result matches val type, or struct<(val_type, i1)> with
return_value_and_is_valid
- LdMatrixOp: i32 or struct of i32s based on num and shape
- ClusterLaunchControlQueryCancelOp: is_canceled -> i1, others -> i32
Note: this is a source-breaking change for Python callers that pass
result types positionally.
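The result-type rules listed above are simple enough to model as plain functions. This is a hypothetical string-based sketch (types are spelled as text, and the function names are invented); it is not MLIR's `inferReturnTypes` signature, and LdMatrixOp is omitted because the exact num/shape rule is not spelled out here.

```cpp
#include <cassert>
#include <string>

// Hypothetical string-based model of the listed result-type rules.
enum class VoteKind { Ballot, Any, All, Uni };
enum class MatchKind { Any, All };

// VoteSyncOp: ballot -> i32, any/all/uni -> i1.
std::string inferVoteSync(VoteKind K) {
  return K == VoteKind::Ballot ? "i32" : "i1";
}

// MatchSyncOp: any -> i32, all -> struct<(i32, i1)>.
std::string inferMatchSync(MatchKind K) {
  return K == MatchKind::Any ? "i32" : "struct<(i32, i1)>";
}

// ShflOp: result matches the value type, or is paired with an i1 validity
// flag when return_value_and_is_valid is set.
std::string inferShfl(const std::string &ValType, bool ReturnValueAndIsValid) {
  return ReturnValueAndIsValid ? "struct<(" + ValType + ", i1)>" : ValType;
}

// ClusterLaunchControlQueryCancelOp: is_canceled -> i1, others -> i32.
std::string inferClusterLaunchControlQueryCancel(bool IsCanceled) {
  return IsCanceled ? "i1" : "i32";
}
```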
Co-authored-by: Claude <noreply at anthropic.com>
[PTX][Debug] Add .loc directives to inlined PTX. (#177718)
This PR adds .loc directives to the inlined PTX as it is emitted into the
PTX file.
This allows PTXAS to create .debug_line entries for those
instructions, and helps the profiler attribute perf counters to source code.
[LifetimeSafety] Detect use of a reference type as a use of underlying origin (#184295)
Writing through a reference (e.g., `ref = 10`) does not rebind the
reference, so it should not kill the liveness of its underlying origin.
Fixes #180187
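The language rule the analysis relies on can be shown in plain C++ (this illustrates the semantics only, not the LifetimeSafety implementation; the function name is hypothetical): assignment through a reference writes to the referent and never rebinds the reference, so the underlying object stays in use.

```cpp
#include <cassert>

// After `ref = 10`, ref still aliases x, so x (the underlying origin) is
// still being used and must remain live.
bool referenceStaysBound() {
  int x = 1;
  int y = 2;
  int &ref = x;
  ref = 10;  // writes through to x
  ref = y;   // copies y's value into x; ref is NOT rebound to y
  return &ref == &x && x == 2 && y == 2;
}
```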
[InstCombine] Restore narrowing of double to float for integer casts (#190550)
Resolves #190503
This patch modifies `visitFPTrunc` to simplify the following expression:
```llvm
fptrunc(OpI (sitofp/uitofp x), (sitofp/uitofp y))
```
to
```llvm
OpI (sitofp/uitofp x), (sitofp/uitofp y)
```
`getMinimumFPType` now calls `canBeCastedExactlyIntToFP` on `x` and `y`.
This allows a double to be narrowed to a float if the source operands originate from sitofp/uitofp and can be represented exactly in the target float type.
This fixes a regression pointed out in the issue: `visitFPExt` began folding `fpext(sitofp)` into `uitofp nneg i64 %x to double`, so `visitFPTrunc` lost the `fpext` it relied on to recognize the narrowing opportunity. On certain targets, this caused more expensive operations to be emitted (e.g., division in f64 instead of f32).
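Why exactness matters can be sketched in plain C++. The helper below is hypothetical and is not LLVM's `canBeCastedExactlyIntToFP`: it treats an `int64_t` as exactly convertible to `float` when it has at most 24 significant bits. When both operands are exact in `float`, computing `+`, `-`, `*`, or `/` in `double` and then rounding to `float` gives the same result as computing directly in `float`, because binary64 has at least 2*24+2 bits of precision (a known double-rounding result, assumed here rather than taken from the patch).

```cpp
#include <cassert>
#include <cstdint>

// True if X has at most 24 significant bits after stripping trailing zeros,
// i.e. it fits float's 24-bit significand exactly. int64 magnitudes never
// exceed float's exponent range.
bool fitsExactlyInFloat(int64_t X) {
  // Unsigned magnitude; also handles INT64_MIN (2^63, which is exact).
  uint64_t M = X < 0 ? 0 - static_cast<uint64_t>(X) : static_cast<uint64_t>(X);
  if (M == 0)
    return true;
  while ((M & 1) == 0)
    M >>= 1;
  return M < (uint64_t{1} << 24);
}

// Shape before narrowing: fptrunc(fdiv(sitofp x to double, sitofp y to double)).
float divViaDouble(int64_t X, int64_t Y) {
  return static_cast<float>(static_cast<double>(X) / static_cast<double>(Y));
}

// Shape after narrowing: fdiv(sitofp x to float, sitofp y to float).
float divViaFloat(int64_t X, int64_t Y) {
  return static_cast<float>(X) / static_cast<float>(Y);
}
```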
[mlir][spirv] Allow redefinition in OpName instructions (#191223)
The SPIR-V specification allows multiple conflicting OpName instructions
to redefine the name associated with a given `<id>`. Update the
deserializer to handle this case by using the last declared name.
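The "last declared name wins" behavior can be modeled as a map update in module order. This is a hypothetical sketch (`collectNames` and the `(id, name)` pair representation are invented), not the MLIR SPIR-V deserializer's code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Each OpName is modeled as an (<id>, name) pair in module order. Assigning
// with operator[] lets a later OpName overwrite an earlier one for the same id.
std::unordered_map<uint32_t, std::string>
collectNames(const std::vector<std::pair<uint32_t, std::string>> &OpNames) {
  std::unordered_map<uint32_t, std::string> Names;
  for (const auto &[Id, Name] : OpNames)
    Names[Id] = Name;
  return Names;
}
```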
[CIR] Add canonicalizer for CleanupScopeOp (#191084)
This change adds a canonicalizer for CleanupScopeOp that erases any
cleanup scope with a trivial cleanup region, inlining the contents of
the body region into the block in place of the cleanup scope op. It also
erases any EH-only cleanup scope whose body region contains only a yield
operation, dropping the cleanup region contents even if they were not
trivial because the EH cleanup is not reachable in this case.
Assisted-by: Cursor / claude-4.6-opus-high
[RISCV] Improve lowering of llvm.vector.reduce.mul (#190628)
RVV doesn't have a vredprod instruction, so we're forced to emulate
these.
The ExpandReductions lowering currently used trips an unfortunate
behavior with exact VLEN: we end up with a bunch of vsetvli toggles
caused by an interaction with lowerShuffleViaVRegSplitting. We end up
doing all the sub-m1 shuffles at m1, but don't recognize that we could do
the multiplies in m1 as well. As a result, we end up toggling back and
forth between LMULs.
This change is somewhat of a blunt hammer; it adjusts the lowering to
do the greater than m1 lowering via the current strategy, then do
a vector.extract, then do the same shuffle reduction lowering on
the smaller type. This has the effect of fixing the exact VLEN case,
and improves register pressure for all cases. The cost is that we end
up with an exact vsetvli to toggle to the m1 type before the sub-m1
[12 lines not shown]
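The two-stage shape described above can be modeled with scalar lanes. This is a hypothetical sketch, not RVV code, and it says nothing about vsetvli or LMUL costs: `RegLanes` stands in for one m1 register, the first stage folds the wide vector in halves (like extracting and multiplying halves), and the second stage is a log2-step shuffle reduction. It assumes power-of-two lane counts; wrapping `uint32_t` multiplication is associative and commutative, so the tree order gives the same product as a sequential loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Log2-step reduction: each step multiplies the upper half of the live lanes
// into the lower half, halving the active width.
uint32_t shuffleReduceMul(std::vector<uint32_t> Lanes) {
  assert(!Lanes.empty() && (Lanes.size() & (Lanes.size() - 1)) == 0);
  for (std::size_t Width = Lanes.size() / 2; Width >= 1; Width /= 2)
    for (std::size_t I = 0; I < Width; ++I)
      Lanes[I] *= Lanes[I + Width];
  return Lanes[0];
}

// Stage 1 folds the wide vector down to RegLanes lanes; stage 2 runs the
// same shuffle reduction on that narrower piece.
uint32_t reduceMulTwoStage(std::vector<uint32_t> Lanes, std::size_t RegLanes) {
  while (Lanes.size() > RegLanes) {
    std::size_t Half = Lanes.size() / 2;
    for (std::size_t I = 0; I < Half; ++I)
      Lanes[I] *= Lanes[I + Half];
    Lanes.resize(Half);  // keep only the low half
  }
  return shuffleReduceMul(Lanes);
}
```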
[DeveloperPolicy] Update information on commit authorship (#191220)
The sections on commit authorship have not been updated since the
transition to GitHub for PRs. This patch attempts to update them.
This is aimed at addressing questions recently raised in
https://discourse.llvm.org/t/attributing-a-commit-to-a-different-author/90490.
I guess this is technically a policy change, but does not change
anything *de facto* as far as I can tell.
---------
Co-authored-by: Petr Hosek <phosek at google.com>
[libc] Add missing errno_macros.h include to pthread schedparam (#191235)
Both pthread_attr_getschedparam and pthread_attr_setschedparam use
ENOTSUP but relied on getting it transitively through <pthread.h>. Added
the explicit include so these files compile in standalone builds with
-nostdinc.
[libc] Add LIBC_FULL_BUILD guard to stdint_proxy.h (#191234)
In full-build mode with -nostdinc, the system <stdint.h> is unavailable.
Use the internal stdint-macros.h header instead, falling back to the
system header in overlay mode.
[flang] preserve logical operations in single FIR operation (#190771)
This patch adds new operations to represent the AND/OR/EQV/NEQV logical
operations, with the main goal of preserving them at a higher level in the
IR, making it easier to match them and to dispatch them to atomic
implementations when working on reductions.
They are only generated when one of the arguments is actually a logical.
When both operands of an AND/OR/... are comparisons, the i1 arith
operations are still generated, since using the new operations would make
the IR more complex, and preserving the logical operation is only valuable
when one of the operands is a logical variable.
[clang][Driver] Ensure intermediate bitcode files are written according to `/Fo` (#189977)
With the following compilation process:
```
$ mkdir -p src/ tmp/
$ cat << 'EOF' > src/main.c
int main() { return 0; }
EOF
$ clang-cl /c /Fo:tmp/ /clang:-fembed-bitcode src/main.c
```
the object file `main.obj` is generated in the `tmp/` directory but the
intermediate `main.bc` is placed in the current working directory.
This PR ensures that intermediate `.bc` files are written to the same
directory specified by `/Fo`.
[AMDGPU] A Vulkan-style memory model weaker than the LLVM model
Add a new AMDGPU memory model specification that is weaker than the LLVM memory
model using Vulkan-style availability/visibility semantics and scoped
operations. The model allows more efficient implementations while maintaining a
safe-by-default mapping to the standard LLVM model.
[libc] Fix 'finish()' being called by the client instead of the server (#191226)
Summary:
This function is supposed to manage the doorbell interrupts. The intended
flow was that the client notifies the server of work to wake it, and the
server calls finish() once the work is done, so it doesn't go back to sleep
until everything is complete. This was reversed: the client was finishing
the work and then stalling on it.
[libc] Add a redirecting <syscall.h> header. (#191069)
A large amount of legacy code includes the `<syscall.h>` header instead of
`<sys/syscall.h>`, which is the regular header location on Linux
systems.
Add a simple one-line redirecting header to fix these compatibility
issues. In this PR I omit the regular licensing blurb at the top, given
the transient nature of this file, but I'm happy to add it if needed.
Also, since it's effectively a compatibility shim, YAML generation
is not used.
[clang][modules] Close module file descriptors (#191227)
This was missed in the original PR and was causing "too many open files"
errors on real workloads.
[DA] Fix overflow of findBoundsALL in BanerjeeTest
Fix signed overflow handling in `findBoundsALL` for the Banerjee test.
The previous implementation computed bounds using `getMinusSCEV` and
`getMulExpr` without checking for signed overflow, which could produce
incorrect bounds when coefficients have extreme values.
- Add `mulSCEVNoSignedOverflow` helper function that checks for
multiplication overflow before computing the result
- Use `minusSCEVNoSignedOverflow` and `mulSCEVNoSignedOverflow` in
`findBoundsALL` to safely compute bounds, returning `nullptr`
when overflow would occur
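The checked-arithmetic pattern can be sketched on plain integers instead of SCEVs. The helpers below are hypothetical constant-only analogues of `minusSCEVNoSignedOverflow` and `mulSCEVNoSignedOverflow` (they use the GCC/Clang `__builtin_*_overflow` intrinsics), and `boundOrGiveUp` assumes a bound of the general shape `(A - B) * N`; it returns `std::nullopt`, mirroring the `nullptr` result, as soon as any step would overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns std::nullopt instead of a wrapped value on signed overflow.
std::optional<int64_t> checkedSub(int64_t A, int64_t B) {
  int64_t R;
  if (__builtin_sub_overflow(A, B, &R))
    return std::nullopt;
  return R;
}

std::optional<int64_t> checkedMul(int64_t A, int64_t B) {
  int64_t R;
  if (__builtin_mul_overflow(A, B, &R))
    return std::nullopt;
  return R;
}

// Give up on the bound as soon as any intermediate step overflows.
std::optional<int64_t> boundOrGiveUp(int64_t A, int64_t B, int64_t N) {
  std::optional<int64_t> Diff = checkedSub(A, B);
  if (!Diff)
    return std::nullopt;
  return checkedMul(*Diff, N);
}
```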
[clang][ssaf] Fix CLANG_PLUGIN_SUPPORT=OFF SSAFExamplePlugin cmake errors (#191229)
Such builds would fail with:
```
...
CMake Error at cmake/modules/AddLLVM.cmake:2245 (add_dependencies):
The dependency target "SSAFExamplePlugin" of target
"check-clang-utils-update_cc_test_checks" does not exist.
...
```
This fixes it by using the same condition for the test dependency as is
used for deciding to build the plugin in
clang/lib/ScalableStaticAnalysisFramework/Plugins/CMakeLists.txt
Reland [MC] Fuse relaxation and layout into a single forward pass (#190318)
This relands debb2514ea7f, which was reverted by #189548 due to a
spurious ARM `cbz` out-of-range error (Chromium, Android).
---
Replace the two-pass inner loop in relaxOnce (relaxFragment +
layoutSection) with a single forward pass that sets each fragment's
offset before processing it.
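The single forward pass described above can be sketched with a heavily simplified, hypothetical fragment model (`Frag` is not `MCFragment`, and relaxable fragments and the Stretch bookkeeping are not modeled): each fragment's offset is assigned before it is processed, so alignment padding is always computed from up-to-date upstream offsets rather than stale values from a previous iteration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Frag {
  enum Kind { Data, Align } K;
  uint64_t Size = 0;       // fixed size for Data; computed padding for Align
  uint64_t Alignment = 1;  // used only by Align
  uint64_t Offset = 0;     // assigned during layout
};

// One pass: set the offset, compute any alignment padding inline from that
// offset, then advance. Returns the total section size.
uint64_t layoutOnePass(std::vector<Frag> &Frags) {
  uint64_t Offset = 0;
  for (Frag &F : Frags) {
    F.Offset = Offset;
    if (F.K == Frag::Align)
      F.Size = (F.Alignment - Offset % F.Alignment) % F.Alignment;
    Offset += F.Size;
  }
  return Offset;
}
```

For example, data(3), align(4), data(2), align(8) lays out at offsets 0, 3, 4, 6 with padding 1 and 2, for a total size of 8.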
- Extract relaxAlign from layoutSection's FT_Align handling and call
it from relaxFragment. FT_Align padding is computed inline with the
tracked Offset, so alignment fragments always see fresh upstream
offsets. This structurally eliminates the O(N) convergence pitfall
where stale offsets caused each iteration to fix only one more
alignment fragment.
- The new MCAssembler::Stretch field tracks the cumulative upstream size
[55 lines not shown]