[llubi] Add support for byval pointer arguments (#201852)
This patch adds support for the byval attribute. The hidden copy is
implemented as memcpy with the allocation size of the specified type.
See https://github.com/llvm/llvm-project/pull/205576 for more
information.
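As an illustration only, the hidden byval copy can be modelled in plain C++ as a memcpy of the pointee type's allocation size into callee-private storage. This is a sketch under stated assumptions, not the llubi implementation: `Pair`, `makeByValCopy`, `calleeIncrementFirst`, and `callWithByVal` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the type named in byval(<ty>).
struct Pair {
  int first;
  int second;
};

// Models the hidden copy: allocate AllocSize bytes for the callee and
// memcpy the caller's object into them.
std::vector<unsigned char> makeByValCopy(const void *Src,
                                         std::size_t AllocSize) {
  assert(Src != nullptr && "byval argument must point to an object");
  std::vector<unsigned char> Copy(AllocSize);
  std::memcpy(Copy.data(), Src, AllocSize);
  return Copy;
}

// Callee body: mutates only its private copy and returns the new value.
int calleeIncrementFirst(std::vector<unsigned char> &Arg) {
  Pair P;
  std::memcpy(&P, Arg.data(), sizeof(Pair));
  P.first += 1;
  std::memcpy(Arg.data(), &P, sizeof(Pair));
  return P.first;
}

// Caller: passes Original "byval", so the callee's write must not be
// visible in Original afterwards.
int callWithByVal(Pair &Original) {
  std::vector<unsigned char> Copy = makeByValCopy(&Original, sizeof(Pair));
  return calleeIncrementFirst(Copy);
}
```

The caller's object is unchanged after the call because the callee only ever sees the copied bytes.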
Revert "[Instrumentor] Add runtime examples: [1/N] A flop counter" (#205960)
This reverts commit 61cbfabb7ade682a64f516c871b2bacb1e3e324a.
Fails compiler-rt standalone builds, although it works fine locally :(
[X86][APX] Implement push+push2+push pre-alignment strategy for PP2 (#205031)
Replace the dummy "push %rax" stack-alignment padding for APX push2/pop2
(PP2) with a push+push2+push strategy: when an even number of callee-saved
GPRs is involved, a single CSR push provides the 16-byte alignment instead
of a throwaway push %rax, and the remaining registers use push2/pop2. The
padForPush2Pop2 flag and its associated dummy push, SUB/LEA padding, and
SEH_StackAlloc emission in spill/restoreCalleeSavedRegisters are removed.
BuildStackAdjustment now uses NF (no-flags) variants of ADD/SUB, but
only as a smaller replacement for LEA, i.e. only when EFLAGS must be preserved
across the adjustment. When EFLAGS is dead the plain SUB/ADD is kept, which is
shorter than the EVEX-encoded NF form. The NF opcodes are 64-bit
(SUB64ri32_NF/ADD64ri32_NF), so they are not used for the x32 ABI, and
they are recognized in mergeSPUpdates and the epilogue backward scan.
Update LIT tests accordingly.
Assisted-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
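The push ordering described above can be sketched as a small planner. This is a simplified model, not the X86FrameLowering code: it assumes that push2 needs a 16-byte-aligned stack pointer, that each plain push moves the stack pointer by 8 bytes, and that the caller knows whether the stack pointer is aligned before the first callee-saved register is spilled. `planCSRPushes` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical planner: returns the sequence of push kinds used to save
// NumCSRs callee-saved GPRs.
std::vector<std::string> planCSRPushes(unsigned NumCSRs, bool SPAligned16) {
  std::vector<std::string> Plan;
  unsigned Remaining = NumCSRs;
  // A real CSR push restores alignment, so no dummy "push %rax" is needed.
  if (!SPAligned16 && Remaining > 0) {
    Plan.push_back("push");
    --Remaining;
    SPAligned16 = true;
  }
  // Pairs are saved with push2; each one keeps the stack pointer aligned.
  while (Remaining >= 2) {
    Plan.push_back("push2");
    Remaining -= 2;
  }
  // A leftover register gets a trailing plain push.
  if (Remaining == 1) {
    Plan.push_back("push");
    --Remaining;
  }
  assert(Remaining == 0 && "every CSR must be covered by the plan");
  return Plan;
}
```

Under these assumptions, four registers starting from a misaligned stack pointer yield push, push2, push, which is the shape the commit title names.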
[VectorCombine] Fold zero tests of or/umax reductions (#205622)
Recognize equality and inequality tests against zero on vector.reduce.or
and vector.reduce.umax. When profitable, replace the scalar reduction and
compare with a lane-wise comparison followed by an i1 reduce.or or
reduce.and.
Run the existing zero-preserving reduction fold first to retain its more
specific canonicalization opportunities.
Proof: https://alive2.llvm.org/ce/z/pyoTwP
Fixed https://github.com/llvm/llvm-project/issues/205028
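The equivalence behind this fold can be checked with a scalar model. The sketch below is plain C++ over a four-lane array, not the VectorCombine code, and all function names are hypothetical: an or or umax reduction is zero exactly when every lane is zero, and non-zero exactly when some lane is non-zero.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>;

// Scalar form: reduce.or over all lanes, then compare with zero.
bool orReduceIsZero(const Vec4 &V) {
  std::uint32_t Acc = 0;
  for (std::uint32_t Lane : V)
    Acc |= Lane;
  return Acc == 0;
}

// Scalar form: reduce.umax over all lanes, then compare with zero.
bool umaxReduceIsZero(const Vec4 &V) {
  return *std::max_element(V.begin(), V.end()) == 0;
}

// Lane-wise form for "== 0": compare each lane, then AND the i1 results.
bool allLanesZero(const Vec4 &V) {
  bool All = true;
  for (std::uint32_t Lane : V)
    All = All && (Lane == 0);
  return All;
}

// Lane-wise form for "!= 0": compare each lane, then OR the i1 results.
bool anyLaneNonZero(const Vec4 &V) {
  bool Any = false;
  for (std::uint32_t Lane : V)
    Any = Any || (Lane != 0);
  return Any;
}

// Asserts that all forms agree for one input and returns the shared answer.
bool zeroTestAgrees(const Vec4 &V) {
  assert(orReduceIsZero(V) == umaxReduceIsZero(V));
  assert(orReduceIsZero(V) == allLanesZero(V));
  assert(orReduceIsZero(V) != anyLaneNonZero(V));
  return allLanesZero(V);
}
```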
[Instrumentor] Add runtime examples: [1/N] A flop counter (#205698)
This adds an instrumentor-tools folder to compiler-rt to showcase use
cases of the instrumentor. The initial example is a program that, via
instrumentation, counts the number of flops performed. Call and
intrinsic support will follow after #198042.
This is the second try with more CMake magic after
https://github.com/llvm/llvm-project/pull/205221 failed on some
platforms.
Partially developed by Claude (AI), tested and verified by me.
[flang][openacc] add acc.routine op for external names added in bind clauses. (#205591)
This adds acc.routine ops for the func.func ops that declare the external
functions named in device-specific bind clauses. This is needed to get the
ACCRoutineToGPUFunc pass to move the function declaration into the
correct region.
This is a follow-up from
[#203088](https://github.com/llvm/llvm-project/pull/203088) which
unblocked the original pass that was stalling bind clauses, but failed
further down the pipeline.
[CIR] Implement Direct+canFlatten in CallConvLowering
ArgKind::Direct with a multi-field coerced struct and the canFlatten flag
means the coerced struct is passed as one scalar wire argument per field.
CallConvLowering was passing it as a single aggregate, ignoring canFlatten.
A new getFlattenedCoercedType helper recognizes the Direct+canFlatten arg
shape. At the callee, insertArgCoercion replaces the single block argument
with N scalar block args, stores each into an alloca of the coerced struct
type, reloads it, and coerces back to the original argument type when the
coerced struct type differs from the original. The Ignore-drop loop and
updateArgAttrs account for the N block-argument slots a flattened arg
occupies; updateArgAttrs also shapes them on the sret return path.
At the call site, when the operand type differs from the coerced struct
type the operand is coerced through a memory slot and each field is read
from that slot with cir.get_member + cir.load (via a new emitCoercionToMemory
helper that returns the coerce-slot pointer without loading the whole
aggregate); when the types already match each field is extracted directly
[7 lines not shown]
Revert "[libc++] P3798R1: The unexpected in std::expected (#204826)" (#205597)
Reverts 45a65bb48b5925707f43d08e30df2263a5e4e268.
Currently, there is no consensus among LWG and standard library
maintainers that P3798R1 should be applied as a Defect Report. So it is
better to revert the paper application for now and then reapply it as an
addition in C++29 when C++29 mode is ready.
[llvm][GVNSink] Avoid non-deterministic iteration order over NeededPHIs
The iteration order of DenseSet is not guaranteed, which affects the
output of code generated with GVNSink enabled. This can cause code to be
emitted in a different order, affect section ordering, and in some cases
was reported to result in larger binaries due to increased padding
between sections.
This patch addresses this by using SetVector, which has a deterministic
iteration order.
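The idea behind the change can be sketched with standard containers: pair a set (for uniqueness) with a vector (for insertion order), and iterate only over the vector. This is a minimal stand-in written for illustration, not llvm::SetVector itself and not the GVNSink code; `InsertionOrderedSet` and `uniqueInOrder` are hypothetical names.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// A set for uniqueness plus a vector that remembers insertion order.
template <typename T> class InsertionOrderedSet {
  std::unordered_set<T> Seen;
  std::vector<T> Order;

public:
  // Returns true if the value was newly inserted.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    assert(Seen.size() == Order.size() && "set and vector must stay in sync");
    return true;
  }

  // Iteration goes through the vector, so the order never depends on hashing.
  const std::vector<T> &elements() const { return Order; }
};

// Deduplicates Input while keeping first-occurrence order.
std::vector<int> uniqueInOrder(const std::vector<int> &Input) {
  InsertionOrderedSet<int> Set;
  for (int V : Input)
    Set.insert(V);
  return Set.elements();
}
```

Iterating the hash set directly would visit the same elements, but in an order that may change between runs or builds.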
[flang][semantics][OpenACC] Warn for DEFAULT(NONE) scalars by default (#205683)
Change OpenACC `DEFAULT(NONE)` scalar handling to use the
pre-OpenACC-3.2 scalar behavior by default while emitting a warning.
Scalars referenced in a `default(none)` compute region without an
explicit data clause now warn by default instead of erroring. Arrays and
other non-scalars still error under `default(none)`.
Users can opt into OpenACC 3.2 strict scalar behavior with:
`-fopenacc-default-none-scalars-strict` and the default scalar warning
can be suppressed with: `-Wno-openacc-default-none-scalars-strict`
[CIR] Wire const goto labels into indirect branch (#201644)
A computed goto through a constant dispatch table -- the GNU static
dispatch-table idiom `static const void *tbl[] = {&&L1, &&L2}; goto *tbl[i];`
-- reached `errorNYI("Indirect goto without a goto block")` in
`emitIndirectGotoStmt`. #203644 emits the label-address constant (the
value-like `#cir.block_addr_info`) into the table, but it takes a label's
address in a constant context without registering the label as address-taken,
so no indirect-goto block exists for the following `goto *tbl[i]` to branch to.
(#203644 landed the constant attribute, its lowering, and the GotoSolver label
retention; this is the remaining dispatch wiring.)
`VisitAddrLabelExpr` in the constant emitter now records each label via
`takeAddressOfConstantLabel`, which instantiates the indirect-goto block and
tracks the label; `finishIndirectBranch` then adds those labels as
`cir.indirect_br` successors alongside the existing op-form labels. A label
named more than once in a table is kept as a distinct successor each time, to
match classic codegen.
[8 lines not shown]
Revert "[Clang] Optionally use NewPM to run CodeGen Pipeline" (#205943)
Reverts llvm/llvm-project#205928
The change is missing dependencies in shared-library builds. Will
investigate offline.
[SLP]Fix crash erasing reduced value extract still used by reduction
A reduced value vectorized in an operand subtree is replaced by an
extractelement that can be excluded from another reduction group's
candidates as incompatible, yet it is still consumed by the final
reduction. Keep such excluded extracts externally used so they are not
erased while vectorizing that group.
Fixes #205886
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/205942
[SSAF][PointerFlow] Upstream Reference-to-pointer binding tests
The majority of the content of rdar://179151476 duplicates the
PointerFlow analysis after
https://github.com/llvm/llvm-project/pull/203633. Therefore, we only
need to upstream the tests, both to improve test coverage and to
demonstrate the duplication.
rdar://179151476
[RFC][CodeGen] Add generic target feature checks for intrinsics
This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.
It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.
Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.
This PR uses one AMDGPU intrinsic as an example.
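A rough model of the lookup, for illustration only: the sketch below maps intrinsic names to a comma-separated list of required features and treats the list as a conjunction. That is a simplification of the feature expressions mentioned above, and every name in it (`requiredFeatures`, `isIntrinsicSupported`, the intrinsic and feature strings) is hypothetical rather than taken from the patch or from TableGen output.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <unordered_map>

using FeatureSet = std::set<std::string>;

// Hypothetical stand-in for the emitted intrinsic-to-feature table.
const std::unordered_map<std::string, std::string> &requiredFeatures() {
  static const std::unordered_map<std::string, std::string> Table = {
      {"example.intrinsic.a", "feature-x"},
      {"example.intrinsic.b", "feature-x,feature-y"},
  };
  return Table;
}

// Returns true when every feature required by the intrinsic is enabled.
// Intrinsics without an annotation have no requirement.
bool isIntrinsicSupported(const std::string &Name, const FeatureSet &Enabled) {
  auto It = requiredFeatures().find(Name);
  if (It == requiredFeatures().end())
    return true;
  std::istringstream Stream(It->second);
  std::string Feature;
  while (std::getline(Stream, Feature, ',')) {
    assert(!Feature.empty() && "malformed feature list");
    if (Enabled.count(Feature) == 0)
      return false;
  }
  return true;
}
```

Keeping the requirement next to the intrinsic definition means one generic check can run before lowering, instead of each target repeating the test in several places.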
[flang][OpenMP] Delete no longer needed CheckAllowedClause
This removes the older overload of CheckAllowedClause(clauseId).
After 0f1abfe0af that function was no longer doing anything.
[flang][cuda][acc] Fix use_device device attribute for USE-renamed variables (#205902)
Example:
```fortran
module m
complex(8), allocatable, pinned :: v(:,:)
interface callee
subroutine callee_x(x, n)
complex(8), device :: x(:,:)
integer :: n
end subroutine
end interface
end module
subroutine driver(n)
use m, only : callee, v_renamed => v
integer :: n
!$acc data copy(v_renamed)
!$acc host_data use_device(v_renamed)
[11 lines not shown]
[lldb] Reject DW_OP_deref_size with size 0 (#205911)
`Evaluate_DW_OP_deref` validated that the dereference size was `<= 8` but
not that it was non-zero. The DWARF expression evaluator parses
untrusted operands, so a `DW_OP_deref_size` with size operand `0` is
reachable (it is hit by the lldb-dwarf-expression-fuzzer).
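The missing lower bound can be sketched as follows. This is an illustrative model, not the lldb code: `isValidDerefSize` and `derefSized` are hypothetical helpers, and the little-endian read is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Accepts only sizes in the range 1..8, rejecting 0 as well as values
// larger than 8.
bool isValidDerefSize(std::uint8_t Size) { return Size >= 1 && Size <= 8; }

// Reads Size little-endian bytes into Out. Returns false for a rejected size
// so the caller can report an error instead of hitting an assertion later.
bool derefSized(const std::uint8_t *Bytes, std::uint8_t Size,
                std::uint64_t &Out) {
  if (!isValidDerefSize(Size))
    return false;
  assert(Bytes != nullptr && "caller must supply readable memory");
  Out = 0;
  for (unsigned I = 0; I < Size; ++I)
    Out |= static_cast<std::uint64_t>(Bytes[I]) << (8 * I);
  return true;
}
```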
A zero dereference size flows into `DerefSizeExtractDataHelper`, which
constructs a `DataExtractor` with `addr_size == 0` and aborts on its
assertion. The unit test that feeds `DW_OP_lit0, DW_OP_deref_size, 0x00`
shows the crash:
```
[ RUN ] DWARFExpressionMockProcessTest.DW_OP_deref_size_zero
Assertion failed: (addr_size >= 1 && addr_size <= 8), function
DataExtractor, file DataExtractor.cpp, line 134.
#8 DataExtractor::DataExtractor(...)
#11 DWARFExpression::Evaluate(...)
[6 lines not shown]