[BOLT] Replace partial instructions with traps in patched entries (#205211)
Overwriting a function entry with a jump is likely to not perfectly
align with the instruction stream. If the end of the patch does not
fall onto an instruction boundary, the bytes following the jump are
orphaned and will have nonsensical interpretations. This can leave
other tools confused, especially since these orphaned bytes can
decode to instructions that do not nicely rejoin the still-intact
part of the instruction stream. Overwrite these bytes with traps
in the PatchEntry pass.
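A minimal sketch of the idea, not the code from the PatchEntry pass. The
helper name `patchEntryWithTraps`, its parameters, and the choice of byte
values are assumptions made for illustration; the caller is assumed to
already know the lengths of the original instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: overwrite the start of `code` with `jumpBytes` and
// fill any orphaned tail bytes of a partially overwritten instruction with
// `trap`. `insnLengths` holds the lengths of the original instructions in
// order. Returns the number of bytes replaced by traps.
std::size_t patchEntryWithTraps(std::vector<std::uint8_t> &code,
                                const std::vector<std::size_t> &insnLengths,
                                const std::vector<std::uint8_t> &jumpBytes,
                                std::uint8_t trap) {
  assert(jumpBytes.size() <= code.size());

  // Find the first instruction boundary at or past the end of the patch.
  std::size_t boundary = 0;
  for (std::size_t len : insnLengths) {
    if (boundary >= jumpBytes.size())
      break;
    boundary += len;
  }
  assert(boundary >= jumpBytes.size() && boundary <= code.size());

  // Write the jump itself.
  for (std::size_t i = 0; i < jumpBytes.size(); ++i)
    code[i] = jumpBytes[i];

  // Replace the orphaned bytes between the patch end and the boundary.
  for (std::size_t i = jumpBytes.size(); i < boundary; ++i)
    code[i] = trap;

  return boundary - jumpBytes.size();
}
```

For example, a 5-byte jump written over a 1-, 3-, and 4-byte instruction
sequence ends in the middle of the third instruction, so bytes 5 through 7
become traps and the instruction starting at byte 8 is left untouched.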
Fixes #198455.
[mlir][bazel]: Remove GPU dialect deps from MemRefTransforms (#205624)
This change removes dead GPU dependencies (`GPUDialect` and
`NVGPUDialect`) from the `MemRefTransforms` target. These dependencies
are not needed by the transforms themselves and greatly increase the
build time (e.g., NVVMDialect.cpp alone requires two minutes to build).
This aligns the bazel build with the CMake configuration.
[AMDGPU] Lower uniform usubsat to SOP (#203155)
Prefer scalar (SALU) lowering for uniform `usubsat`, since usubsat(a, b)
= max(a, b) - b.
* i32: add a GCNPat matching uniform `usubsat` to S_MAX_U32 + S_SUB_I32
* i16: route uniform `usubsat` through `promoteUniformOpToI32` instead
of a TableGen pattern that hard-codes the 0xffff masks. This exposes the
zero-extends as real DAG nodes so KnownBits can fold the masks when the
high bits are already known zero; the promoted i32 usubsat then reuses
the scalar pattern. Promote-and-truncate is safe for usubsat because the
result always fits in the narrow type (unlike uaddsat).
Register USUBSAT with `setTargetDAGCombine` and the promotion dispatch,
return ZERO_EXTEND in `getExtOpcodeForPromotedOp`, and add it to
`isNarrowingProfitable` so divergent i16/i32 keep their native VALU
clamp form.
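The identity and the promote-and-truncate argument above can be checked
with a small standalone sketch. This is plain C++ for illustration, not
the DAG lowering; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Reference semantics: unsigned subtraction that clamps at zero.
std::uint32_t usubsatRef(std::uint32_t a, std::uint32_t b) {
  return a > b ? a - b : 0u;
}

// max-then-subtract form, mirroring S_MAX_U32 followed by S_SUB_I32.
// max(a, b) >= b, so the subtraction never wraps.
std::uint32_t usubsatMaxSub(std::uint32_t a, std::uint32_t b) {
  return std::max(a, b) - b;
}

// i16 via promotion: zero-extend both operands, run the i32 form, then
// truncate. The result never exceeds `a`, so truncation loses nothing.
std::uint16_t usubsat16ViaI32(std::uint16_t a, std::uint16_t b) {
  std::uint32_t wide = usubsatMaxSub(a, b); // operands are zero-extended
  return static_cast<std::uint16_t>(wide);
}
```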
Co-authored-by: Jeffrey Byrnes
[X86] combineMulToPMADDWD - match 256/512-bit SIGN_EXTEND nodes (#205606)
Now that the X86ISD::VPMADDWD handling is improving, we can remove some
of the limits that we had to prevent regressions.
[SSAF] Properly handle contributors with multiple declarations (#204482)
A contributor entity can have multiple declarations all contributing
interesting facts. For example, a function declaration (not definition)
may have default arguments, which may provide pointer flow or unsafe
buffer usage facts. This commit groups declarations by their canonical
decls. The entity summary of a contributor will be collected from all
its decls.
In addition, this commit includes the following minor changes:
- Factor the common procedure of summary extraction and insertion into a
template function in SSAFAnalysesCommon.h.
- Convert the no-duplicate contributor assertion into a debug warning.
We need the release build to not crash.
rdar://179150798
[CIR] Skip trivially-recursive available_externally function bodies (#198363)
CIR was emitting available_externally bodies for glibc-style inline
wrappers whose sole call is back to the same asm-named symbol (via
__builtin_*). LLVM then treats the function as non-terminating and
can fold away surrounding null checks — the same failure mode as
classic CodeGen PR9614 (basename-style if (cwd) paths).
Port isTriviallyRecursive / shouldEmitFunction from CodeGenModule,
including the isInlineBuiltinDeclaration exemption, and skip emitting
those definitions. isTriviallyRecursive (and its
FunctionIsDirectlyRecursive visitor) lives on MangleContext, so both
classic CodeGen and CIRGen call getMangleContext().isTriviallyRecursive(FD).
Revert "[AArch64] Run cleanup one final time after peephole (#199711)" (#205633)
This reverts commit 448c3d54df7bcd5e5be2b5d051832ad00b4cc89c as it
causes compile-time regressions for little gain, and it sounds like the
dead instructions can be removed in a better way.
[Instrumentor] Add runtime examples: [3/N] Pointer tracking
The example shows how globals and stack allocations can be tracked. For
each we record if it was read/written and how much time elapsed between
creation and first use, and between last use and deallocation. This is
reported at the end.
[Instrumentor] Add runtime examples: [2/N] A FP precision analysis
Second example:
Check all floating point operations and track if they could be done at
lower precision.
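One possible criterion for "could be done at lower precision" is sketched
below for a single addition. This is an assumption for illustration and
not necessarily the check the example runtime performs; the function name
`addFitsInFloat` is hypothetical.

```cpp
#include <cassert>

// Returns true if performing a + b in float yields exactly the same value
// as performing it in double. NaN results compare unequal and therefore
// report false.
bool addFitsInFloat(double a, double b) {
  double wide = a + b;
  float narrow = static_cast<float>(a) + static_cast<float>(b);
  return static_cast<double>(narrow) == wide;
}
```

A runtime could evaluate such a predicate per operation and report the
operations for which it always held.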
Partially developed by Claude (AI), tested and verified by me.
[mlir][emitc]: use converted result types when func.call has one result (#205191)
The lowering for `func.call` to emitc properly uses converted result
types when there are multiple return values from the called func, but
not when there is a single one.
[Instrumentor] Move common instruction IO functions into a class (#205460)
This commit moves several instruction-related IO functions into a class
instead of having them defined in the instrumentor namespace. We add the
BaseInstructionIO non-templated class because InstructionIO is a
templated class. Adding the common functions into InstructionIO would
force us to define them in the header.
AMDGPU/GlobalISel: Fix get.rounding s_getreg lowering (#205601)
Use llvm.amdgcn.s.getreg instead of emitting S_GETREG_B32 directly so
instruction selection applies the required SReg_32 operand constraint.
This was done for setreg but missed for getreg.
Fixes https://github.com/llvm/llvm-project/pull/205265 when expensive
checks are enabled.
[Instrumentor] Add runtime examples: [1/N] A flop counter
This adds an instrumentor-tools folder into compiler-rt to showcase
use cases of the instrumentor. The initial example is a program that,
via instrumentation, counts the number of flops performed. Call and
intrinsic support will follow after #198042.
Partially developed by Claude (AI), tested and verified by me.
[lldb][NFC] Change type of Breakpoint's name list (#205429)
This is currently a `std::unordered_set<std::string>`. The downside of
this is that you need to have a `std::string` to perform a lookup of any
kind. This may require an allocation whenever we want to query the name
list. Even using `std::string_view` is not sufficient to perform a
lookup.
I propose that this instead be an `llvm::StringSet`, which uses StringRefs
as its primary currency for insertions, lookups, and more.
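The lookup cost of the current container can be illustrated with standard
C++ alone. This is a sketch with a hypothetical helper `hasName`, not
lldb code.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// With the default hash and equality, find() only accepts a std::string
// key, so a string_view must first be copied into a temporary std::string.
// That copy may allocate for names that exceed the small-string buffer.
bool hasName(const std::unordered_set<std::string> &names,
             std::string_view name) {
  return names.find(std::string(name)) != names.end();
}
```

A set keyed on non-owning string references avoids that temporary, which
is the motivation given above for switching containers.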
---------
Co-authored-by: Jonas Devlieghere <jonas at devlieghere.com>
[Instrumentor] Add subtype IDs to complement type IDs for vectors/arrays
If the type of an argument passed to the instrumentation is a vector or
array, we still want to filter on the underlying type, and the
instrumentation might also need to know it. Thus, we can now pass a subtype
ID, which is -1 unless the type is a vector or array, in which case it is
the element type ID. Structs need to be handled differently.
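The rule can be sketched as follows. The `Type` model, the `Kind`
enumeration, and the numeric IDs are hypothetical stand-ins, not the
Instrumentor's data structures; returning -1 for structs is likewise an
assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical, simplified type model.
enum class Kind { Integer, Float, Vector, Array, Struct };

struct Type {
  Kind kind;
  int typeId;          // ID of this type
  const Type *element; // element type for vectors/arrays, else nullptr
};

// Subtype ID: the element type ID for vectors and arrays, -1 otherwise.
int subtypeId(const Type &t) {
  bool isSequence = t.kind == Kind::Vector || t.kind == Kind::Array;
  if (isSequence && t.element)
    return t.element->typeId;
  return -1;
}
```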
[LoopInterchange] Remove some early exits in transform phase (NFCI) (#205563)
This patch removes some unnecessary early exits from the transformation
phase in LoopInterchange. Some of them are simply removed because they
are trivially unsatisfiable. Others are replaced with assertions. These
conditions should be checked in the legality check phase, so it should
be safe to add those asserts.
[OpenMP] Remove AST dump tests for non-variant clauses (#204493)
As was suggested during discussion of #200077, and supported by Johannes
in our discussion during his office hours today, this PR removes OpenMP
AST dump tests that do not test the `variant` clause. The full
motivation can be found in the description of the aforementioned PR, but
the short version is that they are a maintenance burden that holds off
improvements to `TextNodeDumper` for other parts of Clang, because they
match too many unrelated details.
[flang][OpenMP] Move clause validity checks into OpenMP-specific code
The checks for syntactic properties of clauses (e.g. uniqueness, being
required, etc.) were originally handled by infrastructure common to
OpenMP and OpenACC. That infrastructure, however, is not fully equipped
to handle OpenMP needs: it cannot express version-based properties or
clause set properties, to name two prominent examples.
The first step towards fulfilling the OpenMP requirements is to
transfer the handling of clause validity checks into OpenMP-specific
code, which can then be modified without interfering with OpenACC.
In addition to that, this PR also changes the way that clauses on end-
directives are handled: first, a clause appearing on an end-directive
is checked to be allowed to appear on an end-directive, then all clauses
from the begin- and the end-directives are tested together. This unifies
checks for uniqueness of clauses that can appear in both places.
[NVPTX] Rewrite kernel signatures in param AS (#204192)
Rewrite the kernel signatures moving byval parameters directly into
entry parameter address space (similar to how ExpandVariadics handles
va_arg functions). This avoids the need for the somewhat hacky
nvvm_internal_addrspace_wrap intrinsic and enables better support for
parameter short pointers.
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.