[MLIR] Add `IntegerDivisibilityAnalysis` and `InferIntDivisibilityOpInterface` (#197728)
This patch is a port from
https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Dialect/Util/Analysis/IntegerDivisibilityAnalysis.cpp
to upstream.
It introduces a dataflow analysis that tracks integer divisibility
(divisor + remainder lattice) for SSA values, plus an op interface
`InferIntDivisibilityOpInterface` for ops to participate.
It adds:
* `IntegerDivisibilityAnalysis` produces a `Divisibility` lattice
`{divisor, remainder}`
* `InferIntDivisibilityOpInterface` interface
* External-model implementations for `arith` and `affine` ops
* `test-int-divisibility` test pass + lit tests
Example:
Here is the usual approach to load element `i` from an `i4` buffer emulated
[11 lines not shown]
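For readers unfamiliar with divisor/remainder lattices, here is a minimal sketch of how two such facts could be merged where control flow joins. `DivFact` and `join` are hypothetical names, and the merge rule is the textbook congruence-domain join, assumed for illustration rather than taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical stand-in for a {divisor, remainder} lattice element:
// the fact "value == remainder (mod divisor)", with divisor >= 1.
struct DivFact {
  uint64_t divisor;
  uint64_t remainder; // kept in [0, divisor)
};

// Merge two facts about the same SSA value arriving from different paths.
// The result must hold for a value satisfying either input, so the new
// divisor has to divide both divisors and the difference of the remainders.
inline DivFact join(DivFact a, DivFact b) {
  assert(a.divisor > 0 && b.divisor > 0 && "divisor must be positive");
  uint64_t diff = a.remainder > b.remainder ? a.remainder - b.remainder
                                            : b.remainder - a.remainder;
  uint64_t g = std::gcd(std::gcd(a.divisor, b.divisor), diff);
  return {g, a.remainder % g};
}
```

Under this rule, merging "multiple of 8" with "multiple of 12" gives "multiple of 4", while merging `{8, 1}` with `{8, 4}` degrades to the uninformative `{1, 0}`.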
[flang][acc] Accept component of global variable in `acc declare` (#197819)
This MR partially extends the current implementation to accept cases of
`acc declare` on a `parent%comp` whenever the `parent` has been `acc
declare`d with the same clause. This is done by generating the acc
global constructor only for mapping the parent, since the child is
expected to be part of the parent.
The remaining limitations stay as a TODO unless it can be proven that the
parent is mapped. A generic implementation would need either
compiler-generated ordering of the global constructors used for mapping
or runtime-managed ordering.
[AArch64] Do not pass debug insn to liveness analysis (#198021)
Fix another stepBackward location.
Debug instructions must not affect liveness analysis. stepBackward has
an assertion failure on debug instructions after
https://github.com/llvm/llvm-project/pull/193104.
Signed-off-by: John Lu <John.Lu at amd.com>
[RISCV][MCA] Use the new infrastructure for SiFive P500 and P800's tests. NFC (#198016)
Some tests -- mostly vector crypto -- are kept for SiFive P800.
NFC.
[flang][NFC] Finishing touches on legacy lowering conversion (#197973)
At the beginning of legacy lowering conversion, some tests were
initially converted to emit FIR. After some discussion, it was decided
to revisit those tests and convert them to emit HLFIR. This change
completes that step and should be the final change in removing vestiges
of legacy lowering.
Assisted-by: AI
[lldb] Avoid unnecessary strlen of mangled names in ConstString (NFC) (#197995)
C++ mangled names are known to be quite long at times. This change makes
use of available length data, instead of using the `StringRef(const char
*)` constructor which calls `strlen`.
The main detail is to replace `selectPool(llvm::StringRef(raw))` with a
call to `selectPool` using a readily available StringRef.
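The cost difference can be illustrated with a minimal standard-library sketch. It uses `std::string_view` in place of `llvm::StringRef`, and `selectPoolIndex`, `poolFromRaw`, and `poolFromKnownLength` are hypothetical helpers, not LLDB's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for pool selection: hash the bytes (FNV-1a) and
// pick one of 256 pools. This is not LLDB's real selectPool.
inline std::size_t selectPoolIndex(std::string_view s) {
  uint64_t h = 1469598103934665603ull;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return static_cast<std::size_t>(h % 256);
}

// Only a raw pointer is available: constructing the view has to scan the
// string for its terminator, which is the hidden strlen-style walk.
inline std::size_t poolFromRaw(const char *raw) {
  return selectPoolIndex(std::string_view(raw));
}

// The length is already known (for example from an existing string
// reference), so the view is built without rescanning the bytes.
inline std::size_t poolFromKnownLength(const char *raw, std::size_t len) {
  return selectPoolIndex(std::string_view(raw, len));
}
```

Both paths select the same pool; the second simply skips one pass over a potentially very long mangled name.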
[libc] Reduce number of iterations in threading tests. (#198030)
Previously the threading tests were running noticeably slowly and
causing flaky timeouts on some buildbots (e.g.
https://lab.llvm.org/buildbot/#/builders/71/builds/48420).
[CIR] Cast global var address to declared type at dtor call site
A C++ global with a constexpr default constructor that fixes the active member of a union — `std::basic_string`'s SSO `__short` variant is a common example — has a `cir.global` whose stored record type is the narrowed shape of that active variant. Classic CodeGen does the same (`@g = global { { { [16 x i8] } } } zeroinitializer`) and accepts the resulting `__cxa_atexit(@D1, @g, ...)` because LLVM IR uses opaque pointers. CIR has typed pointers, so the `cir.call` registering the destructor for `__cxa_atexit` carries an operand type that doesn't match the dtor's `this` parameter. This trips 16 libcxx tests and 71 cases total across libcxx, MultiSource, SingleSource, and SPEC in our build.
`verifyPointerTypeArgs(oldF, newF, userMap)` in `CIRGenModule::applyReplacements` (`clang/lib/CIR/CodeGen/CIRGenModule.cpp:1700`) catches this when ctor-dtor aliases are enabled and D1 is RAUW'd by D2. Without aliases, the `cir.call` op verifier rejects the same operand-type mismatch directly.
The fix mirrors the cast pattern `emitGlobalVarDeclLValue` (`clang/lib/CIR/CodeGen/CIRGenExpr.cpp:441-445`) already uses for every AST-level reference to a global: bitcast the result of `getAddrOfGlobalVar` to `convertTypeForMem(type)` before any typed-pointer op consumes it. `getAddrOfGlobalVar` itself stays raw so callers that walk to the underlying `GetGlobalOp` via `getDefiningOp()` keep working.
`global-dtor-union-narrowed.cpp` pins the CIR bitcast, the lowered LLVM helper-wrapped `__cxa_atexit`, and the equivalent OGCG direct `__cxa_atexit`.
[clang] remove lots of "innocuous" addrspacecasts (#197745)
Many addrspacecasts were originally added early on, often where they
weren't needed or could be added later. This makes them fairly
straightforward to remove (other than changing some tests). By swapping
all calls to this function (except the intended semantic ones for
parameters and variables) with the uncasted version, AMDGPU will
eventually not need to attempt to apply a fix up afterwards by having
different addrspace maps. This PR does not yet fix all calls, but the
main ones that might have been missed are in matrix/vector extensions
(which seem to weirdly override the memory type for temporary values to
be different from the type of the object in all other uses).
Fix dynamic map iterator target data lowering
Hoist runtime-sized offload map array allocation for regional target data with
iterator modifiers so the dynamic count and arrays dominate both begin and end
runtime calls.
[libc] Fix shadowing in printf (#197985)
The 320-bit float converter defined StorageType and DECIMAL_POINT
outside of its functions. This caused conflicts with other definitions of
the same names after #197516.
[DWARFLinker] Preserve source order of member subprograms (#196443)
Children of class/struct/union/interface DIEs in the parallel
DWARFLinker's artificial type unit are sorted lexicographically by the
TypePool synthetic-name key. Data members already get a positional slot
through the synthetic name, but subprograms don't: they collapse to
alphabetical-by-linkage-name order. That breaks LLDB's
SBType::GetMemberFunctionAtIndex(N), which contractually returns members
in DWARF order.
Add a uint32_t SortKey on TypeEntryBody, atomically min-merged across
CUs with the input DIE's ordinal in its parent's child list, and consult
it before the synthetic-name key in TypePool's comparator. The ordinal
is computed by cloneDIE's existing child walk and threaded into
createTypeDIEandCloneAttributes. Scoped to children of
class/struct/union/interface so top-level types in the artificial type
unit keep their existing sort order.
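The min-merge and the two-level comparison described above could look roughly like the following sketch. `Entry`, `mergeMin`, and `lessThan` are hypothetical stand-ins written against the standard library only; they are not the DWARFLinker's actual types or functions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-in for a type entry with a positional sort key.
// The key starts at the maximum value, meaning "no ordinal seen yet".
struct Entry {
  std::atomic<uint32_t> SortKey{std::numeric_limits<uint32_t>::max()};
  std::string Name; // plays the role of the synthetic-name key
};

// Atomically lower Key to Ordinal if Ordinal is smaller. Several threads
// (one per compile unit) may call this concurrently on the same entry.
inline void mergeMin(std::atomic<uint32_t> &Key, uint32_t Ordinal) {
  uint32_t Cur = Key.load(std::memory_order_relaxed);
  while (Ordinal < Cur &&
         !Key.compare_exchange_weak(Cur, Ordinal, std::memory_order_relaxed)) {
    // On failure Cur is refreshed with the current value; retry if needed.
  }
}

// Order by positional key first, then fall back to the name.
inline bool lessThan(const Entry &A, const Entry &B) {
  uint32_t KA = A.SortKey.load(std::memory_order_relaxed);
  uint32_t KB = B.SortKey.load(std::memory_order_relaxed);
  if (KA != KB)
    return KA < KB;
  return A.Name < B.Name;
}
```

Because the minimum is commutative and associative, the final key does not depend on the order in which compile units contribute their ordinals.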
[clang][DependencyScanning] Preserve Necessary Preprocessor Callbacks during By-name Lookup (#197731)
The by-name lookup logic uses new dependency collector callbacks per
lookup. The algorithm used to wipe out all callbacks for each query.
This turned out to be perilous. We have two raw pointers in the
preprocessor that point to the callbacks, and removing all callbacks per
query can lead to use-after-free situations through these dangling
pointers. Resetting the dangling pointers to null does not really work
either, since there may be dependencies between the callbacks and other
data structures. An example of this is the `PreprocessingRecord *Record`
callback and the `GlobalPreprocessedEntityMap` in ASTReader. Hence, to
fix the use-after-free issue, we preserve the callbacks that the
preprocessor may hold a raw pointer to.
This is not intended to indicate how we want to handle this in the long
run. We should avoid removing PP callbacks and instead reset their state
across by-name lookups.
rdar://175362366
[AMDGPU] Drop target requirements in test (#198015)
These were only necessary when the test was in the wrong folder. Now
that the test is in the right folder, it will only be marked as
supported when AMDGPU is enabled as a target, so the additional
requirement in the test is redundant.
[SLP] Preserve profitable trees when subtree trimming would reduce to buildvector-only
In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns
Invalid when trimming node Idx==1 under an InsertElement root would
leave only a buildvector, to avoid infinite vectorization attempts.
This is too aggressive when the original untrimmed tree is already
profitable (Cost < -SLPCostThreshold). In that case, undo any partial
trims and return the original cost instead of rejecting the tree.
Reviewers: RKSimon, hiraditya, bababuck
Pull Request: https://github.com/llvm/llvm-project/pull/197763
[AMDGPU] Use shorter form for i16 operands
For 16-bit operands an inline constant is zero-extended,
which in particular allows FP constants to be used. These
will have 16 bits of zeroes in the high half and the FP16
value in the low 16 bits.
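As a small worked illustration of that layout, the sketch below zero-extends IEEE half-precision bit patterns to 32 bits. `zeroExtend16` and the constant names are hypothetical and only show the bit arithmetic, not the backend's encoding logic.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend a 16-bit operand to 32 bits: the high half becomes zero and
// the low half keeps the original bits.
constexpr uint32_t zeroExtend16(uint16_t Bits) {
  return static_cast<uint32_t>(Bits);
}

// IEEE 754 half-precision bit patterns.
constexpr uint16_t HalfOne = 0x3C00; // 1.0
constexpr uint16_t HalfTwo = 0x4000; // 2.0

static_assert(zeroExtend16(HalfOne) == 0x00003C00u,
              "FP16 1.0 sits in the low 16 bits");
static_assert((zeroExtend16(HalfTwo) >> 16) == 0u,
              "the high 16 bits are zero");
```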
[flang-rt] Rework findloc.cpp to dispatch target at runtime (#197756)
Summary:
The previous code had a combinatorial explosion of functions by
templating on both the source and target types. This created around 170
instantiations. Instead we just template on the source type and then use
a simple runtime check. This should not affect performance in a
significant way, it introduces maybe a few branches in what is already a
non-trivial operation that I do not think justifies a two-minute compile
time.
The result is that compiling this file goes from 120 seconds to 12
seconds on my machine and the resulting file shrinks from 7.2 MiB to
757 KiB. Functionally, this makes us instantiate 1/10th of the functions.
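The shape of the change can be sketched as follows: the search is templated on the source element type only, and the target type is resolved by a runtime switch. `TargetKind`, `equalsTarget`, and `findLoc` are hypothetical names used for illustration and do not reflect the actual flang-rt code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical runtime tag for the type of the value being searched for.
enum class TargetKind { Int32, Int64, Double };

// Compare one source element against a type-erased target. The target type
// is picked by a runtime branch instead of a second template parameter.
template <typename Src>
bool equalsTarget(Src value, const void *target, TargetKind kind) {
  switch (kind) {
  case TargetKind::Int32:
    return value == *static_cast<const int32_t *>(target);
  case TargetKind::Int64:
    return value == *static_cast<const int64_t *>(target);
  case TargetKind::Double:
    return value == *static_cast<const double *>(target);
  }
  return false;
}

// Return the 1-based position of the first match, or 0 if there is none.
// Only one instantiation per source type is needed.
template <typename Src>
std::ptrdiff_t findLoc(const Src *data, std::size_t n, const void *target,
                       TargetKind kind) {
  for (std::size_t i = 0; i < n; ++i)
    if (equalsTarget(data[i], target, kind))
      return static_cast<std::ptrdiff_t>(i) + 1;
  return 0;
}
```

With S source types and T target types, this needs S instantiations instead of S × T, at the cost of one extra branch per comparison.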
[dsymutil] Collect .cas-config files in dSYM bundles (#197818)
When caching is enabled the Swift compiler might substitute CAS
identifiers for on-disk paths. In order to resolve them the build system
puts a .cas-config file in the build directory. Dsymutil needs to
collect the contents of these files so tools consuming the dSYM (which
do not have access to the original build directory) can resolve these
CAS identifiers, too.
Assisted-by: claude
rdar://169986664