clang-format/test: Anchor the empty .clang-format-ignore to test_exec_root (#203444)
The test suite's lit.local.cfg creates an empty .clang-format-ignore at
config discovery time to protect the multiple-inputs[-inplace].cpp tests
that work on files in temporary locations.
This file should be written to where the tests execute instead of the
CWD during config discovery. The CWD might not even be an ancestor of
where the tests execute, and it might be the repository root which does
have a .clang-format-ignore that is incorrectly clobbered without this
change.
An alternative would be to just fix the tests that need to be protected,
but having a blanket guard like this does seem like a reasonable thing
to do.
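
A minimal sketch of the idea, not the actual lit.local.cfg: the helper
`write_empty_ignore` and the `repo`/`build/test` directories are
hypothetical stand-ins for lit's test_exec_root and the repository root.
It illustrates that writing to an explicit directory leaves an existing
.clang-format-ignore elsewhere untouched.

```python
import os
import tempfile


def write_empty_ignore(exec_root):
    """Create an empty .clang-format-ignore inside exec_root (hypothetical helper)."""
    os.makedirs(exec_root, exist_ok=True)
    path = os.path.join(exec_root, ".clang-format-ignore")
    # Opening with "w" truncates, so the file ends up empty either way.
    with open(path, "w"):
        pass
    return path


def demo():
    """Return (size of the new ignore file, content of the repo-root one)."""
    with tempfile.TemporaryDirectory() as tmp:
        repo_root = os.path.join(tmp, "repo")          # stands in for the CWD
        exec_root = os.path.join(tmp, "build", "test")  # stands in for test_exec_root
        os.makedirs(repo_root)
        repo_ignore = os.path.join(repo_root, ".clang-format-ignore")
        with open(repo_ignore, "w") as f:
            f.write("third_party/*\n")
        created = write_empty_ignore(exec_root)
        with open(repo_ignore) as f:
            repo_content = f.read()
        return os.path.getsize(created), repo_content
```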
Fixes: 915de1a5889c ("Generate empty .clang-format-ignore before running
tests (#136154)")
[X86] combineConcatVectorOps - concat(roti(x,i),roti(y,i)) -> roti(concat(x,y),i) on non-vlx targets (#203528)
128/256-bit rotates are widened in tablegen, so we don't need to limit
this fold to VLX targets: any AVX512 target can perform these.
We already have test coverage to ensure 128-bit XOP rotates don't get
concatenated to 256-bit.
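
Why the fold is sound can be sketched with a small lane-wise model. This
is an illustration only, not the DAG combine itself: the function names
and the 32-bit lane width are assumptions. Rotating every lane by the
same immediate commutes with concatenation because each lane is rotated
independently.

```python
def rotl(value, amount, bits=32):
    """Rotate a single lane left by amount (taken modulo the lane width)."""
    mask = (1 << bits) - 1
    amount %= bits
    return ((value << amount) | (value >> (bits - amount))) & mask


def rotate_lanes(lanes, amount, bits=32):
    """Apply the same immediate rotate to every lane of a vector."""
    return [rotl(v, amount, bits) for v in lanes]


def fold_is_equivalent(x, y, amount, bits=32):
    """Check concat(roti(x,i), roti(y,i)) == roti(concat(x,y), i)."""
    narrow = rotate_lanes(x, amount, bits) + rotate_lanes(y, amount, bits)
    wide = rotate_lanes(x + y, amount, bits)
    return narrow == wide
```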
[mlir] Check for argument uses in test-func-erase-arg pass (#203367)
The -test-func-erase-arg pass crashed when erasing arguments that still
had uses. Diagnose every such argument and fail the pass without
erasing.
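
The new behaviour can be modelled roughly as follows. This is a sketch
under stated assumptions, not the pass's C++ code: `erase_args`, its
parameters and the diagnostic wording are hypothetical.

```python
def erase_args(arg_names, used_names, to_erase):
    """Return (remaining_args, diagnostics).

    Every argument in to_erase that still has uses gets its own
    diagnostic; if there is at least one, nothing is erased.
    """
    diagnostics = [
        f"argument '{name}' still has uses"  # hypothetical message text
        for name in to_erase
        if name in used_names
    ]
    if diagnostics:
        # Fail without modifying the argument list.
        return list(arg_names), diagnostics
    return [name for name in arg_names if name not in to_erase], []
```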
Fixes https://github.com/llvm/llvm-project/issues/203218
Assisted-by: Claude (Claude Code)
[Flang][OpenMP] Fix crash when common block name is used in LINEAR clause (#203250)
Using a common block name in a LINEAR clause (e.g. linear(/c/)) caused a
symbol-must-have-a-type crash during lowering. The semantic checker
was not emitting an error because GetSymbolsInObjectList expands /c/
to its member variables before the check runs, so the
symbol->has<CommonBlockDetails>() guard was never reached.
Fix by checking for common block names directly on the OmpObjectList
before the expansion, where the Name variant of OmpObject still holds
the common block symbol.
Fixes #202329
[AMDGPU] Fix copy-paste in hasNon16BitAccesses OpIs16Bit check (#203499)
OpIs16Bit tested the width of TempOtherOp instead of TempOp, mismatching
the symmetric OtherOpIs16Bit clause.
No miscompiles or other direct issues due to this have been observed so far.
[lldb][Windows] Make RM_RF a no-op on an empty argument and swallow errors (#203040)
This patch makes the Windows `RM_RF` a no-op on an empty argument and
makes it swallow errors, matching Unix `rm -rf`. This fixes issues in
swiftlang on fresh builds.
This is needed for https://github.com/swiftlang/llvm-project/pull/13180
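
The intended semantics can be modelled as below. This is a Python sketch
of the described behaviour, not the actual `RM_RF` definition; the
function name `rm_rf` is hypothetical.

```python
import os
import shutil


def rm_rf(path):
    """Remove path recursively; no-op on an empty argument, never raises."""
    if not path:
        # Empty argument: nothing to do, like `rm -rf` with no operands.
        return
    try:
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path, ignore_errors=True)
        else:
            os.remove(path)
    except OSError:
        # Missing files and other failures are deliberately swallowed.
        pass
```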
[PGO][HIP] Fix HIP device profile collection and sections emission (#202095)
Several related HIP device-PGO fixes:
Windows device collection. HIP rejects a hipMemcpy that reads past the
bounds of a symbol registered with __hipRegisterVar, but device
data/counters/names live in merged linker sections. Register a separate
shadow for each device data, counters, and names symbol and copy each one
by its exact hipGetSymbolSize size; this also lets static TUs with several
kernels keep all their profile data. Open the device profile file in
binary mode and pass the device names to the correct lprofWriteDataImpl
arguments so llvm-profdata can read the raw profile. Open the versioned
amdhip64_7.dll first, falling back to
[41 lines not shown]
[SystemZ] Rename GetSingleElementType to getSingleElementType (#203078)
# Refactor: Rename GetSingleElementType to getSingleElementType in SystemZ ABI
## Summary
This PR refactors the SystemZ ABI code to follow LLVM coding standards
by renaming `GetSingleElementType` to `getSingleElementType` (camelCase
convention).
## Motivation
Rename to avoid having `GetSingleElementType` in one class and
`getSingleElementType` in another.
[X86] Add tests showing failure to concat 256-bit rotate nodes on non-vlx targets (#203517)
These are widened in tablegen, so we don't need to limit these to VLX targets.
[mlir][vector] extend `createReadOrMaskedRead`/`createWriteOrMaskedWrite` with permutation map support (#202766)
Follow-up to #201180.
Extends the existing `createReadOrMaskedRead` and
`createWriteOrMaskedWrite` utilities in `VectorUtils` with two optional
trailing parameters:
- `ArrayRef<Value> indices`
- `AffineMap permutationMap`
The affine super-vectorizer is updated to call these functions instead
of constructing `TransferReadOp`/`TransferWriteOp` directly.
@banach-space, please correct me if this wasn't what you meant in the
previous PR.
---------
Signed-off-by: Federico Bruzzone <federico.bruzzone.i at gmail.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski at gmail.com>
[MergeICmps] Perform dereferenceability check with context (#202884)
To support deref-at-point semantics, we need to check dereferenceability
with a context instruction. Currently, MergeICmps does the check for
each individual load instruction. In this PR, I'm replacing this with a
check for all the loads that are part of a chain after they have been
collected, so we do the context-sensitive check only once.
The choice of context instruction is a bit tricky: Normally, this would
just be the first block in the chain (the "entry block"), but it's also
possible for the block to "do extra work", in which case it will get
split. If this happens, we should be checking at the splitting point, as
the extra work might be freeing the pointer.
Another question to consider here is whether we need to be concerned
about frees at all: After all, the original code will be accessing at
least one byte of the two objects, so doesn't that imply that it wasn't
freed already? This is indeed the case, as long as allocations cannot
shrink. This is something we currently don't allow, but I think it's
something we want to allow, so I'm going with the conservative treatment
here.
[DirectX] Drop DICommonBlock metadata (#201948)
DICommonBlock cannot be represented in LLVM 3.7, but it is a scope
within a parent scope, so we can refer to the parent scope instead.
[X86] - Prevent the wrong fold of x86_avx512_mask_cmp_ss/sd to fcmp (#202321)
The issue is based on the SemiAnalysisAI report by @jlebar:
[058-mask-cmp-ss-imm-immediate-not-validated](https://github.com/SemiAnalysisAI/FuzzX/blob/master/x86/bugs/058-mask-cmp-ss-imm-immediate-not-validated/NOTES.md)
It is not a real bug, just a warning for a future implementation of the
mask_cmp → fcmp fold.
There is nothing to fix in the source code as of now. This patch adds a
few comments and test cases for the future implementation of the folds.
@topperc @phoebewang
[lit] Add support for %{s:stem} substitution. (#202885)
It provides the source file name with the (last) extension removed.
This is to align with what is available for %t and actually needed
downstream.
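
As an illustration of the expected value (not lit's implementation), the
sketch below assumes the substitution yields the last path component with
only its final extension stripped; `source_stem` is a hypothetical name.

```python
from pathlib import PurePosixPath


def source_stem(source_path):
    """File name of source_path with the last extension removed."""
    return PurePosixPath(source_path).stem
```

Under that assumption, `clang/test/Driver/basic.cpp` maps to `basic`, and
a name with two extensions such as `archive.tar.gz` keeps the first one.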
[X86] combineConcatVectorOps - concat(permi(x,imm0),permi(y,imm1)) -> vpermv3(widen(x),m,widen(y)) (#203508)
Add handling for X86ISD::VPERMI nodes with different immediates by
folding to an X86ISD::VPERMV3 instead, which replaces the
INSERT_SUBVECTOR+2xPERMI nodes with a mask load.
We don't need to concat the source operands - we have other folds that
will do this if beneficial - we just rely on (free) implicit widening.
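
A small element-level model of the fold, as a sketch only: the helper
names are hypothetical, and it assumes 4-element 256-bit sources, 2-bit
selectors per result element in the immediate, and a two-source permute
whose indices 0..7 pick from the first operand and 8..15 from the second.

```python
def permi(src, imm):
    """4-element permute: result[j] = src[(imm >> 2*j) & 3]."""
    return [src[(imm >> (2 * j)) & 3] for j in range(4)]


def widen(src, width=8):
    """Implicit widening: upper elements are undefined (modelled as None)."""
    return src + [None] * (width - len(src))


def vpermv3(a, mask, b):
    """Two-source permute: index i selects from a (i < n) or b (i >= n)."""
    both = a + b
    return [both[i] for i in mask]


def build_mask(imm0, imm1, width=8):
    """Mask whose low half reads x via imm0 and high half reads y via imm1."""
    lo = [(imm0 >> (2 * j)) & 3 for j in range(4)]
    hi = [width + ((imm1 >> (2 * j)) & 3) for j in range(4)]
    return lo + hi


def fold_matches(x, y, imm0, imm1):
    """Compare concat(permi(x,imm0), permi(y,imm1)) with the vpermv3 form."""
    before = permi(x, imm0) + permi(y, imm1)
    after = vpermv3(widen(x), build_mask(imm0, imm1), widen(y))
    return before == after
```

The mask never selects one of the undefined upper elements, which is why
the model only needs implicit widening rather than a concat of the
sources.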
[clang][bytecode] Add `PtrView` for non-tracking pointers (#184129)
Currently, when creating a `Pointer` (of block type, which I will assume
here), the pointer will add itself (via its address) to its block's
pointer list. This way, a block always knows what pointers point to it.
That's important so we can handle the case when a block (which was e.g.
created for a local variable) is destroyed and we now need to update its
pointers.
However, since we always do this for all `Pointer` instances, it creates a
weird performance problem where we do this dance all the time for no
reason, e.g. consider `Pointer::stripBaseCasts()`:
https://github.com/llvm/llvm-project/blob/88693c49d9ac58a33af5978d31f6c70fe1d5b45b/clang/lib/AST/ByteCode/Pointer.h#L778-L783
This will add and remove the newly created pointer from the block's
pointer list every iteration. Other offenders are `Pointer::toRValue()`,
`EvaluationResult::checkFullyInitialized()` or
`Pointer::computeOffsetForComparison()`.
[8 lines not shown]
[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging
It's 8 years old, only used by a handful of tests, and has not been updated
in a while except for maintenance as far as I can see.
I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt debugging,
we should have a modern, generic system that works on any waitcnt, not
something specific to 3 GFX9 counters.
[Flang][OpenMP] Add combined construct information
This patch adds the `omp.combined` attribute to OpenMP dialect
operations following changes to the `ComposableOpInterface`.
This attribute is added to operations representing non-innermost leaf
constructs of a combined construct and to standalone block-associated
constructs that can be combined with their parent construct.
Changes are made to the OpenMP lowering logic, as well as the
do-concurrent, workshare and workdistribute transformation passes.
[MLIR][OpenMP] Explicit tagging of combined constructs
Combined OpenMP constructs, such as `parallel do`, which represent
nests of constructs where each one contains a single other construct
without any other directives or statements in between, are currently not
marked in any way in the MLIR representation.
This works because they don't usually require any specific handling
other than what would be done for the included operations. However, the
handling of `target` regions needs to know whether it was part of a
combined construct in order to properly optimize for the SPMD case and
detect when certain clauses must be unconditionally evaluated in the
host.
So far, this has been achieved by having some MLIR pattern-matching
logic to infer whether a nest of operations could have potentially been
produced for a combined construct. This approach is error prone,
computationally expensive and it can't really work in the general case.
On the other hand, a compiler frontend can easily tell the difference
[10 lines not shown]