[mlir][SPIR-V] Allow SpecConstantComposite constituents to reference other SpecConstantComposites (#193416)
The verifier for spirv.SpecConstantComposite previously assumed all
constituents were spirv.SpecConstant ops, which caused a crash when
referencing nested spirv.SpecConstantComposite ops.
Per the SPIR-V spec (section 3.3.7, OpSpecConstantComposite), constituents
"must be the \<id\>s of other specialization constants, constant
declarations, or an OpUndef", which includes OpSpecConstantComposite.
[lldb] Remove full stop from AppendErrorWithFormat format strings (part 2) (#194352)
To fit the style guide:
https://llvm.org/docs/CodingStandards.html#error-and-warning-messages
I found these with:
* Find `(\.AppendErrorWithFormat\(([\s\r\n]+)?"(?:(?:\\.|[^"\\])*))\."`
and replace with `$1"` using Visual Studio Code.
* Putting a call to `validate_diagnostic` in `AppendErrorWithFormat`.
* Manual inspection.
Note that this change *does not* include a call to `validate_diagnostic`
because I do not know what's going to crash on platforms that I haven't
tested on.
[AArch64][GlobalISel] Lower BF16 FPTRUNC (#193941)
When the +bf16 architecture feature is available this is simple, as we
lower to a standard instruction. When it is not available we need to
expand to a series of instructions that perform the necessary rounding.
The code to do that is a port of TargetLowering::expandFP_ROUND to GISel,
minus the float64 odd rounding via expandRoundInexactToOdd; f64 will
follow in a followup patch.
uitofp and sitofp are currently disabled, so that we can take this one
step at a time and check each part in turn. The LLT fp type checks attempt
to return true for IEEE types of the correct size without UseExtended,
always returning false for non-standard types.
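As a rough illustration of what such an expansion computes, here is a scalar sketch of round-to-nearest-even f32-to-bf16 truncation done with integer ops, similar in spirit to what TargetLowering::expandFP_ROUND produces (the function name is illustrative, not an LLVM API, and NaN handling is omitted for brevity):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Round-to-nearest-even truncation of an f32 to a bf16 bit pattern,
// performed with integer arithmetic on the float's bits.
static uint16_t TruncToBF16(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  // Bias of 0x7FFF plus the lowest kept bit implements ties-to-even.
  uint32_t RoundBias = 0x7FFF + ((Bits >> 16) & 1);
  return static_cast<uint16_t>((Bits + RoundBias) >> 16);
}
```

For example, 1.0f (bits 0x3F800000) truncates to the bf16 pattern 0x3F80.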
[mlir][x86] Fix - Replace `load` with `transfer_read` to support tensor types. (#194543)
This patch replaces the `vector.load` operation with the
`vector.transfer_read` op, so that the rewrite lowering
`vector.contract` ops to `bf16_avx512_dp` also works on tensor types.
[flang] improve array section analysis for WHERE (#194399)
The array section analysis in the HLFIR pass in charge of WHERE lowering
was unable to tell that the LHS and RHS are the same array section when
the base is an assumed-shape array or when a variable is used as an index.
This patch adds an optional callback to the array section analysis to
tell whether two SSA values have the same value. This callback is then
implemented to report that two SSA values are the same only if:
* they are the results of equivalent operations with no memory effects
(it is OK for them to be non-speculatable) and with operands that have
the same value (recursively), or
* they are loads from the same variable (which is OK in the context of
WHERE RHS/LHS thanks to Fortran 2023 10.1.4, which guarantees that a
variable referenced on both the RHS and LHS cannot be modified by side
effects in the RHS/LHS).
Assisted by: Claude
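The recursive same-value check described above can be sketched on a toy node type (this is not the actual HLFIR code; the struct and function names are hypothetical):

```cpp
#include <string>
#include <vector>

// Toy model of an SSA-defining operation: either a load from a named
// variable, or a pure/impure operation over operand nodes.
struct Node {
  std::string OpName;            // e.g. "load", "addi"
  std::string Variable;          // set when OpName == "load"
  bool HasMemoryEffect = false;  // operations with memory effects never match
  std::vector<const Node *> Operands;
};

// Two values are the same if they are loads from the same variable, or
// equivalent side-effect-free operations whose operands are recursively
// the same value.
static bool isSameValue(const Node &A, const Node &B) {
  if (A.OpName == "load" && B.OpName == "load")
    return A.Variable == B.Variable;
  if (A.HasMemoryEffect || B.HasMemoryEffect)
    return false;
  if (A.OpName != B.OpName || A.Operands.size() != B.Operands.size())
    return false;
  for (size_t I = 0; I != A.Operands.size(); ++I)
    if (!isSameValue(*A.Operands[I], *B.Operands[I]))
      return false;
  return true;
}
```

Under this model, two separate `addi` results over loads of the same variables compare equal, which is what lets the pass match an LHS section against the RHS one.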
[AMDGPU][Doc] Move barrier documentation to a separate document
Create a new "AMDGPU Execution Synchronization" document.
For now, it just documents barriers and their execution model.
Hopefully, over time, we can improve it to document the
programming model of most common methods of synchronizing execution
of threads (e.g. using memory/spinlock).
I kept the documentation mostly as-is, but made some minor changes
to make it flow a bit better as a standalone document. For example,
the fact that barriers work at a wavefront granularity has been moved
to the section about `s_barrier` specifically.
I also moved the note about barrier objects existing within a scope
into the main documentation. As a result, the "target-specific
properties" section has been eliminated.
[lldb] Override UpdateBreakpointSites in ProcessGDBRemote to use MultiBreakpoint
This concludes the implementation of MultiBreakpoint by actually using
the new packet to batch breakpoint requests.
https://github.com/llvm/llvm-project/pull/192910
[X86] lock opt ptr const inconsistencies (#185195)
Resolves: https://github.com/llvm/llvm-project/issues/147280
The linked issue mentions cases of atomic arithmetic followed by a test
that currently emit CAS loops instead of `lock` + op, even though the
comparison result can be inferred from the flags.
There's one fold that solves the issue's code: `lock and` sets ZF based
on the result of old & C, so any comparison of new = old & C against
zero can be answered with ZF; the fold reduces such comparisons to != 0
or == 0.
I also refactored `shouldExpandCmpArithRMWInIR` into a dispatching
function and made `getCmpArithCC` return X86::CondCode values directly,
which deleted the dispatching switch later in the code.
I also broke out the different cases of `getCmpArithWithCC` into helper
functions for each case (add, sub, and, xor, or, add with overflow, sub
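The source pattern behind the fold can be sketched in C++ with a standard atomic (the function name is illustrative, not an LLVM API): the post-op value of an atomic AND is old & C, so testing it against zero is exactly the condition ZF of a `lock and` reports, and no CAS loop is needed.

```cpp
#include <atomic>

// Atomic AND followed by a zero test of the *new* value. On x86 this can
// compile to `lock and` plus a ZF check rather than a CAS loop, because
// new = old & C and `lock and` sets ZF on that result.
static bool atomicAndIsZero(std::atomic<unsigned> &V, unsigned C) {
  unsigned Old = V.fetch_and(C, std::memory_order_seq_cst);
  return (Old & C) == 0;  // (old & C) is the stored new value
}
```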
[3 lines not shown]
[LangRef] Specify that syncscopes can affect the monotonic modification order
If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.
So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
[X86] combineVTRUNCSAT - don't split 128-bit concatenated vectors when folding to PACKSS/US (#194347)
If the VTRUNCS/US node has 128-bit src and dst types, then ensure we
don't split into sub-128-bit vectors - just treat it as padded with
zeros to match VTRUNC behaviour.
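For reference, the per-lane operation PACKSS performs is plain signed saturation into the narrower type; a scalar model (illustrative, not LLVM code):

```cpp
#include <cstdint>
#include <limits>

// Per-lane model of PACKSS: saturate a signed 16-bit value into the
// signed 8-bit range, as the packed instruction does for each element.
static int8_t packssLane(int16_t X) {
  if (X > std::numeric_limits<int8_t>::max())
    return std::numeric_limits<int8_t>::max();
  if (X < std::numeric_limits<int8_t>::min())
    return std::numeric_limits<int8_t>::min();
  return static_cast<int8_t>(X);
}
```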
Fixes #194344
[VPlan] Remove unused PhisToFix member from VPRecipeBuilder. NFC (#194451)
The field is not referenced anywhere; the only PhisToFix in the codebase
is a local variable in VPlanConstruction.cpp.
[X86] combineKSHIFT - fold kshift(logicop(X,C1),C2) -> logicop(kshift(X,C2),kshift(C1,C2)) (#194343)
Attempt to push KSHIFTs up through logicops in the DAG to expose
additional folding.
Requires us to add constant folding handling for the KSHIFTL/R
instructions as well.
Yak shaving for #193700
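The fold is sound because shifts distribute over bitwise logic ops: (X op C1) << S == (X << S) op (C1 << S) for op in {&, |, ^}, which is what lets the KSHIFT move above the logicop once the shifted constant is folded. A quick scalar check over 16-bit masks (illustrative, not LLVM code):

```cpp
#include <cstdint>

// Verify (X & C1) << S == (X << S) & (C1 << S) within a 16-bit mask,
// the scalar analogue of pushing a KSHIFTL above a mask AND.
static bool kshiftlDistributes(uint16_t X, uint16_t C1, unsigned S) {
  uint16_t Lhs = static_cast<uint16_t>(static_cast<uint16_t>(X & C1) << S);
  uint16_t Rhs = static_cast<uint16_t>((X << S) & (C1 << S));
  return Lhs == Rhs;
}
```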
[AArch64][GlobalISel] Update fp legalization mir tests. NFC (#194561)
This updates a number of the floating point mir legalization tests to
use f types instead of generic s types.
Hash.cpp: include ErrorHandling.h (#194553)
Hash.cpp uses llvm_unreachable but currently picks up ErrorHandling.h
only transitively through xxhash.h -> ArrayRef.h -> Hashing.h.
[AMDGPUUsage] Specify what one-as syncscopes do (#189016)
This matches the currently implemented and (as far as I could determine)
intended semantics of these syncscopes.
The sync scope table is unchanged except for removing its indentation;
otherwise it would be rendered as part of the preceding note.