[VPlan] Support more GEP-like recipes in getSCEVExprForVPValue (NFCI)
Support VPWidenGEPRecipe, VPInstructions and VPReplicateRecipe with
GEP-like opcodes in getSCEVExprForVPValue via a new matcher binding
source element type and operands.
This is used in code paths when computing SCEV expressions in the
VPlan-based cost model, which should produce costs matching the legacy
cost model.
[X86] Simplify hasCalleePopSRet, NFCI (#176519)
The implementation was rewritten for clarity, and the extra boolean
parameter to the sibcall eligibility check was removed in favor of
recalculating this property. The compile time impact should be
negligible, since the vast majority of callers will return early on the
TT.isX86_32() check.
The comments now try to clarify which platforms have this
callee-pop-sret behavior, which was always hard for me to figure out
from the previous code.
I was able to remove two ambiguous checks for `canGuaranteeTCO`. What
those checks were really doing was checking for `fastcc` and other
calling conventions that pass arguments in registers. Instead of looking
for the `inreg` IR attribute, the code now looks at the CCValAssign to
check whether the pointer is passed in memory or in registers, so it works
smoothly with conventions like `fastcc` that don't require explicit
`inreg` annotations.
[SLP]Do not build bundle for copyables, with parents used in PHI node
If the copyables have parents that are used in PHI nodes, this creates
complex schedulable/non-schedulable dependencies, which require complex
processing for little profitability. Cut such cases early for now to
prevent compiler crashes and compile time blowup.
Fixes #176658
[DAG] expandCLMUL - if a target supports CLMUL+CLMULH then CLMULR can be merged from the results (#176644)
If a target supports CLMUL + CLMULH, then we can funnel shift the
results together to form CLMULR.
This particularly helps x86 PCLMUL targets.
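The identity behind this expansion can be checked with a small bit-level model. This is only a sketch, not the DAG code: it assumes CLMULH is the high half of the full carry-less product, that CLMULR is bitreverse(clmul(bitreverse(a), bitreverse(b))), and that the merge is a funnel shift left by one. All function names below are hypothetical.

```python
def clmul_full(a: int, b: int, bits: int) -> int:
    """Full carry-less (XOR) product of two `bits`-wide values."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    product = 0
    for i in range(bits):
        if (b >> i) & 1:
            product ^= a << i
    return product


def clmul(a: int, b: int, bits: int) -> int:
    """Low `bits` bits of the carry-less product."""
    return clmul_full(a, b, bits) & ((1 << bits) - 1)


def clmulh(a: int, b: int, bits: int) -> int:
    """High `bits` bits of the carry-less product (assumed CLMULH semantics)."""
    return clmul_full(a, b, bits) >> bits


def bitreverse(x: int, bits: int) -> int:
    """Reverse the low `bits` bits of x."""
    result = 0
    for i in range(bits):
        result |= ((x >> i) & 1) << (bits - 1 - i)
    return result


def clmulr_reference(a: int, b: int, bits: int) -> int:
    """Assumed CLMULR definition: bit-reversed product of bit-reversed inputs."""
    return bitreverse(clmul(bitreverse(a, bits), bitreverse(b, bits), bits), bits)


def fshl(hi: int, lo: int, shift: int, bits: int) -> int:
    """Funnel shift left: top `bits` bits of (hi:lo) << shift, for 0 < shift < bits."""
    mask = (1 << bits) - 1
    return ((hi << shift) | (lo >> (bits - shift))) & mask


def clmulr_from_parts(a: int, b: int, bits: int) -> int:
    """Merge CLMULH and CLMUL with a funnel shift by one (assumed shift amount)."""
    return fshl(clmulh(a, b, bits), clmul(a, b, bits), 1, bits)
```

Under these assumptions, comparing `clmulr_from_parts` against `clmulr_reference` over all 8-bit inputs is a quick way to sanity-check the merge.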
[regex][FileCheck] Support back-references up to 20. (#174150)
Support `\g{n}`-style back references in `regcomp` as well by increasing
the limit from 9 to 20 and adding additional parsing. Update the limit
checks in FileCheck. The limit could theoretically be removed by
reallocating the regex matcher's internal arrays, but I don't see a use
case for that as of now.
Update a test that now should pass when using more than 9
back-references.
Add a new test that checks for the error message explicitly.
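As a rough illustration of the extra parsing and the limit check, here is a stand-alone sketch. It is not the `regcomp` or FileCheck code: `parse_backref` and `MAX_BACKREF` are hypothetical names, and the sketch assumes that a single-digit `\1`..`\9` form remains valid alongside `\g{n}`.

```python
import re

MAX_BACKREF = 20  # limit named in the commit message

# Matches either a single-digit reference (\1..\9) or the braced \g{n} form.
_BACKREF = re.compile(r"\\(?:([1-9])|g\{([0-9]+)\})")


def parse_backref(pattern: str, pos: int = 0):
    """Parse a back-reference at pattern[pos]; return (group_number, next_pos).

    Raises ValueError if no back-reference starts at pos or if the group
    number falls outside 1..MAX_BACKREF.
    """
    m = _BACKREF.match(pattern, pos)
    if m is None:
        raise ValueError("not a back-reference")
    number = int(m.group(1) or m.group(2))
    if not 1 <= number <= MAX_BACKREF:
        raise ValueError(f"back-reference {number} out of range 1..{MAX_BACKREF}")
    return number, m.end()
```

The braced form avoids the ambiguity of multi-digit references such as `\12`, which could otherwise be read as group 1 followed by a literal `2`.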
Recommit "[VPlan] Only use isAddressSCEVForCost in legacy getAddressAccSCEV"
This reverts commit ed004cf42bf57ca79b57bc3076ef83a8477426ea.
The original commit exposed an independent cost issue, triggering an
assertion. That issue has been fixed in 3457e7efc3.
Reland the patch now that the assertion has been fixed.
[mlir][vscode] Update to capture angle brackets in types/attrs (#176665)
This updates the grammar for these types and attributes so that angle
brackets are captured, expanding which scopes are shown.
Also enables skipLibCheck.
[RFC][Clang][AMDGPU] Emit only delta target-features to reduce IR bloat
Currently, AMDGPU functions have the `target-features` attribute populated with all default features for the target GPU. This is redundant because the backend can derive these defaults from the `target-cpu` attribute via `AMDGPUTargetMachine::getFeatureString()`.
In this PR, for AMDGPU targets only:
- Functions without explicit target attributes no longer emit `target-features`
- Functions with `__attribute__((target(...)))` or `-target-feature` emit only features that differ from the target's defaults (delta)
The backend already handles missing `target-features` correctly by falling back to the TargetMachine's defaults.
A new cc1 flag `-famdgpu-emit-full-target-features` is added to emit full features when needed.
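The delta computation can be sketched roughly as follows. This is an illustration only, not the Clang implementation: `parse_features` and `delta_features` are hypothetical helpers, and the sketch assumes that a feature absent from the defaults counts as disabled.

```python
def parse_features(feature_string: str) -> dict:
    """Map feature name -> enabled flag from a '+a,-b' string; later entries win."""
    features = {}
    for item in feature_string.split(","):
        item = item.strip()
        if not item:
            continue
        if item[0] not in "+-":
            raise ValueError(f"feature must start with '+' or '-': {item!r}")
        features[item[1:]] = item[0] == "+"
    return features


def delta_features(defaults: str, requested: str) -> str:
    """Return only the requested features whose state differs from the defaults."""
    default_map = parse_features(defaults)
    requested_map = parse_features(requested)
    delta = [
        ("+" if enabled else "-") + name
        for name, enabled in requested_map.items()
        # Assumption: a feature missing from the defaults is treated as disabled.
        if default_map.get(name, False) != enabled
    ]
    return ",".join(delta)
```

With this model, a function whose requested features equal the defaults yields an empty string, which corresponds to omitting `target-features` entirely.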
Example:
Before:
```llvm
attributes #0 = { "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,..." }
[13 lines not shown]
[clang-repl] Skip out-of-process execution due to compiler-rt path mismatch (#176198)
On some setups (Solaris), clang-repl attempts to enable out-of-process
execution, but fails to locate the ORC runtime due to a mismatch between
the toolchain’s expected compiler-rt path and the actual on-disk layout.
Specifically, ToolChain::getCompilerRT() relies on
getArchNameForCompilerRTLib(), which returns an architecture name that
does not match the Solaris compiler-rt directory naming. As a result, the
ORC runtime (orc_rt) is not detected at the correct path, even though it
exists under lib/clang/<version>/lib/sunos/.
As an initial workaround, special-case Solaris in
getArchNameForCompilerRTLib() to return "sunos", aligning the expected
[14 lines not shown]
[AMDGPU] Fix inline constant encoding for `v_pk_fmac_f16`
This PR handles `v_pk_fmac_f16` inline constant encoding/decoding differences between pre-GFX11 and GFX11+ hardware.
- Pre-GFX11: fp16 inline constants produce (f16, 0) - value in low 16 bits, zero in high.
- GFX11+: fp16 inline constants are duplicated to both halves (f16, f16).
[VPlan] Match inverted logical AND/OR for select costs.
VPlan transforms may invert logical AND/OR selects, which can impact
costs on targets where the select is not cheap but the boolean AND/OR is.
Also match the inverted logical AND/OR to improve the accuracy of the
cost estimation. This fixes the underlying issue behind the cost
divergence between the legacy and VPlan-based cost models that caused
the revert of 01d34eb38fa058 in ed004cf42bf57c.
[CGP][AArch64] Do not sink instructions that might read/write memory. (#176182)
The test case's call instruction was being sunk past the point where the
memory it accessed was valid. Add a check so that CGP does not try to sink
instructions that might be invalid to move.
Fixes #176095