[clang] Backport: fix transformation of substituted constant template parameters of partial specializations
This fixes a helper so it implements retrieval of the argument replaced
for a template parameter for partial spcializations.
This was left out of the original patch, since it's quite hard to actually test.
This helper implements the retrieval for variable templates, but only for
completeness sake, as no current users rely on this, and I don't think a similar
test case is possible to implement with variable templates.
This fixes a regression introduced in #161029 which will be backported to llvm-22,
so there are no release notes.
Backport from #183348
Fixes #181062
Fixes #181410
[clang] Backport: allow canonicalizing assumed template names
Assumed template names are part of error recovery and encode just a
declaration name, making them always canonical. This patch allows
them to be canonicalized, which is trivial.
Backport from #183222
Fixes #183075
[clang] Backport: NestedNameSpecifier typo correction fix
This stops typo correction from considering template parameters
as candidates for a NestedNameSpecifier when it has a prefix itself.
I think this is better than the alternative of accepting these candidates,
but otherwise droping the prefix, because it seems more far-fetched that
someone would actually try to refer to a template parameter this way.
Since this regression was never released, there are no release notes.
Backport from #181239
Fixes #167120
[clang] Backport: stop error recovery in SFINAE for narrowing in converted constant expressions
A narrowing conversion in a converted constant expression should produce an
invalid expression so that [temp.deduct.general]p7 is satisfied, by stopping
substitution at this point.
Fixes #167709
[clang] create local instantiation scope for matching template template parameters
This fixes a bug where a partial substitution from the enclosing scope
is used to prepopulate an unrelated template argument deduction.
Backport from #183219
Fixes #181166
[analyzer] Fix crash in MallocChecker when a function has both ownership_returns and ownership_takes (#183583)
When a function was annotated with both `ownership_returns` and
`ownership_takes` (or `ownership_holds`), MallocChecker::evalCall would
fall into the freeing-only branch (isFreeingOwnershipAttrCall) and call
checkOwnershipAttr without first calling MallocBindRetVal. That meant no
heap symbol had been conjured for the return value, so
checkOwnershipAttr later dereferenced a null/invalid symbol and crashed.
Fix: merge the two dispatch branches so that MallocBindRetVal is always
called first whenever ownership_returns is present, regardless of
whether the function also carries ownership_takes/ownership_holds.
The crash was introduced in #106081
339282d49f5310a2837da45c0ccc19da15675554.
Released in clang-20, and crashing ever since.
Fixes #183344.
[2 lines not shown]
[Clang] Don't diagnose missing members when looking at the instantiating class template
This backports https://github.com/llvm/llvm-project/pull/180725 to Clang
22.
The perfect matching patch revealed another bug where recursive
instantiations could lead to the escape of SFINAE errors, as shown in
the issue.
RISCVMCAsmInfo: Remove redundant `UseAtForSpecifier = false`. NFC (#183890)
UseAtForSpecifier defaults to false in MCAsmInfo, and RISCVMCAsmInfo
never calls initializeAtSpecifiers (which sets it to true).
[Hexagon] Fix extractHvxSubvectorPred shuffle mask for small predicates (#181364)
The loop generating the shuffle mask in extractHvxSubvectorPred used
HwLen/ResLen as the iteration count, but each iteration produces 8
elements (ResLen * Rep where Rep = 8/ResLen). This means the total mask
size was (HwLen/ResLen) * 8, which only equals HwLen when ResLen == 8.
For smaller predicate subvectors (e.g., <4 x i1> or <2 x i1>), the mask
was too large, causing an assertion failure in getVectorShuffle.
Fix by using HwLen/8 as the loop bound, which correctly produces HwLen
elements regardless of ResLen.
(cherry picked from commit c3a86ff2d0b397d757345fad7e29c2a6e7dbc823)
[mlir][IR] Generalize `DenseElementsAttr` to custom element types (#179122)
`DenseElementsAttr` supports only a hard-coded list of element types:
`int`, `index`, `float`, `complex`. This commit generalizes the
`DenseElementsAttr` infrastructure: it now supports arbitrary element
types, as long as they implement the new `DenseElementTypeInterface`.
The `DenseElementTypeInterface` has the following helper functions:
- `getDenseElementBitSize`: Query the size of an element in bits. (When
storing an element in memory, each element is padded to a full byte.
This is an existing limitation of the `DenseElementsAttr`; with an
exception for `i1`.)
- `convertToAttribute`: Attribute factory / deserializer. Converts bytes
into an MLIR attribute. The attribute provides the assembly format /
printer for a single element.
- `convertFromAttribute`: Serializer. Converts an MLIR attribute into
bytes.
Note: `convertToAttribute` / `convertFromAttribute` are mainly for
[23 lines not shown]
InstCombine: Stop applying nofpclass from use nofpclass attribute (#183835)
Functionally reverts a80d4329ce96856a02bd279c800c3d08619da4c9, with new
test.
This should be applied somewhere, but this is the wrong place.
Fixes regression reported after #182444
[lldb/test] Skip TestDelayInitDependency on remote platforms (#183885)
This test exercises macOS-specific linker functionality (-delay_library)
and uses a hardcoded local working directory for the launch info. It
should not run against a remote platform where neither condition holds.
Signed-off-by: Med Ismail Bennani <ismail at bennani.ma>
[clang] Backport: fix transformation of substituted constant template parameters of partial specializations
This fixes a helper so it implements retrieval of the argument replaced
for a template parameter for partial spcializations.
This was left out of the original patch, since it's quite hard to actually test.
This helper implements the retrieval for variable templates, but only for
completeness sake, as no current users rely on this, and I don't think a similar
test case is possible to implement with variable templates.
This fixes a regression introduced in #161029 which will be backported to llvm-22,
so there are no release notes.
Backport from #183348
Fixes #181062
Fixes #181410
[AMDGPU] Add structural stall heuristic to scheduling strategies
Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.
- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
- Resource conflicts on unbuffered resources (from the SchedModel)
- Sequence-dependent hazards (from GCNHazardRecognizer)
- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.
[AMDGPU] Add ML-oriented coexec scheduler selection and queue handling
This patch adds the initial coexec scheduler scaffold for machine
learning workloads on gfx1250.
It introduces function and module-level controls for selecting the
AMDGPU preRA and postRA schedulers, including an `amdgpu-workload-type`
module flag that maps ML workloads to coexec preRA scheduling and a nop
postRA scheduler by default.
It also updates the coexec scheduler to use a simplified top-down
candidate selection path that considers both available and pending
queues through a single flow, setting up follow-on heuristic work.
[Driver] Add -Wa,--reloc-section-sym= to control section symbol conversion (#183472)
Wire the llvm-mc --reloc-section-sym={all,internal,none} option through
the clang driver (-Wa,--reloc-section-sym=) and cc1as
(--reloc-section-sym=). The option is only valid for ELF targets.
GNU Assembler will add the option as well.
[AMDGPU] Fix piggybacking after commute in AMDGPULowerVGPREncoding (#183778)
After successfully commuting an instruction to be compatible with the
current VGPR MSB mode, update CurrentMode with the commuted
instruction's mode requirements. This locks in the mode bits the
commuted instruction relies on, preventing later instructions from
piggybacking and corrupting those bits.
Without this fix, a subsequent instruction needing a different mode
could piggyback onto the preceding s_set_vgpr_msb and change mode bits
that the commuted instruction depends on. For example, a nullopt src1
position (treated as 0) could be overwritten to a different value,
causing incorrect register encoding for the commuted instruction.
The fix still allows compatible piggybacking - instructions that only
add new mode bits without changing existing ones can still piggyback.
AArch64: Replace @plt/%gotpcrel in data directives with %pltpcrel %gotpcrel (#155776)
Similar to #132569 for RISC-V, replace the unofficial `@plt` and
`@gotpcrel` relocation specifiers, currently only used by clang
-fexperimental-relative-c++-abi-vtables, with %pltpcrel %gotpcrel. The
syntax is not used in humand-written assembly code, and is not supported
by GNU assembler.
Also replace the recent `@funcinit` with `%funcinit(x)`.
[CodeGen] Allow `-enable-ext-tsp-block-placement` and `-apply-ext-tsp-for-size` passed together (#183642)
Currently, the asserts fires when both `UseExtTspForPerf` and
`UseExtTspForSize` are true on a given function.
Ideally, we should allow `-enable-ext-tsp-block-placement` and
`-apply-ext-tsp-for-size` passed together, meaning run the block
placement for performance on hot functions, while run the placement for
size on cold functions.
The diff makes `UseExtTspForPerf` and `UseExtTspForSize` mutually
exclusive per-function: functions with the `OptForSize` attribute use
ext-tsp block placement for size, while the others use ext-tsp block
placement for perf.
Co-authored-by: Sharon Xu <sharonxu at fb.com>
[CIR] Use `-verify` on clang/test/CIR/CodeGenHLSL/matrix-element-expr-load.hlsl (#182817)
Update clang/test/CIR/CodeGenHLSL/matrix-element-expr-load.hlsl to use
`-verify` with expected CIR NYI diagnostics.
[CIR] Infrastructure and MemorySpaceAttrInterface for Address Spaces (#179073)
Related: https://github.com/llvm/llvm-project/issues/175871,https://github.com/issues/assigned?issue=llvm%7Cllvm-project%7C179278,https://github.com/issues/assigned?issue=llvm%7Cllvm-project%7C160386
- Introducing the LangAddressSpace enum with offload address space kinds
(offload_private, offload_local, offload_global, offload_constant,
offload_generic) and the LangAddressSpaceAttr attribute.
- Generalizes CIR AS attributes as MemorySpaceAttrInterface and Attaches
it to `PointerType`. Includes test coverage for valid IR roundtrips and
invalid address space parsing.
This starts a series of patches with the purpose of bringing complete
address spaces support features for CIR. Most of the test coverage is
provided in subsequent patches further down the stack. note that most of
these patches are based on: https://github.com/llvm/clangir/pull/1986
[VPlan] Don't adjust trip count for DataAndControlFlowWithoutRuntimeCheck (#183729)
Previously, the canonical IV increment may have overflowed to a non-zero
value due to vscale being a non power-of-two. So we used to emit a
runtime check for this.
If you didn't want the runtime check,
DataAndControlFlowWithoutRuntimeCheck skipped it and instead tweaked the
trip count so it wouldn't overflow.
However #144963 stopped the check from ever being emitted because vscale
is always a power-of-two on AArch64 and RISC-V, so it never overflowed
to a non-zero value. And in #183292 the code to emit the check was
removed. But we never restored the trip count back to normal when the
target's vscale was a power-of-two.
Now that vscale is always a power-of-two, this PR avoids adjusting it. A
follow up NFC can then remove DataAndControlFlowWithoutRuntimeCheck.
Clang: Deprecate float support from __builtin_elementwise_max (#180885)
Now we have
__builtin_elementwise_maxnum
__builtin_elementwise_maximum
__builtin_elementwise_maximumnum
[libc][math][c23] implement C23 `acospif` math function (#183661)
Implementing C23 `acospi` math function for single-precision with the
header-only approach that is followed since #147386