[lldb][test] Skip watchpoint and expression tests on WebAssembly (#204235)
WebAssembly has no watchpoint support (Process/wasm reports no
watchpoints; the stop reason comes back as a plain signal) and cannot
JIT or interpret expressions (ProcessWasm sets CanJIT to false). Teach
the existing per-platform category checks about wasm so the whole
"watchpoint" and "expression" categories are skipped, rather than
decorating each test individually.
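A minimal sketch of the category-level skip described above, not the actual lldb test-suite code. The function names and the architecture strings are assumptions made for illustration.

```python
def unsupported_categories(arch):
    """Return the test categories that cannot run on the given architecture.

    Hypothetical helper: the real check lives in the lldb test harness.
    """
    if arch in ("wasm32", "wasm64"):
        # No watchpoint support, and expressions cannot be JITed or interpreted.
        return {"watchpoint", "expression"}
    return set()


def should_skip(test_categories, arch):
    """A test is skipped when any of its categories is unsupported."""
    return bool(set(test_categories) & unsupported_categories(arch))
```

With a check like this, a test tagged only with the "watchpoint" category is skipped on wasm32 without needing its own decorator.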
[RFC][CodeGen] Add generic target feature checks for intrinsics
This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.
It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.
Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.
This PR uses one AMDGPU intrinsic as an example.
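As a rough illustration of the intrinsic-to-feature lookup, here is a sketch in Python. The table contents, the intrinsic name, and the feature names are made up, and the sketch assumes a feature expression is a comma-separated list of features that must all be present; the commit message does not describe the real expression syntax.

```python
# Hypothetical table; in the PR, TableGen emits the mapping from the
# TargetFeatures field of each intrinsic record.
INTRINSIC_FEATURES = {
    "llvm.example.fast_op": "feature-a,feature-b",  # made-up names
}


def missing_features(intrinsic, subtarget_features):
    """Return the required features the subtarget lacks, sorted by name."""
    required = INTRINSIC_FEATURES.get(intrinsic, "")
    needed = {f.strip() for f in required.split(",") if f.strip()}
    return sorted(needed - set(subtarget_features))


def check_before_lowering(intrinsic, subtarget_features):
    """Raise before lowering if the subtarget lacks a required feature."""
    missing = missing_features(intrinsic, subtarget_features)
    if missing:
        raise ValueError(f"{intrinsic} requires: {', '.join(missing)}")
```

An intrinsic without a table entry has no requirements, which matches the opt-in design: only annotated intrinsics are checked.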
AMDGPU: Refactor AMDGPUTargetID to not store MCSubtargetInfo
Store the triple string and GPUKind instead. The dependence
on checking AMDHSA seems like an anti-feature, but maintain the
behavior of not printing the modifiers for other OSes. Start
parsing the target ID instead of performing a direct string
comparison. Also improve test coverage for the treatment of the
environment component of the triple. The main behavioral change
is that this will now produce normalized triples in the output and
diagnostics. Practically, this means all of the places that
currently emit "--" will be expanded into "-unknown-".
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
AMDGPU: Teach disassembler to produce target id directives
Inspect the binary's e_flags to reproduce the .amdgcn_target directive.
This is a step towards round-trip disassembly without depending
on command line state specifying the subtarget. I wasn't sure
where to put the emission to ensure it is always emitted. I
also do not know why it's OK to just write to outs(), but that's
what the other directives here were doing.
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
[AMDGPU] Add flag to control VGPR pressure limits (#203797)
The RP trackers don't accurately measure the RA problem, and can
underestimate the number of registers required. Currently, for VGPR
pressure, we account for these inaccuracies using VGPRLimitBias and
ErrorMargin. These are used to reduce the VGPRCriticalLimit /
VGPRExcessLimit. During scheduling, we check RP against these limits,
and if we start to see RP exceeding these limits, we will trigger RP
reduction heuristics (when deciding which instructions to schedule
next). Thus VGPRLimitBias + ErrorMargin effectively reduce the amount of
allowable RP during scheduling, as a means to compensate for RP tracker
inaccuracies. Currently, ErrorMargin is set to 3, and VGPRLimitBias is
set to 0.
However, the degree of inaccuracy tends to scale with the number of
registers we have available for allocation. In other words, the RP
trackers' inaccuracy is better expressed as a percent of the register
budget, rather than some literal value. This PR adds some functionality
to express this inaccuracy compensation as a percent - and exposes a
[7 lines not shown]
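To make the fixed-versus-percent distinction concrete, here is a small arithmetic sketch. The function names and the 5% figure are illustrative assumptions, and the rounding and clamping are guesses rather than the scheduler's actual behavior; only the defaults (ErrorMargin 3, VGPRLimitBias 0) come from the text above.

```python
def limit_with_fixed_margin(limit, bias=0, error_margin=3):
    """Reduce a VGPR limit by a fixed register count, clamped at zero."""
    return max(limit - (bias + error_margin), 0)


def limit_with_percent_margin(limit, percent):
    """Reduce a VGPR limit by a percentage of the budget (hypothetical form)."""
    return max(limit - (limit * percent) // 100, 0)


for budget in (64, 256):
    print(budget, limit_with_fixed_margin(budget), limit_with_percent_margin(budget, 5))
# prints "64 61 61" then "256 253 244"
```

The fixed margin removes 3 registers regardless of budget, while the percentage removes more as the budget grows, which is the scaling behavior the message argues for.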
[AMDGPU] NFC: Obviously show isVALU includes LDSDMA instructions (#203548)
In https://reviews.llvm.org/D124472 we started labelling LDSDMA as VALU
-- this was due to SPG stating that these instructions act as both
memory + VALU instructions.
This is buried in the isVALU methods - I'd argue that most users without
knowledge of this characteristic would not expect this behavior, and
looking at the implementation of these methods, there is nothing that
would suggest this behavior. This PR forces users to confront this
characteristic and decide whether that is what they want for their
use case.
I've personally seen at least two bugs in upstream code caused by this,
and have seen it cause problems a dozen or more times in downstream code
and in WIP things.
[SandboxVec][DAG] Implement UnscheduledPreds API (#201240)
Mirroring UnscheduledSuccs, this patch adds an UnscheduledPreds DAG node
counter that counts how many predecessors are not scheduled yet.
It also renames the existing ready() to readyBottomUp() to help us
differentiate between the two variants that are now available.
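The two counters can be pictured with a small Python sketch. This is not the SandboxVec implementation: the class, the helper functions, and the name `ready_top_down` are assumptions chosen for illustration; only the UnscheduledSuccs/UnscheduledPreds idea and the readyBottomUp name come from the commit message.

```python
class Node:
    """A DAG node that tracks how many neighbors are not scheduled yet."""

    def __init__(self, name):
        self.name = name
        self.preds = []
        self.succs = []
        self.unscheduled_preds = 0
        self.unscheduled_succs = 0


def add_edge(pred, succ):
    """Add a dependency edge pred -> succ and bump both counters."""
    pred.succs.append(succ)
    succ.preds.append(pred)
    pred.unscheduled_succs += 1
    succ.unscheduled_preds += 1


def mark_scheduled(node):
    """Scheduling a node decrements the counters of its neighbors."""
    for pred in node.preds:
        pred.unscheduled_succs -= 1
    for succ in node.succs:
        succ.unscheduled_preds -= 1


def ready_bottom_up(node):
    # Bottom-up: ready once every successor has been scheduled.
    return node.unscheduled_succs == 0


def ready_top_down(node):
    # Top-down (hypothetical name): ready once every predecessor is scheduled.
    return node.unscheduled_preds == 0
```

For a single edge a -> b, `b` is ready bottom-up and `a` is ready top-down; scheduling either one makes the other ready in the same direction.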
[MCJIT] Fix frem.ll test failure with LLVM_ENABLE_RPMALLOC on Windows (#200319)
When compiled with `LLVM_ENABLE_RPMALLOC`, `lli.exe` links statically to
the runtime. With `LLVM_EXPORT_SYMBOLS_FOR_PLUGINS` enabled, `lli.exe`
exports a subset of symbols from the runtime library, but not all. In
particular, `printf()` is exported from the application binary, but
`fflush()` and `exit()` are not. For a JITted module, unresolved
external symbols are loaded either from the application or dynamic
libraries, in this case, from `msvcrt.dll`. The `MCJIT/frem.ll` test
attempts to flush the output, but because the functions resolve to
different CRT instances, the output data is lost.
The patch avoids the test failure by disabling exporting symbols from
`lli.exe` when it is linked with the static runtime library.
[flang][semantics] Allow forward-typed PARAMETER constants under IMPLICIT NONE (#203398)
Under IMPLICIT NONE, flang rejected a named constant defined by a
PARAMETER statement whose explicit type declaration appears later in the
same specification part:
    implicit none
    parameter(n=4096)
    integer n ! error: No explicit type declared for 'n'
    end
Accept it as an extension, reusing the existing ForwardRefImplicitNone
language feature that already permits forward references to dummy
arguments and COMMON variables under IMPLICIT NONE(TYPE). The behavior
is accepted silently by default and emits a portability warning under
-pedantic.
Assisted-by: AI
[clang][StaticAnalyzer] Reduce MallocSizeofChecker false positives for layout-compatible types (#200253)
When one operand is a record type and the other is a non-record type,
treat them as compatible if they share the same size and the record's
alignment satisfies the scalar's alignment. This suppresses warnings for
patterns like `malloc(sizeof(std::atomic<int32_t>))` assigned to an
`int32_t *` (or a wrapper struct with an identical layout), while still
flagging genuinely mismatched types such as `long` vs `double` or
unrelated struct pairs.
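The record-versus-scalar rule can be sketched as follows. This is not the checker's code: `TypeInfo` and `record_scalar_compatible` are hypothetical names, the sizes and alignments are assumptions for a typical 64-bit target, and "satisfies" is interpreted here as "at least as strict as".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeInfo:
    name: str
    size: int       # bytes
    align: int      # bytes
    is_record: bool


def record_scalar_compatible(a, b):
    """Apply the rule for exactly one record and one non-record operand.

    Returns False when the rule does not apply (both records or both
    scalars); the checker's other rules would handle those pairs.
    """
    if a.is_record == b.is_record:
        return False
    record, scalar = (a, b) if a.is_record else (b, a)
    return record.size == scalar.size and record.align >= scalar.align


atomic_i32 = TypeInfo("std::atomic<int32_t>", 4, 4, True)
i32 = TypeInfo("int32_t", 4, 4, False)
i64 = TypeInfo("int64_t", 8, 8, False)
print(record_scalar_compatible(atomic_i32, i32))  # prints "True"
print(record_scalar_compatible(atomic_i32, i64))  # prints "False"
```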
rdar://177553628
---------
Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
[clang][AVR] Add basic AVR builtin functions (#203214)
Adds support for AVR specific builtin functions as defined in:
https://gcc.gnu.org/onlinedocs/gcc/AVR-Built-in-Functions.html
The simpler builtins have been implemented: nop, sei, cli, sleep, wdr,
swap. They are lowered to the corresponding llvm.avr.* intrinsics.
---------
Signed-off-by: Dakkshesh <beakthoven at gmail.com>
Reland "[clang][ssaf] Track target triple in TU and LU summaries" (#204218)
This commit introduces the following changes:
- Add `TargetTriple` field to `TUSummary`, `LUSummary`, and their encodings.
- Frontend captures the triple from `CompilerInstance::getTarget()` when extracting a TU summary.
- JSON format reads/writes a `target_triple` field at the root of each summary; reader rejects strings not in `llvm::Triple::normalize` form.
- All TU/LU JSON test inputs/outputs and unit tests updated to include the new field.
- `TargetParser` is added to `LLVM_LINK_COMPONENTS` for `clangScalableStaticAnalysisFrameworkCore`; it provides `Triple::normalize` and the `Triple(string&&)` constructor that the `JSONFormat` sources reference.
`clang-ssaf-linker` uses a hardcoded triple for the link unit; surfacing the triple through the tool will be handled in a follow-up PR.
rdar://179403011
Make sanitizer special case list slash-agnostic (#149886)
This changes the glob matcher for the sanitizer special case format so
that it treats `/` as matching both forward and back slashes.
When dealing with cross-compiles or build systems that don't normalize
slashes, it's possible to run into file paths with inconsistent
slashiness, e.g. `../..\v8/include\v8-internal.h` when [building
chromium](https://g-issues.chromium.org/issues/425364464).
We can match this using the current syntax using this ugly kludge:
`src:*{/,\\}v8{/,\\}*`. However, since the format is explicitly for
listing file paths, it makes sense to treat `/` as denoting a path
separator rather than a literal forward slash. This allows us to write
the much more natural form `src:*/v8/*` and have it work on any
platform.
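The matching rule can be sketched in a few lines of Python. This is not the LLVM glob matcher: it handles only `*` and literal characters (no brace or bracket syntax), it takes the pattern without the `src:` prefix, and the function names are made up for illustration.

```python
import re


def slash_agnostic_regex(glob):
    """Translate a simple glob into a regex where '/' matches '/' or '\\'."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "/":
            parts.append(r"[/\\]")  # either path separator
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(glob, path):
    """Return True if the whole path matches the slash-agnostic glob."""
    return slash_agnostic_regex(glob).fullmatch(path) is not None


print(matches("*/v8/*", r"../..\v8/include\v8-internal.h"))  # prints "True"
```

The mixed-separator path from the Chromium example matches the natural `*/v8/*` form because each `/` in the pattern accepts either separator.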
This is technically a behavior change, but it seems very unlikely to
come up in practice. It will only make a difference if a user has a
[9 lines not shown]
[scudo] Use the unmap function on MemMap object. (#204001)
The current call does an unmap(MemMap), but the rest of the code is doing
MemMap.unmap(XXX), so follow that pattern.
[flang][cuda] Avoid runtime copies for scalar constant host reads (#204193)
Fix CUDA Fortran lowering for host reads from scalar module variables
with the `constant` attribute.
Host code can read and write CUDA constants, while kernels read the
device constant symbol. Flang keeps a host-visible value for scalar
constant host accesses and uses a device symbol for kernels.
After preserving the host declaration, scalar read-backs such as `x = c`
could still be lowered as device-to-host runtime copies, passing a host
pointer as the CUDA source. This change lowers those read-backs as
regular host load/store operations, while keeping the runtime update for
host-to-device assignments.
[AMDGPU] Refine i8 extractelement cost model (#203932)
Expand the cases when i8 extract elements are free. The extract elements
should be free when they are part of a sequence that extracts multiple
consecutive elements the size of a register. This change enables the
SLPVectorizer to keep extract elements over more costly shufflevectors.
This PR also undoes a previous change that made insert elements free;
those require sequences of shift/or instructions, so they shouldn't be free.
[lit] Avoid profraw filename collisions with --per-test-coverage (#203998)
Per-test-coverage derived the `LLVM_PROFILE_FILE` name from the test's
basename with its extension removed, so siblings that share a basename
but differ by directory or extension (e.g. foo.c and foo.cpp in one
directory) wrote into the same profraw file and raced on it.
This PR builds the name from the full path in the suite and adds the
`%p` and `%m` placeholders so a test that runs several instrumented
binaries gets a distinct file per process and per binary, even across
exec chains or recycled process ids.
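A sketch of the naming idea, not lit's actual code. The function name, the `-` separator, and the `.profraw` suffix are assumptions; `%p` (process id) and `%m` (per-binary signature) are the profile runtime's own filename placeholders.

```python
def profile_file_name(path_in_suite):
    """Build an LLVM_PROFILE_FILE value from a test's path components.

    Hypothetical scheme: keeping the directory and the extension means
    foo.c and foo.cpp no longer share a profraw file.
    """
    stem = "-".join(path_in_suite)
    # %p and %m are expanded by the profile runtime, not by this function.
    return f"{stem}-%p-%m.profraw"


print(profile_file_name(("dir", "foo.c")))    # prints "dir-foo.c-%p-%m.profraw"
print(profile_file_name(("dir", "foo.cpp")))  # prints "dir-foo.cpp-%p-%m.profraw"
```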
[flang][OpenACC] Support COLLAPSE on DO CONCURRENT (#203085)
Lower a COLLAPSE clause on a DO CONCURRENT when the collapse value
equals the number of concurrent controls, matching the equivalent
nested-DO collapse form, and route the loop body into the collapsed
acc.loop. Emit specific not-yet-implemented diagnostics for the
collapse-less-than and collapse-greater-than control-count cases, and a
-Wportability warning for this non-standard extension.
Collapse for the mismatched control-count cases will require a somewhat
more invasive change, so I will submit that as a follow-up PR if that is
okay. If desired, I could fix the lowering for those two cases now.
[mlir][docs] Add page for third-party tutorials (#188080)
Add a new page to the MLIR documentation that links to the
upstream Lighthouse project as well as additional third-party tutorials.
The goal is to make it easier for newcomers to discover MLIR learning
resources beyond the Toy tutorial.
The underlying discussion/RFC can be found
[here](https://discourse.llvm.org/t/rfc-tutorial-a-beginner-friendly-end-to-end-mlir-compiler-pipeline/89788).
[SROA] Extend tree-structured merge to handle init + RMW pattern (#194441)
## Problem
When SROA rewrites an alloca used as a read-modify-write accumulator, it
emits a linear chain of `shufflevector + select` per partial store.
`InstCombine`'s `SimplifyDemandedVectorElts` walks this chain
recursively per element, scaling quadratically with chain length — in
practice tens of seconds of compile time on some matmul kernels.
## Example
Take an `<8 x float>` alloca initialized once and then updated in 4
chunks of 2 elements each:
```llvm
%alloca = alloca <8 x float>
store <8 x float> %init, ptr %alloca ; full init
[104 lines not shown]