[InstCombine] Fix type mismatch in `foldBitmaskMul` (#199920)
Resolves #199506.
`matchBitmaskMul` matches the form `!(A & N) ? 0 : N * C`.
When the select arms are splat vectors but A is a scalar, the fold produces
malformed IR such as
`%57 = and i64 %49, <4 x i64> splat (i64 3689348814741910323)`.
Reproducer: https://godbolt.org/z/EYzKTxcKn
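As a scalar illustration of why such a select can become a multiply, here is a minimal C sketch. It assumes N is a single-bit mask (the exact preconditions `matchBitmaskMul` checks are not spelled out here, and it may handle more general forms). Under that assumption `A & N` is either 0 or N, so the select equals `(A & N) * C`. The helper names are hypothetical. The bug fixed above is a separate issue: in IR, A and the select arms must also agree on scalar vs. vector type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if exactly one bit of n is set. */
static bool is_single_bit(uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* The select form: !(A & N) ? 0 : N * C. */
static uint64_t select_form(uint64_t a, uint64_t n, uint64_t c) {
    return !(a & n) ? 0 : n * c;
}

/* The multiply form. Only equivalent when N is a single bit, because then
 * (A & N) is either 0 or N. Unsigned arithmetic wraps modulo 2^64, like an
 * i64 mul without nuw/nsw flags. */
static uint64_t mul_form(uint64_t a, uint64_t n, uint64_t c) {
    assert(is_single_bit(n));
    return (a & n) * c;
}
```

With a multi-bit N the identity breaks: for A = 1, N = 3, C = 5 the select yields 15 while `(A & N) * C` yields 5.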
[FunctionAttrs][Attributor] Update nofree inference (#196266)
This updates nofree inference after the semantics changes in
https://github.com/llvm/llvm-project/pull/195658.
FunctionAttrs currently only supports function-level nofree inference,
so we just need to check for potentially synchronizing instructions.
Attributor also supports argument nofree, in which case we make
additional changes:
* For callsite arguments, in addition to checking nofree, we also need
to check nocapture. If the call itself doesn't free the arg, it may
still capture it and then we may free it via the captured pointer later.
The way this was handled was already incorrect prior to the semantics
changes.
* Reformulate the handling for other instructions in terms of looking
for provenance captures, as that's what the code was essentially doing,
in a crude way. I'm including this to avoid some regressions for cases
where we can no longer infer function nofree, but can still infer
argument nofree.
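To make the first bullet concrete, a hypothetical C sketch: `stash` never calls `free` on its argument, so in isolation it looks nofree for that argument, but it captures the pointer in a global and a later call frees it through the captured copy. This is why argument nofree at a call site also needs nocapture. All names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Global slot through which a pointer can escape ("be captured"). */
static void *captured = NULL;

/* Does not free p itself, but captures it. */
static void stash(void *p) {
    captured = p;
}

/* Frees whatever was captured earlier, i.e. frees p indirectly. */
static void release_captured(void) {
    free(captured);
    captured = NULL;
}

/* Returns 1 if p stayed valid across stash() and was then freed via the
 * captured pointer. p must not be used after release_captured(). */
static int freed_via_capture(void) {
    int *p = malloc(sizeof *p);
    assert(p != NULL);
    *p = 42;
    stash(p);                  /* no free here... */
    int still_ok = (*p == 42);
    release_captured();        /* ...but p is freed here, through the capture */
    return still_ok && captured == NULL;
}
```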
Revert "[Clang] prevent constexpr crash on invalid overrides" (#199895)
Reverts #184048
---
The original change marks invalid overrides too early, causing follow-up
Sema regressions in special-member handling and MS-compatible override
diagnostics.
[orc-rt] Remove NativeDylibManager unload operation. (#200119)
NativeDylibManager is intended to act as a backend for
llvm::orc::DylibManager, which does not expose an unload operation. This
commit removes the explicit unload from both the C++ and SPS interfaces,
and unloads dylibs via an on-shutdown callback instead.
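The on-shutdown approach can be sketched in C as a small registry of callbacks that run in reverse registration order at shutdown. This is a hypothetical model, not the orc-rt API: `on_shutdown`, `run_shutdown`, and the `fake_dylib` handle are invented for illustration, and a real callback would release an actual library handle (for example with `dlclose`).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 16

typedef void (*shutdown_fn)(void *ctx);

static shutdown_fn callbacks[MAX_CALLBACKS];
static void *contexts[MAX_CALLBACKS];
static size_t num_callbacks = 0;

/* Hypothetical: register a callback to run at shutdown. Returns 0 on success. */
static int on_shutdown(shutdown_fn fn, void *ctx) {
    assert(fn != NULL);
    if (num_callbacks == MAX_CALLBACKS)
        return -1;
    callbacks[num_callbacks] = fn;
    contexts[num_callbacks] = ctx;
    num_callbacks++;
    return 0;
}

/* Run callbacks in reverse registration order, so later-loaded
 * libraries are unloaded first. */
static void run_shutdown(void) {
    while (num_callbacks > 0) {
        num_callbacks--;
        callbacks[num_callbacks](contexts[num_callbacks]);
    }
}

/* Hypothetical stand-in for a loaded dylib handle. */
typedef struct { int loaded; int unload_order; } fake_dylib;

static int unload_counter = 0;

/* Shutdown callback that "unloads" a fake dylib and records the order. */
static void unload_dylib(void *ctx) {
    fake_dylib *d = ctx;
    d->loaded = 0;
    d->unload_order = ++unload_counter;
}
```

Reverse order is a design choice of this sketch, mirroring how destructors typically unwind; it is not taken from the commit.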
[LangRef] Do not allow free via synchronization in nofree (#195658)
The nofree attribute is currently specified to only forbid direct free
calls inside the function. A nofree function is still allowed to compel
a pointer to be freed by a different thread through synchronization.
This is currently only spelled out for the function-level nofree
attribute, but I assume the same semantics also hold for argument nofree
(and this matches how the Attributor implementation infers it).
The original motivation for this definition was to keep the attributes
orthogonal and independently inferable. However, the problem is that
nosync is too strong a condition: it excludes *any* synchronization, not
just synchronization that results in the free of a pointer.
Some frontends like Rust can guarantee that most pointer arguments
cannot be freed for the duration of a function call, including via
synchronization. However, they cannot guarantee that no synchronization
takes place at all. The current definition of nofree makes this
[16 lines not shown]
[clang] Don't optimize out no-op atomics in kernel mode (#193562)
No-op atomics such as InterlockedAnd(addr, (UINT32)-1) don't modify
the underlying value; however, kernel code depends on these accesses
touching the pool page's virtual address to intentionally trigger a page
fault during page migration. This patch also fixes an LLVM issue where
idempotent volatile atomics were incorrectly lowered into memory fences.
[HLSL] Codegen for handling global resource array initialization (#198891)
When a global resource array is accessed - whether it is declared at a
global scope or as part of a global struct instance - all of its
resource elements should be initialized from binding into a temporary
local resource array. This change intercepts the Clang codegen at the
relevant places to let `CGHLSLRuntime` handle this special global
resource array initialization.
Fixes #187087
Fixes #198888
[AMDGPU] Remove redundant s_wait_xcnt after implicit XCNT drains (#198823)
On gfx1250 several instructions implicitly drain XCNT in hardware:
`s_barrier_wait`/`signal`/`signal_isfirst`, `s_sendmsg`, PC-changes etc.
This patch removes redundant `s_wait_xcnt` instructions after implicit
XCNT drains.
Pre-commit tests on #198772
Fix: LCOMPILER-1665
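A simplified model of the cleanup, as a hypothetical C sketch: walk a straight-line instruction sequence, track whether XCNT is known to be zero (after an implicit drain or an explicit wait), and drop any `s_wait_xcnt` that arrives while it still is. The opcode enum, the assumption that only memory operations increment XCNT, and treating every explicit wait as a wait-for-zero are simplifications, not the actual AMDGPU backend logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified opcode model. */
typedef enum {
    OP_MEM,            /* increments XCNT */
    OP_IMPLICIT_DRAIN, /* e.g. s_barrier_wait, s_sendmsg: drains XCNT in hardware */
    OP_WAIT_XCNT,      /* explicit s_wait_xcnt, assumed here to wait for zero */
    OP_OTHER
} opcode;

/* Removes redundant OP_WAIT_XCNT entries in place; returns the new length. */
static size_t remove_redundant_xcnt_waits(opcode *insts, size_t n) {
    assert(insts != NULL || n == 0);
    bool xcnt_is_zero = false; /* conservatively unknown at block entry */
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        opcode op = insts[i];
        if (op == OP_WAIT_XCNT && xcnt_is_zero)
            continue; /* nothing outstanding: the wait is redundant */
        if (op == OP_MEM)
            xcnt_is_zero = false;
        else if (op == OP_IMPLICIT_DRAIN || op == OP_WAIT_XCNT)
            xcnt_is_zero = true;
        insts[out++] = op;
    }
    return out;
}
```

For `MEM, DRAIN, WAIT, OTHER, MEM, WAIT`, the first wait is dropped (the drain already zeroed XCNT) while the second is kept (a memory op was issued in between).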
[clang-tidy] Fix false positive of parentheses removal for overloaded operator (#192254)
Fixes #189217
Don't remove necessary parentheses around an overloaded operator call
when the parenthesized expression is an operand of a binary operator.
E.g. (E1 & E2) != E3 // the parentheses aren't redundant here
E.g. (E1 & E2) // the parentheses are redundant here
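The same parse is easy to see with built-in operators in C, where `!=` binds tighter than `&`. Overloaded operators in C++ keep the built-in precedence, so dropping the parentheses in `(E1 & E2) != E3` changes the meaning in the same way. A minimal sketch:

```c
#include <assert.h>

/* (a & b) != c: compare the masked value against c. */
static int with_parens(unsigned a, unsigned b, unsigned c) {
    return (a & b) != c;
}

/* a & b != c parses as a & (b != c); the parentheses here only make
 * that parse explicit (and avoid a -Wparentheses warning). */
static int without_parens(unsigned a, unsigned b, unsigned c) {
    return (int)(a & (b != c));
}
```

With a = 2, b = 2, c = 1, the first form gives 1 and the second gives 0.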
[LifetimeSafety] Avoid assert on variadic placement new (#199588)
Avoid assuming that a placement allocation function has a second
`ParmVarDecl` before checking whether that parameter is `void*`.
Variadic `operator new(size_t, ...)` can have a placement argument
matched by the ellipsis instead.
AI usage: Codex was used to help rephrase some of the new comments.
Closes https://github.com/llvm/llvm-project/issues/199584
[HIP][Driver] Forward -fcoverage-mapping flags to device compiler (#198872)
Add `-fcoverage-mapping`, `-fno-coverage-mapping`,
`-fcoverage-compilation-dir=`, `-ffile-compilation-dir=`, and
`-fcoverage-prefix-map=` to the LinkerWrapper `CompilerOptions`
forwarding list. Without this, passing `-fprofile-instr-generate
-fcoverage-mapping` to clang for a HIP program silently omits the
coverage mapping flags from the embedded device recompilation, so
`__llvm_covmap`/`__llvm_covfun` sections are never emitted for device
code.
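Conceptually, the forwarding list is a filter over the host compiler's arguments. A hypothetical C sketch of such a filter using the flags named above, split into exact-match flags and `=`-terminated prefixes; the function name is invented and the real logic lives in the clang driver, which is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Flags forwarded only on an exact match. */
static const char *const exact_flags[] = {
    "-fcoverage-mapping",
    "-fno-coverage-mapping",
};

/* Joined flags forwarded when the argument starts with the prefix. */
static const char *const prefix_flags[] = {
    "-fcoverage-compilation-dir=",
    "-ffile-compilation-dir=",
    "-fcoverage-prefix-map=",
};

/* Hypothetical: should this host argument reach the device recompilation? */
static bool should_forward_to_device(const char *arg) {
    assert(arg != NULL);
    for (size_t i = 0; i < sizeof exact_flags / sizeof exact_flags[0]; i++)
        if (strcmp(arg, exact_flags[i]) == 0)
            return true;
    for (size_t i = 0; i < sizeof prefix_flags / sizeof prefix_flags[0]; i++)
        if (strncmp(arg, prefix_flags[i], strlen(prefix_flags[i])) == 0)
            return true;
    return false;
}
```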
[LoopFusion] reject unsafe scalar flow dependences (#195895)
`loop-fusion` treats any loop-invariant scalar non-anti dependence as
safe to fuse. In the linked issue, it incorrectly allows scalar flow
dependences where the first loop writes a loop-invariant location and
the second loop later reads that same location. Fusion interleaves the
producer and consumer, which changes the value observed by the second
loop.
Example C source would look like:
```C
for (int i = 0; i < N; i++) {
  ptr[0] = i;
}
for (int j = 0; j < N; j++) {
  out[j] = ptr[0];
}
=>
for (int i = 0; i < N; i++) {
[14 lines not shown]
[Driver] Honor /Fo when deriving the split-dwarf .dwo path (#199613)
SplitDebugName checked -o and /o but not /Fo, so clang-cl /Fo<path> /c
fell through to the cwd-relative fallback and every .dwo landed in cwd
under <source-stem>.dwo regardless of the .obj location.
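The intended behavior can be sketched as follows: prefer the object path from whichever output option was given (`-o`, `/o`, or `/Fo`) and swap its extension for `.dwo`, falling back to `<source-stem>.dwo` in the current directory only when no output path is known. This is a hypothetical C helper, not the driver's code, and it ignores the case where `/Fo` names a directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes the .dwo path into out. obj_path is NULL when no -o, /o, or /Fo
 * option was given. Returns 0 on success, -1 if the result was truncated. */
static int derive_dwo_path(const char *obj_path, const char *source_stem,
                           char *out, size_t out_size) {
    assert(source_stem != NULL && out != NULL && out_size > 0);
    if (obj_path == NULL) {
        /* cwd-relative fallback: <source-stem>.dwo */
        int n = snprintf(out, out_size, "%s.dwo", source_stem);
        return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
    }
    /* Find the final path component so dots in directory names are ignored. */
    const char *base = obj_path;
    for (const char *p = obj_path; *p; p++)
        if (*p == '/' || *p == '\\')
            base = p + 1;
    /* Replace the extension of the final component, if it has one. */
    const char *dot = strrchr(base, '.');
    size_t stem_len = dot ? (size_t)(dot - obj_path) : strlen(obj_path);
    int n = snprintf(out, out_size, "%.*s.dwo", (int)stem_len, obj_path);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```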
[PGO][HIP] Stop pulling ROCm.o into every PGO host link (#200101)
PR #177665 added an unconditional `extern` reference to
`__llvm_profile_hip_collect_device_data` from `InstrProfilingFile.c`,
which forces `InstrProfilingPlatformROCm.o` (and its sanitizer_common /
interception dependencies) out of `libclang_rt.profile.a` in every PGO
binary. That breaks bots without `-lpthread` and races dlsym/PLT state
in non-HIP programs via the interceptor constructor.
Fix:
- Declare the hook `COMPILER_RT_WEAK` and gate the call on its address.
No `COMPILER_RT_VISIBILITY`: a hidden weak-undef function would be
non-preemptible and the address test would fold to true.
- Gate `installHipModuleInterceptors` on `dlsym(hipModuleLoad)` so the
constructor is a no-op if `ROCm.o` is still pulled in.
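The weak-reference pattern from the first bullet, as a minimal C sketch for ELF targets built with GCC or Clang. `collect_device_data` is a hypothetical stand-in for `__llvm_profile_hip_collect_device_data`; because nothing defines it here, its address is null and the call is skipped. It deliberately keeps default visibility, for the reason given in that bullet.

```c
#include <assert.h>
#include <stddef.h>

/* Weak, default-visibility declaration: if nothing in the link defines it,
 * the symbol resolves to a null address instead of causing an undefined
 * reference, and a weak reference alone does not pull an archive member in. */
extern void collect_device_data(void) __attribute__((weak));

/* Returns 1 if the hook was present and called, 0 otherwise. */
static int maybe_collect_device_data(void) {
    if (collect_device_data == NULL)
        return 0;
    collect_device_data();
    return 1;
}
```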
Fixes:
- https://lab.llvm.org/buildbot/#/builders/66/builds/31311
- https://lab.llvm.org/buildbot/#/builders/174/builds/36180
[7 lines not shown]
[clang][AMDGPU] Fix -ast-print crash on expanded predicate builtins (#199963)
ExpandAMDGPUPredicateBuiltIn synthesized an IntegerLiteral typed
_Bool/bool, a shape that no other producer creates and that
StmtPrinter::VisitIntegerLiteral has no case for. -ast-print on the
resulting if-condition hit llvm_unreachable.
Emit the canonical boolean literal instead:
- C++, C23, OpenCL, HIP: CXXBoolLiteralExpr 'bool'
- pre-C23 C: IntegerLiteral 'int'
In the C case this matches what <stdbool.h>'s true/false macros expand
to.
Fixes #199563
[clang] fix getTemplateInstantiationArgs
This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.
This greatly simplifies the signature of that function by removing a bunch
of workarounds and simplifying a couple that weren't removed yet.
Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.
It also makes the explicit specialization AST nodes stop abusing the template
parameter lists: they now store their own template parameter list in a
dedicated field, similar to partial specializations.
[SelectionDAG] Widen <2 x T> vector types for atomic store
Two-element vector types must be widened. This change does so for the
vector types of atomic stores in SelectionDAG, so that aligned vectors
with more than one element can be translated.