[X86] Cast atomic vectors in IR to support floats
This commit casts floats to ints in an atomic load during AtomicExpand to support
floating point types. It is also required to support 128-bit vectors in SSE/AVX.
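As a source-level analogy only (this is not the IR transformation itself), the idea of loading a float through an integer-typed atomic access can be sketched as below. The helper names `atomic_store_float` and `atomic_load_float` are hypothetical and do not appear in the patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical helper: store a float by atomically storing its bit pattern.
void atomic_store_float(std::atomic<std::uint32_t> &cell, float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);  // bit-preserving cast, float -> int
  cell.store(bits, std::memory_order_release);
}

// Hypothetical helper: one atomic integer load, then a bit-preserving cast back.
float atomic_load_float(const std::atomic<std::uint32_t> &cell) {
  std::uint32_t bits = cell.load(std::memory_order_acquire);
  float value;
  std::memcpy(&value, &bits, sizeof value);  // int -> float, no value conversion
  return value;
}

// Small usage check: the value round-trips unchanged through the integer cell.
void demo_atomic_float() {
  std::atomic<std::uint32_t> cell{0};
  atomic_store_float(cell, 1.5f);
  assert(atomic_load_float(cell) == 1.5f);
}
```

The atomicity comes entirely from the integer access; the cast only reinterprets the bits.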
[SelectionDAG] Split vector types for atomic load (#165818)
Vector types that aren't widened are split so that a single ATOMIC_LOAD
is issued for the entire vector at once. This change utilizes the load
vectorization infrastructure in SelectionDAG in order to group the
vectors. This enables SelectionDAG to translate vectors of type
bfloat and half.
[flang] dummy arguments used as function calls (#196426)
Add an error when a dummy argument is used as a statement function.
```
SUBROUTINE a(foo)
foo(c) = 0
END SUBROUTINE a
```
This PR now points out:
1) Dummy argument 'foo' may not be used as a statement function
2) 'foo' is not a callable procedure
Handles issue
[196424](https://github.com/llvm/llvm-project/issues/196424)
---------
Co-authored-by: Sunil Kuravinakop <kuravina at pe31.hpc.amslabs.hpecorp.net>
[AArch64] Use dup (lane mov) over ext for high-half extract (#195010)
This changes the instruction we use to extract the high half of a vector
register from an `ext v0, v1, v1, 8` to a `dup d0, v1.d[1]`. This is
apparently slightly quicker on certain CPUs and is generally a simpler
instruction. This matches the instruction that GlobalISel produced.
Some of the old patterns for extract_subvector with index of 1 seem
incorrect but were never used as we do not reach selection with such
instructions. They have been repurposed to emit the new DUPi64
instructions.
[clang][bytecode] Visit `tryEvaluateObjectSize` expr as lvalue (#196010)
Just like we do with the first parameter of a regular
`__builtin_object_size` call.
This still doesn't fix the bigger bos test cases since e.g.
```c++
int NoViableOverloadObjectSize3(void *const p PS(3))
__attribute__((overloadable)) {
return __builtin_object_size(p, 3);
}
void test4(struct Foo *t) {
gi = NoViableOverloadObjectSize3(&t[1].t[1]);
}
```
is still broken because we don't have special handling for the
`&t[1].t[1]` expression here and we can't usually access a one-past-end
pointer.
[lldb] Fix TestDelayedBreakpoint on ARM Thumb (#196888)
The original address used for the "fake breakpoint" is not valid in
Thumb mode. To be safe, change it to have 0's in the LSBs.
[CIR][AMDGPU] Add lowering for amdgcn ds swizzle builtin. (#196011)
Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/2052
This PR adds support for lowering the __builtin_amdgcn_ds_swizzle AMDGPU
builtin to ClangIR.
[clang][NFC] Remove alignment checks from test/CodeGen/c-strings.c (#196501)
and re-enable it on more targets.
I don't think this test was intended to check for alignment. Those
expectations were added as part of FileCheck-izing the test in
e29dadb6403c8b0d3658f9bbbe2f5fbde5431fdb and we've been working around
them or xfailing the test since.
[AA] No synchronization effects for never-escaping identified local (#193939)
Fences and other synchronizing operations (such as atomic accesses
stronger than monotonic) are modelled as reading and writing all memory,
in order to enforce their implied ordering constraints.
Currently, this happens even for identified function locals that do not
escape. This patch excludes those objects.
Notably, we can *not* reason based on captures-before here, because the
synchronizing operation still has an effect even if the object only
escapes *later*.
The hope here is that with this restriction in place, it may be viable
to respect potential synchronization inside non-nosync function calls.
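A minimal illustration of the kind of code this affects, written as a sketch rather than a test from the patch: `read_local_across_fence` is a hypothetical function in which `local` is an identified function-local object whose address is never passed anywhere. Since no other thread can name it, the fence has no ordering effect on accesses to it. Whether a particular compiler actually folds the load is not claimed here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical function; the names are illustrative only.
int read_local_across_fence(std::atomic<int> &shared) {
  int local[2] = {42, 7};  // function-local object; its address never escapes
  shared.store(1, std::memory_order_relaxed);
  // Synchronizing operation: modelled as reading and writing all memory.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  // `local` cannot be observed by any other thread, so the fence does not
  // need to be treated as a potential write to it.
  return local[0];
}

// Small usage check: the observable behaviour is the same either way.
void demo_fence() {
  std::atomic<int> shared{0};
  assert(read_local_across_fence(shared) == 42);
  assert(shared.load() == 1);
}
```

If the address of `local` were published after the fence instead, the fence would still matter, which is why captures-before reasoning is not sufficient.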
[libc] Fix partial multi-byte write detection in File (#196402)
File::write_unlocked(const wchar_t*, size_t) checked 'write_res.value <
1' after writing a converted UTF-8 sequence. For multi-byte characters,
a short platform write (e.g. 2 of 3 bytes for a 3-byte character) passed
this check and was counted as a successful write. The output stream
would then contain an incomplete UTF-8 sequence with no error reported
to the caller.
Changed the check to 'write_res.value < char_size' and set the error
indicator on the stream when it triggers.
Added a regression test using a mock File subclass that limits
platform_write to 2 bytes per call, simulating short writes on pipes and
sockets.
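The difference between the two checks can be sketched as follows. This is not the libc code: `ShortWriter` and `write_encoded_char` are hypothetical stand-ins for the mock File subclass and the per-character write path.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a platform write that accepts at most `limit`
// bytes per call, like a short write on a pipe or socket.
struct ShortWriter {
  std::size_t limit;
  std::size_t write(const char *, std::size_t len) {
    return len < limit ? len : limit;  // number of bytes "written"
  }
};

// Hypothetical helper: writes one already-encoded UTF-8 character.
// Returns true only if all `char_size` bytes were written.
bool write_encoded_char(ShortWriter &out, const char *bytes,
                        std::size_t char_size, bool &error) {
  assert(char_size >= 1 && char_size <= 4);  // valid UTF-8 sequence length
  std::size_t written = out.write(bytes, char_size);
  // A check of `written < 1` would accept 2 of 3 bytes as success.
  if (written < char_size) {
    error = true;  // mirrors setting the stream's error indicator
    return false;
  }
  return true;
}
```

With a 2-byte limit and a 3-byte character such as U+20AC, the helper reports failure and sets the error flag instead of counting the character as written.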
Assisted-by: Automated tooling, human reviewed.
---------
Co-authored-by: Michael Jones <michaelrj at google.com>
[LoopFusion] Remove SCEV-based dependence analysis path (#195864)
Loop Fusion has used Dependence Analysis (DA) as the default dependence
check since the option default was flipped in #187309. The SCEV-based
strategy and the combined "all" mode were retained only for fallback and
experimentation, with a comment noting that the SCEV code would be
removed in a follow-up.
This patch removes the SCEV-based dependence path and the now-unused
selector machinery.
Fixes #194821.
Assisted by Cursor.
[DebugInfo] Pack DILocation hash inputs (#196556)
Pack DILocation fields before hashing. Now that column is 16 bits,
Line/Column/ImplicitCode fit in one 64-bit value (32 + 16 + 1 = 49 bits),
and AtomGroup and AtomRank also fit cleanly in one 64-bit value (61 + 3
= 64 bits).
Fewer hash_combine inputs on the hot DILocation path is a small
compile-time improvement.
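The packing arithmetic can be sketched as below. The function names and the exact bit layout are assumptions made for illustration; only the field widths (32, 16, 1, 61 and 3 bits) come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: [line:32][column:16][implicit_code:1], 49 bits used.
std::uint64_t pack_location(std::uint32_t line, std::uint16_t column,
                            bool implicit_code) {
  return (std::uint64_t(line) << 17) | (std::uint64_t(column) << 1) |
         std::uint64_t(implicit_code);
}

// Hypothetical layout: [atom_group:61][atom_rank:3], all 64 bits used.
std::uint64_t pack_atom(std::uint64_t atom_group, unsigned atom_rank) {
  assert(atom_group < (std::uint64_t(1) << 61));  // must fit in 61 bits
  assert(atom_rank < 8);                          // must fit in 3 bits
  return (atom_group << 3) | std::uint64_t(atom_rank);
}
```

Five separate hash inputs become two, and distinct field combinations still map to distinct packed values because the fields do not overlap.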
CTMark geomean:
- stage1-ReleaseLTO-g: -0.10%
- stage1-O0-g: -0.23%
- stage1-aarch64-O0-g: -0.19%
- stage2-O0-g: -0.07%
https://llvm-compile-time-tracker.com/compare.php?from=71fef6d5a306d1adf8bf7d30d2fe9e286380fecf&to=1d80b5f5aa98561d2ba09adc3f20c3eacd24cb88&stat=instructions%3Au
Assisted-by: codex