[LowerTypeTests] Add debug info to jump table entries (#192736)
When Control Flow Integrity (CFI) is enabled, jump tables are used to
redirect indirect calls. Previously, these jump table entries lacked
debug information, making it difficult for profilers and debuggers to
attribute execution time correctly.
Now a stack trace, when stopped on a jump table entry, will look like this:
```
#0: __ubsan_check_cfi_icall_jt at sanitizer/ubsan_interface.h:0
#1: c::c() (.cfi_jt) at sanitizer/ubsan_interface.h:0:0
#2: .cfi.jumptable.81 at sanitizer/ubsan_interface.h:0:0
```
[AMDGPU] Add a sched group mask for LDSDMA instructions
The existing VMEM masks are not fine-grained enough for some use cases. For
example, if users want to control async loads, using VMEM may cause the compiler
to pick instructions it shouldn't.
This PR adds a new sched group mask for LDSDMA instructions. It is a subclass of
VMEM, but only matches instructions for which `isLDSDMA` is true.
[ELF] Handle INCLUDE like a call stack (#193427)
The lexer maintains a stack of buffers, which allows a construct
started in an INCLUDE'd file to be closed by the parent. This produces
spurious acceptance of malformed scripts (e.g. a bare assignment with
no trailing `;` in the include, terminated by the parent's `;` after
`INCLUDE`) and undefined-behavior span computations in
`readAssignment`'s `commandString` (issue #190376).
Force each INCLUDE to fully parse its own content, similar to a call
stack frame. `ScriptLexer::lex` no longer auto-pops on EOF; the
`buffers` member is gone. `readInclude` takes a `function_ref<void()>`
callback, and the four call sites (top-level, SECTIONS, output
section, MEMORY) pass a context-appropriate parser.
With this, each buffer contains complete parser structures by
construction, so the `[oldS, curTok)` pointer range in
`readAssignment` no longer needs a guard.
[flang] Disable copy-out to INTENT(IN) args (#192382)
Don't copy out to actual args that themselves happen to be INTENT(IN)
dummy args.
```
subroutine test(a)
  real, intent(in) :: a(:)
  call require_contiguous_arg(a(1:n:2)) ! copy-in only, no copy-out
end
```
---------
Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
[X86] Improve FREEZE node elimination for SETCC operations (#192362)
This improves FREEZE node handling around SETCC and SETCC_CARRY
operations to enable better optimization, particularly for APX
CCMP/CTEST pattern matching with fast-math comparisons.
Resolves https://github.com/llvm/llvm-project/issues/191716.
[lldb/test] Fix shared library symlinks for remote testing (#189177)
When running tests on a remote device, framework convenience symlinks
created by test Makefiles (e.g. `$(BUILDDIR)/Framework` pointing to
`$(BUILDDIR)/Framework.framework/Framework`) cause launch failures.
`Platform::Install` recreates these as symlinks on the remote device
pointing to host build paths that don't exist, resulting in "No such
file or directory" from dyld.
This patch changes `LN_SF` in Makefile.rules to strip the common
directory prefix from the symlink source using `patsubst` so it produces
relative symlinks instead of absolute ones.
It also resolves symlinks with `os.path.realpath()` in
`registerSharedLibrariesWithTarget` before registering modules so that
`Platform::Install` sees a regular file and transfers the actual binary
content.
[2 lines not shown]
[SPIR-V] Encode Atomic metadata as UserSemantic string decoration (#193019)
AMDGPU uses metadata to guide atomic-related optimisations. SPIR-V was
not handling it, which led to significant and spurious performance
differences. This patch fixes this oversight by encoding the metadata as
UserSemantic string decorations applied to the atomic instructions.
[ExpandMemCmp] Pre-collect memcmp calls to improve compile time (#193415)
Avoid restarting the basic block iteration from the beginning of the
function every time a memcmp/bcmp is expanded. Instead, pre-collect all
memcmp/bcmp calls and process them in a single pass.
[libc][CndVar] reimplement condition variable with FIFO ordering (#192748)
This PR reimplements the condition variable with two variants:
- a futex-based shared condvar with an atomic waiter counter
- a queue-based private condvar
Note that thread-local queue nodes cannot be reliably accessed across
processes, so a single unified implementation cannot cover the
process-shared case.
POSIX.1-2024 (Issue 8) added atomicity requirements for condition
variables:
- The `pthread_cond_broadcast()` function shall, **as a single atomic
operation**, determine which threads, if any, are blocked on the
specified condition variable cond and unblock all of these threads.
- The `pthread_cond_signal()` function shall, as a **single atomic
operation**, determine which threads, if any, are blocked on the
[41 lines not shown]
[DirectX] Implement lowering of Texture Load and Texture .operator[] (#193343)
Fixes https://github.com/llvm/llvm-project/issues/192546 and
https://github.com/llvm/llvm-project/issues/192558
This PR defines the TextureLoad DXIL Op (opcode 66), and implements
lowering of the texture load (dx_resource_load_level) intrinsic to the
DXIL op.
This PR also implements the transformation of loads from texture
resources (via dx_resource_getpointer) into dx_resource_load_level
intrinsics.
Assisted-by: Claude Opus 4.7