[PreISelIntrinsicLowering] Use index type for index in intrinsic expansion (#193807)
We chose the intptr type for the binary case during review, but on reflection
the index type is probably a conceptually better fit. On RISC-V the two are
the same, so for the binary case this is purely a conceptual change.
For the unary case this is a functional change, since we were using i64
unconditionally. It improves codegen for RV32 by avoiding expensive
legalization of i64 expressions for the IV.
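As a rough illustration of why an unconditional i64 induction variable is costly on RV32, the sketch below models a 64-bit add split into 32-bit halves, which is roughly the shape of work type legalization has to produce on a 32-bit target. It is not the actual RISC-V lowering, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_i32(a, b):
    """Native-width add on a 32-bit target: a single operation."""
    return (a + b) & MASK32


def add_i64_as_i32_pairs(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values, each given as (lo, hi) 32-bit halves.

    A 32-bit target needs a low add, a carry computation, and a high
    add that folds the carry in, instead of one add.
    """
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if lo < a_lo else 0  # unsigned overflow of the low half
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi
```

With an index-typed (32-bit on RV32) IV, each increment looks like `add_i32`; with an i64 IV, each increment looks like `add_i64_as_i32_pairs`.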
[HLSL] Update global array convergence test (#193380)
Updates the global array initialization convergence test to use a static
array of resources instead of a user-defined struct with a constructor. The
test will no longer work as-is once support for user-defined
constructors is removed (#193375).
[HLSL][DXIL][SPIRV] Added DeviceMemoryBarrier() and AllMemoryBarrier() intrinsics (#190633)
Addresses issues #99105, #99076, #99090, and #99106 by adding implementations
of DeviceMemoryBarrier(WithGroupSync) and AllMemoryBarrier(WithGroupSync)
for DXIL and SPIRV.
[MLIR][Mem2Reg] Ensure dominance of default value in regions (#193708)
When we promote an allocation, and a default value for a load from an
uninitialized slot is required, this value used to get inserted in the
same block as the allocation. However, in some cases, the default value
needs to be available in the predecessor blocks so that they can pass it
to the block of the allocation as an argument. For example, this is the
case for loops containing an allocation where the promoted value will
become an IV.
Make sure the default value is always available to all blocks by
creating it in the entry block of the region.
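The dominance argument can be checked on a toy CFG. The sketch below is not MLIR code: it uses a hypothetical three-block graph (`bb0` is the entry, `bb1` is a loop block that holds the allocation and branches back to itself) and a textbook iterative dominator computation. It only models block-level dominance and ignores operation order within a block.

```python
def dominators(cfg, entry):
    """Return {block: set of blocks that dominate it} for a reachable CFG."""
    blocks = set(cfg)
    preds = {b: [p for p in cfg if b in cfg[p]] for b in blocks}
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            if not preds[b]:
                continue  # unreachable block; out of scope for this sketch
            new = set.intersection(*(dom[p] for p in preds[b])) | {b}
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def dominates(def_block, use_block, dom):
    """True if a value defined in def_block is usable in use_block."""
    return def_block in dom[use_block]


# Hypothetical CFG: bb0 -> bb1, bb1 -> bb1 (back edge), bb1 -> bb2.
CFG = {"bb0": ["bb1"], "bb1": ["bb1", "bb2"], "bb2": []}
DOM = dominators(CFG, "bb0")
```

Here `dominates("bb1", "bb0", DOM)` is false, so a default value created in the allocation's block cannot be passed as a block argument from `bb0`, while a value created in `bb0` dominates every block.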
For reference, this is what used to be the output for the test. Note the
use of `%1`.
```
"func.func"() <{function_type = (f64) -> (), sym_name = "poison_insertion_point"}> ({
^bb0(%arg0: f64):
"cf.br"(%1)[^bb1] : (f64) -> ()
[16 lines not shown]
[flang][OpenMP] Make OpenMPLoopConstruct inherit from OmpBlockConstruct
Conceptually OpenMPLoopConstruct has the exact same structure as
OmpBlockConstruct: directive specification for the begin directive,
optional one for the end directive, and a block of code.
The reason why OpenMPLoopConstruct was not originally made to be
a descendant of OmpBlockConstruct was to preserve the behavior of
AST visitors, where a separate (type-based) visitor could be defined
for the begin/end directives of a block construct, and for a loop
construct. The AST nodes representing the begin/end directives in
block and loop constructs had different types: Omp{Begin|End}Directive
for block constructs, and Omp{Begin|End}LoopDirective for loop
constructs.
Today this distinction is not needed anywhere, and so the loop
construct will be represented in the same way as a block construct.
[Darwin] Remove linker version checks for objc_msgSend selector stubs (#193637)
Remove the getLinkerVersion checks that gated default enablement of
-fobjc-msgsend-selector-stubs and -fobjc-msgsend-class-selector-stubs on
AArch64 Darwin targets.
The linker support for these stubs was added a long time ago, so the
version checks are no longer necessary. Additionally, getLinkerVersion
returns the version of the linker that was used to build clang, which
can differ from the linker actually used at invocation time, making the
check unreliable.
[CIR] Handle CK_UserDefinedConversion and related casts in emitCastLValue (#193611)
`emitCastLValue` was hitting an NYI error for
`CK_UserDefinedConversion`, `CK_ConstructorConversion`,
`CK_CPointerToObjCPointerCast`, `CK_BlockPointerToObjCPointerCast`, and
`CK_LValueToRValue`. Classic codegen handles all of these as a
pass-through to the sub-expression (`CGExpr.cpp:6197`), and CIR should
do the same.
[lld][WebAssembly] Always initialize fixed `__tls_base` in single threaded mode (#193563)
Without this fix, `__tls_base` can remain set to zero, which causes
`__builtin_thread_pointer` to return NULL, which it should not.
See https://github.com/emscripten-core/emscripten/pull/26747
[AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format. Each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
[4 lines not shown]
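A minimal sketch of reading and writing the `[kind: u8] [len: u8] [payload]` entry layout described above. The numeric kind values, the `INFO_NUM_VGPR` name, the placeholder symbol reference, and the little-endian u32 payload are all assumptions made for illustration; only the entry framing and the `INFO_FUNC` name come from the description.

```python
import struct

# Hypothetical tag numbers; the real values are not given in the text above.
INFO_FUNC = 1
INFO_NUM_VGPR = 2  # hypothetical per-function attribute


def encode_entry(kind, payload):
    """Encode one entry as kind byte, length byte, then the payload."""
    if not 0 <= kind <= 0xFF:
        raise ValueError("kind must fit in u8")
    if len(payload) > 0xFF:
        raise ValueError("payload must be at most 255 bytes")
    return bytes([kind, len(payload)]) + payload


def decode_entries(data):
    """Split a byte string into a list of (kind, payload) entries."""
    entries = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated entry header")
        kind, length = data[offset], data[offset + 1]
        offset += 2
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        entries.append((kind, payload))
        offset += length
    return entries


# One function scope: a placeholder symbol reference, then one attribute.
blob = encode_entry(INFO_FUNC, b"\x00\x00\x00\x00")
blob += encode_entry(INFO_NUM_VGPR, struct.pack("<I", 24))
```

Because every entry carries its own length, a reader can skip kinds it does not recognize, which is the usual motivation for this kind of tagged framing.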
Fix metadirective lowering for spliced DO loop symbols
spliceAssociatedDoEval moves the associated DO eval into the
metadirective's eval tree, but the parse tree is unchanged. Code that
walks the parse tree (DSP symbol collection, genTargetOp implicit
capture) misses loop body symbols entirely.
Teach DataSharingProcessor and genTargetOp to also walk nested evals
when processing a metadirective. Fix test CHECK patterns for delayed
privatization on wsloop and function signatures with arguments.