[ELF][SystemZ] Fix R_390_TLS_LDO32/64 in non-SHF_ALLOC sections
These relocations can appear in .debug_info, so, as on other
architectures (e.g. X86_64), we still need to handle them in getRelExpr.
Fixes: aec1c984266c ("[ELF] Add target-specific relocation scanning for SystemZ (#181563)")
[mlir][AMDGPU] Allow packing of exactly 4 elements. (#181843)
`amdgpu.scaled_mfma` ops ingest byte-sized scales stored in 4-byte
registers. To avoid unnecessary padding (where we only ever use the
first byte in this 4-byte register), this canonicalization finds
opportunities to enable packing multiple scales into 4-byte chunks
whenever possible. Note this is necessary but not sufficient to avoid
byte loads from LDS.
This canonicalization should try to pack scales that are extracted from
an alloc in shared mem of size 4 bytes or larger (meaning packing to 4
bytes is possible). Currently we bail out if the alloc is exactly 4
bytes long, which is incorrect; this PR fixes that.
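As a rough illustration of the boundary condition only (not the actual
MLIR canonicalization), the sketch below packs byte-sized scales into
4-byte words. The name `pack_scales` and the little-endian byte order are
assumptions made for this example.

```python
def pack_scales(scales):
    """Pack byte-sized scales into 4-byte words (hypothetical helper).

    Returns None when fewer than 4 scales are available, since no full
    4-byte word can be formed. Exactly 4 scales is a valid input.
    """
    # Bail out only when the input is strictly smaller than 4 bytes.
    # Using `<= 4` here would wrongly reject the exactly-4 case.
    if len(scales) < 4:
        return None
    words = []
    full = len(scales) - len(scales) % 4  # ignore a trailing partial group
    for i in range(0, full, 4):
        # Little-endian order is an assumption of this sketch.
        words.append(int.from_bytes(bytes(scales[i:i + 4]), "little"))
    return words
```

With this check, `pack_scales([1, 2, 3, 4])` yields a single packed word
instead of bailing out.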
---------
Signed-off-by: Muzammiluddin Syed <muzasyed at amd.com>
[DAGCombiner] Combine (fshl A, X, Y) | (shl X, Y) --> fshl (A|X), X, Y (#180887)
Similarly for (fshr X, B, Y) | (srl X, Y) --> fshr X, (X|B), Y
This is similar to the FSHL/FSHR handling in
hoistLogicOpWithSameOpcodeHands but here we treat a shl/srl like a
fshl/fshr with a zero operand.
The pattern doesn't require X to be the same on both sides, but that's
what occurred in the case I was looking at, so that's what is
implemented.
Alive2: https://alive2.llvm.org/ce/z/eUou-u
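Independently of the Alive2 proof, both folds can be sanity-checked by
brute force on small bit widths. The sketch below is a plain Python model,
not LLVM code: `fshl`, `fshr` and `check_identities` are illustrative
names, and the shift amount is restricted to values below the bit width
(where shl/srl are well defined).

```python
def fshl(a, b, s, bw):
    """Funnel shift left: high bw bits of (a:b) << (s mod bw)."""
    mask = (1 << bw) - 1
    return ((((a << bw) | b) << (s % bw)) >> bw) & mask


def fshr(a, b, s, bw):
    """Funnel shift right: low bw bits of (a:b) >> (s mod bw)."""
    mask = (1 << bw) - 1
    return (((a << bw) | b) >> (s % bw)) & mask


def check_identities(bw):
    """Exhaustively check both folds for all a, x and all shifts y < bw."""
    mask = (1 << bw) - 1
    for a in range(1 << bw):
        for x in range(1 << bw):
            for y in range(bw):
                shl = (x << y) & mask
                srl = x >> y
                # (fshl A, X, Y) | (shl X, Y) == fshl (A|X), X, Y
                if (fshl(a, x, y, bw) | shl) != fshl(a | x, x, y, bw):
                    return False
                # (fshr X, B, Y) | (srl X, Y) == fshr X, (X|B), Y
                if (fshr(x, a, y, bw) | srl) != fshr(x, x | a, y, bw):
                    return False
    return True
```

Calling `check_identities(8)` covers every 8-bit input combination; smaller
widths run almost instantly.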
[NFC][analyzer] Remove StmtNodeBuilder (#181431)
The class `StmtNodeBuilder` was practically equivalent to its base class
`NodeBuilder` -- its data members and constructors were identical and
the only distinguishing feature was that it supported two additional
methods that were not present in `NodeBuilder`.
This commit moves those two methods to `NodeBuilder` (there is no reason
why they cannot be defined there) and replaces all references to
`StmtNodeBuilder` with plain `NodeBuilder`.
Note that previously `StmtNodeBuilder` had a distinguishing feature
where its destructor could pass nodes to an "enclosing node builder" but
this became dead code at some point in the past, so my previous commit
320d0b5467b9586a188e06dd2620126f5cb99318 removed it.
[mlir][acc] Add pass to insert acc declare globals into GPU module (#181383)
Adds a new OpenACC pass that copies globals with the `acc.declare`
attribute into the GPU module so that device code (acc routine, compute
regions) can reference them.
---------
Co-authored-by: Susan Tan <zujunt at nvidia.com>
print/py-glyphsets: Change BUILD_DEPENDS from py-setuptools-scm8 to py-setuptools-scm
- Update version requirement of BUILD_DEPENDS
- Bump PORTREVISION for package change