[libc++] Fix num_get base parsing (#170460)
This fixes two bugs reported in #121795 and adds regression tests.
Specifically, these bugs are in the base detection mechanism. The first
bug is that the out parameter isn't set when the stream contains only a
zero followed by the end of the stream. The second is that we don't
consider `0` to be a number on its own; instead we only parse it as the
start of an octal literal.
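For illustration, a minimal sketch of the behaviour these fixes are meant to restore, using only standard iostream calls. The helper name `parseAutoBase` is hypothetical and is not part of the patch or its regression tests.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Parse an integer with automatic base detection. Clearing basefield makes
// operator>> pick the base from the prefix: 0x -> hex, leading 0 -> octal,
// anything else -> decimal.
std::optional<long> parseAutoBase(const std::string &Text) {
  std::istringstream Stream(Text);
  Stream.unsetf(std::ios_base::basefield);
  long Value = -1; // sentinel; overwritten when extraction succeeds
  if (!(Stream >> Value))
    return std::nullopt;
  return Value;
}
```

With a conforming standard library, `parseAutoBase("0")` should yield 0 even though the stream ends right after the digit, and `parseAutoBase("010")` should yield 8.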
[RISCV] Combine vmerge_vl allones -> vmv_v_v, vmv_v_v splat(x) -> vmv_v_x (#170539)
An upcoming patch aims to remove the last use of
@llvm.experimental.vp.splat in RISCVCodegenPrepare by replacing it with
a vp_merge of a regular splat.
A vp_merge will get lowered to vmerge_vl, and if we combine vmerge_vl of
a splat to vmv_v_x we can get the same behaviour as the vp.splat
intrinsic.
This adds the two combines needed. It was easier to do the combines on
_vl nodes rather than on vp_merge itself, since the types are already
legal for _vl nodes.
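The reasoning behind the two combines can be checked with a toy element-wise model. This is a sketch under stated assumptions, not the SelectionDAG code: the functions `vmergeVL`, `vmvVV`, `vmvVX` and `splat` are hypothetical stand-ins that model lanes at or beyond VL as taking the passthru value.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Build a vector with every lane set to X.
Vec splat(int X, std::size_t Lanes) { return Vec(Lanes, X); }

// Toy vmerge_vl: lanes below VL select TrueV or FalseV by mask, the
// remaining lanes keep the passthru value.
Vec vmergeVL(const std::vector<bool> &Mask, const Vec &TrueV,
             const Vec &FalseV, const Vec &Passthru, std::size_t VL) {
  Vec Result = Passthru;
  for (std::size_t I = 0; I < VL && I < Result.size(); ++I)
    Result[I] = Mask[I] ? TrueV[I] : FalseV[I];
  return Result;
}

// Toy vmv_v_v: lanes below VL copy Src, the remaining lanes keep passthru.
Vec vmvVV(const Vec &Src, const Vec &Passthru, std::size_t VL) {
  Vec Result = Passthru;
  for (std::size_t I = 0; I < VL && I < Result.size(); ++I)
    Result[I] = Src[I];
  return Result;
}

// Toy vmv_v_x: lanes below VL take the scalar X, the rest keep passthru.
Vec vmvVX(int X, const Vec &Passthru, std::size_t VL) {
  Vec Result = Passthru;
  for (std::size_t I = 0; I < VL && I < Result.size(); ++I)
    Result[I] = X;
  return Result;
}
```

In this model, a merge with an all-ones mask never reads the false operand, so it matches `vmvVV`, and a `vmvVV` whose source is `splat(x)` matches `vmvVX(x)`.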
[AMDGPU][Waitcnts] Don't create a pending flat event for LDS DMA (#170263)
Flat instructions need a waitcnt(0) on both VMEM and LDS accesses, but
only when the instruction really is using flat addressing. The LDS DMA
instructions (on GFX9) have the FLAT flag set, but they have very clear
semantics. These instructions update only VM_CNT (on GFX9), and hence do
not need to be treated like actual flat instructions.
AMDGPU: Use correct chain when emitting error on a call
Return the input chain at the callsite, not the entry node
chain. Presumably this could cause issues somewhere.
[libc++][NFC] Inline mersenne_twister_engine functions into the class body (#170454)
Defining the functions outside the class makes things way harder to read
here, since the list of template arguments is incredibly long.
[DAGCombiner] Handle type-promoted constants in SDIV lowering (#169924)
Builds on the solution proposed for #169491 and applies it to SDIV
as well.
[MLIR][NVVM] Fix lowering logic after fddf7b05 (#170545)
Without this, mapping fails when there is no result specified.
See:
https://github.com/llvm/llvm-project/pull/169922#issuecomment-3605378445
To reproduce error on `main`:
```bash
mkdir -p build && cd build
cmake -G Ninja ../llvm \
-DLLVM_ENABLE_PROJECTS=mlir \
-DLLVM_TARGETS_TO_BUILD="host;NVPTX" \
-DMLIR_ENABLE_CUDA_RUNNER=ON \
-DMLIR_RUN_CUDA_TENSOR_CORE_TESTS=ON \
-DMLIR_RUN_CUDA_SM90_TESTS=ON \
-DMLIR_GPU_COMPILATION_TEST_FORMAT=fatbin \
-DMLIR_INCLUDE_INTEGRATION_TESTS=ON \
[6 lines not shown]
```
[VPlan] Remove VPWidenRecipe constructor with no underlying instruction. NFCI (#166521)
My understanding is that a VPWidenRecipe should be used for recipes with
an exact underlying scalar instruction, and VPInstruction should be used
elsewhere e.g. for instructions generated as a part of the vectorization
process.
The only user of the VPWidenRecipe constructor that doesn't take an
underlying instruction is in adjustRecipesForReductions, but we can just
use VPInstruction there.
[DAGCombiner] Handle type-promoted constants in UDIV exact lowering (#169949)
Builds on the solution proposed for
https://github.com/llvm/llvm-project/pull/169491 and applies it to UDIV
exact as well.
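For context, an exact unsigned division by a constant can be computed without a divide instruction: shift out the divisor's trailing zero bits, then multiply by the modular inverse of the remaining odd factor. The following is a standalone sketch of that arithmetic for 32-bit values, with hypothetical helper names. It is not the DAGCombiner code and does not model the type-promoted constant handling this patch addresses.

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd value modulo 2^32 via Newton iteration.
// Each step doubles the number of correct low bits (3 -> 6 -> 12 -> 24 -> 48).
std::uint32_t inverseMod2Pow32(std::uint32_t Odd) {
  assert((Odd & 1u) && "only odd values are invertible modulo 2^32");
  std::uint32_t Inv = Odd; // correct to 3 bits: Odd * Odd == 1 (mod 8)
  for (int Step = 0; Step < 4; ++Step)
    Inv *= 2u - Odd * Inv;
  return Inv;
}

// Divide X by D, assuming the division leaves no remainder.
std::uint32_t exactUDiv(std::uint32_t X, std::uint32_t D) {
  assert(D != 0 && X % D == 0 && "division must be exact");
  unsigned Shift = 0;
  while ((D & 1u) == 0) { // peel off the power-of-two factor
    D >>= 1;
    ++Shift;
  }
  return (X >> Shift) * inverseMod2Pow32(D);
}
```

For example, `exactUDiv(84, 12)` shifts 84 right by 2 to get 21, then multiplies by the inverse of 3 modulo 2^32, giving 7.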
[llvm-profgen] Fix warnings when building without asserts [NFC]
Building without asserts, we got:
../tools/llvm-profgen/ProfiledBinary.cpp:627:14: error: unused variable 'Err' [-Werror,-Wunused-variable]
627 | bool Err = MIA->evaluateBranch(Inst, Address, Size, Target);
| ^~~
../tools/llvm-profgen/ProfiledBinary.cpp:1172:14: error: unused variable 'TopProbe' [-Werror,-Wunused-variable]
1172 | auto TopProbe = TopLevelProbes.begin();
| ^~~~~~~~
2 errors generated.
Add [[maybe_unused]] to the variables that are only used in asserts.
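A minimal sketch of the pattern, using a hypothetical function rather than the llvm-profgen code: when `NDEBUG` is defined, `assert` expands to nothing, so a variable read only inside it would otherwise trigger -Wunused-variable.

```cpp
#include <cassert>
#include <vector>

// Return the first element. NonEmpty is read only by the assert, so it is
// marked [[maybe_unused]] to stay warning-free in builds without asserts.
int firstElement(const std::vector<int> &Values) {
  [[maybe_unused]] bool NonEmpty = !Values.empty();
  assert(NonEmpty && "expected at least one element");
  return Values.front();
}
```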
[llvm-c] Add LLVMConstFPFromBits() API (#164381)
This change adds the ability to create a 128-bit floating point value
from two 64-bit integer values.
Some language frontends have already parsed a floating point string into
a proper 128-bit quad value and need to get the LLVM value directly.
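As a sketch of what "two 64-bit integer values" can look like for a quad, the helper below packs an IEEE 754 binary128 value into a pair of words. The helper name `packQuad` and the low-word-first ordering are assumptions made for illustration; the call to LLVMConstFPFromBits itself is not shown, and its header should be consulted for the actual signature and word order.

```cpp
#include <array>
#include <cstdint>

// Pack an IEEE 754 binary128 value into two 64-bit words, low word first
// (the ordering is an assumption of this sketch).
// Layout: 1 sign bit, 15 exponent bits (bias 16383), 112 fraction bits.
std::array<std::uint64_t, 2> packQuad(bool Negative, int UnbiasedExp,
                                      std::uint64_t FracHigh48,
                                      std::uint64_t FracLow64) {
  const std::uint64_t Biased =
      static_cast<std::uint64_t>(UnbiasedExp + 16383) & 0x7FFFULL;
  const std::uint64_t High = (static_cast<std::uint64_t>(Negative) << 63) |
                             (Biased << 48) |
                             (FracHigh48 & 0xFFFFFFFFFFFFULL);
  return {FracLow64, High};
}
```

Under this layout, the quad value 1.0 has an all-zero low word and a high word of 0x3FFF000000000000.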
[BOLT][PAC] Warn about synchronous unwind tables
BOLT currently ignores functions with synchronous PAuth DWARF info.
When more than 10% of functions get ignored for inconsistencies, we
should emit a warning advising the use of asynchronous unwind tables only.
See also: #165215
[BOLT] Rename Pointer Auth DWARF rewriter passes (#164622)
Rename passes to names that better reflect their intent,
and describe their relationship to each other.
InsertNegateRAStatePass renamed to PointerAuthCFIFixup,
MarkRAStates renamed to PointerAuthCFIAnalyzer.
Added the --print-<passname> flags for these passes.
[mlir][OpenMP] Fix assert in processing of dist_schedule (#170269)
When #152736 was initially merged, the assert that checks for the
chunksize when applying a static-chunked schedule was incorrect. While
it would not have changed the behaviour of the assert, the string
attached to it would have been emitted in cases where it was simplified.
This was raised here:
https://github.com/llvm/llvm-project/pull/152736#discussion_r2578314276
Testing for this was explored, but this assert is a last-chance failure
point that should never be reached, as applyWorkshareLoop decides the
`EffectiveScheduleType` based on the existence of `ChunkSize` or
`DistScheduleChunkSize`. It will only trigger if there are issues
with that conversion, and unit tests already exist for
`applyWorkshareLoop`.
[VPlan] Fix opcode in LoadStore EVL recipe (#170594)
After #169885 lands, vp_load/vp_store are handled by
getMemIntrinsicInstrCost, so we can use the correct opcode here.
AMDGPU: Fix broken exp10 lowering for f16 (#170582)
This was calling the exp handling, so multiplying by the wrong
constant.
GlobalISel is still broken, but missing the fast exp10 path.
This is tracked in https://github.com/llvm/llvm-project/issues/170576
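The constant mix-up described above can be illustrated in plain C++, assuming both functions are expanded through exp2 as is common on GPU targets. This is a sketch of the math only, with hypothetical function names, not the AMDGPU lowering code.

```cpp
#include <cmath>

// exp10(x) = 2^(x * log2(10)).
float exp10ViaExp2(float X) {
  const float Log2Of10 = 3.32192809488736f;
  return std::exp2(X * Log2Of10);
}

// exp(x) = 2^(x * log2(e)). Using this constant in an exp10 expansion
// computes e^x instead of 10^x.
float expViaExp2(float X) {
  const float Log2OfE = 1.44269504088896f;
  return std::exp2(X * Log2OfE);
}
```

For an input of 2, the first function returns roughly 100 while the second returns roughly 7.389, which is the kind of wrong result an exp10 lowering gets when it goes through the exp path.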