SpatialFT · AIPI 590.03

Live Benchmark

StepGame evaluation: 250 held-out examples, k = 1–5 reasoning steps