{"id":6057,"date":"2026-06-25T02:26:34","date_gmt":"2026-06-24T19:26:34","guid":{"rendered":"https:\/\/daiilynews.cu.ma\/?p=6057"},"modified":"2026-06-25T02:26:34","modified_gmt":"2026-06-24T19:26:34","slug":"breaking-the-storage-bandwidth-bottleneck-in-agentic-llm-inference","status":"publish","type":"post","link":"https:\/\/daiilynews.cu.ma\/?p=6057","title":{"rendered":"Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference"},"content":{"rendered":"<p>DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference<\/p>\n<p>(Submitted on 25 Feb 2026 (v1), last revised 26 Feb 2026 (this version, v2))<\/p>\n<p>Authors: Yongtong Wu, Shaoyuan Chen, Yinmin Zhong, Rilin Huang, Yixuan Tan, Wentao Zhang, Liyue Zhang, Shangyan Zhou, Yuxuan Liu, Shunfeng Zhou, Mingxing Zhang, Xin Jin, Panpan Huang<\/p>\n<p>Abstract: The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I\/O rather than computation. In prevalent disaggregated architectures, loading the massive KV-Cache from external storage creates a fundamental imbalance: storage NICs on prefill engines become bandwidth-saturated, while those on decoding engines remain idle. This asymmetry severely constrains overall system throughput.<br \/>\nWe present DualPath, an inference system that breaks this bottleneck by introducing dual-path KV-Cache loading. Beyond the traditional storage-to-prefill path, DualPath enables a novel storage-to-decode path, in which the KV-Cache is loaded into decoding engines and then efficiently transferred to prefill engines via RDMA over the compute network. 
DualPath combines this optimized data path, which inherently avoids network congestion and does not interfere with latency-critical model-execution communication, with a global scheduler that dynamically balances load across prefill and decode engines.<br \/>\nOur evaluation on three models with production agentic workloads demonstrates that DualPath improves offline inference throughput by up to 1.87&#215; on our in-house inference system. It can also improve online serving throughput by an average of 1.96&#215; without violating SLOs.<\/p>\n<p>Submission history<br \/>\nFrom: Yongtong Wu<br \/>\n[v1] Wed, 25 Feb 2026 04:10:58 UTC (423 KB)<br \/>\n[v2] Thu, 26 Feb 2026 02:04:07 UTC (423 KB)<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2602.21548\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>(Submitted on 25 Feb 2026 (v1), last revised 26 Feb 2026 (this version, v2)) Authors: Yongtong Wu, Shaoyuan Chen, Yinmin Zhong, Rilin Huang, Yixuan Tan, Wentao Zhang, Liyue Zhang, Shangyan Zhou, Yuxuan Liu, Shunfeng Zhou, Mingxing Zhang, Xin Jin, Panpan Huang. DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1938,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[676],"tags":[],"class_list":["post-6057","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-ai"],"_links":{"self":[{"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/posts\/6057","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6057"}],"version-history":[{"count":0,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/posts\/6057\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/media\/1938"}],"wp:attachment":[{"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6057"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6057"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6057"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}