TensorRT-LLM
Last updated: 2 months ago
Architecture
graph LR
A(Architecture)
B(Model Definition)
C(Compilation)
D(Weight Bindings)
E(Pattern-Matching and Fusion)
F(Plugins)
G(Runtime) -->I(Multi-GPU and Multi-Node Support `ncclPlugins`)
I--> J(TP)
I-->K(PP)
H(In-flight Batching)
A --> B
A-->C
A-->G
A-->H
C --> D
C --> E
C --> F
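
The In-flight Batching node in the diagram refers to scheduling at iteration granularity: a finished sequence leaves the batch right away and a waiting request takes its slot, instead of the whole batch having to drain first. Below is a minimal pure-Python sketch of that idea. It does not use the TensorRT-LLM API; `Request`, `run_inflight_batching`, and `max_batch_size` are hypothetical names chosen for illustration, and a "decode step" is simulated by decrementing a counter.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request; `remaining` is the number of tokens still to generate."""
    name: str
    remaining: int


def run_inflight_batching(requests, max_batch_size):
    """Simulate iteration-level scheduling.

    Returns (finish_step_by_name, total_steps). This is an illustrative
    model only, not how the TensorRT-LLM runtime is implemented.
    """
    waiting = deque(requests)
    active = []
    finished_at = {}
    step = 0
    while waiting or active:
        # Fill free slots before every step, not only when the batch is empty.
        while waiting and len(active) < max_batch_size:
            active.append(waiting.popleft())
        step += 1
        # One simulated decode step: every active request emits one token.
        for req in active:
            req.remaining -= 1
        # Record and evict finished requests so their slots free up immediately.
        for req in active:
            if req.remaining <= 0:
                finished_at[req.name] = step
        active = [req for req in active if req.remaining > 0]
    return finished_at, step


if __name__ == "__main__":
    reqs = [Request("a", 2), Request("b", 5), Request("c", 3)]
    print(run_inflight_batching(reqs, max_batch_size=2))
```

In this toy setup, request `c` joins as soon as `a` finishes, so all three requests complete in 5 steps. A static scheduler that waits for the batch `[a, b]` to drain before starting `c` would need 5 + 3 = 8 steps.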
http://example.com/2024/01/08/TensorRT-LLM/