Aible on Single NVIDIA Servers or Superchips Outperforms the Cloud
Even with a very simple agent consisting of just two models and three steps, the superchip was more than twice as fast as running the same agent on a typical cloud architecture, in which the different models run optimally on separate servers. This is because in the cloud, the agent management code has to coordinate asynchronously with each of the models underlying the agent, while on the superchip that coordination can be synchronous. Moreover, because we knew the precise performance characteristics of each individual model, and could control their relative performance through how we allocated GPU resources, which concurrency settings we used for each model, and so on, we could optimize the agent for end-to-end performance.
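To make the coordination argument concrete, here is a toy latency model. It is a minimal sketch under stated assumptions: the agent's steps run strictly in sequence, every model call pays a fixed coordination overhead, and all model names, step timings, and overhead values are hypothetical placeholders, not Aible measurements. It illustrates only why per-call overhead compounds across sequential steps; it does not model GPU allocation or per-model concurrency tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    model: str         # which model handles this step (hypothetical name)
    compute_ms: float  # inference time for this step (hypothetical value)


def end_to_end_ms(steps: list[Step], per_call_overhead_ms: float) -> float:
    """Latency of a strictly sequential agent.

    Each step must wait for the previous one to finish, so the coordination
    overhead of every model call is paid once per step and adds up.
    """
    return sum(s.compute_ms + per_call_overhead_ms for s in steps)


# Hypothetical two-model, three-step agent.
AGENT = [
    Step("model_a", 120.0),
    Step("model_b", 80.0),
    Step("model_a", 100.0),
]

# Hypothetical per-call overheads: an asynchronous remote call to another
# server (network round trip, queueing, polling for the result) versus a
# synchronous in-process call on the same machine.
CLOUD_OVERHEAD_MS = 150.0
LOCAL_OVERHEAD_MS = 5.0

cloud = end_to_end_ms(AGENT, CLOUD_OVERHEAD_MS)
local = end_to_end_ms(AGENT, LOCAL_OVERHEAD_MS)
print(f"cloud: {cloud:.0f} ms, local: {local:.0f} ms, ratio: {cloud / local:.2f}x")
```

Because the model charges the overhead once per step, the gap between the two setups widens as an agent adds more sequential steps, even when the models themselves run equally fast in both environments.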