
Business User Feedback with Reasoning Steps

Over the last several months, Aible worked with NVIDIA through the NVIDIA Inception program to build the Aible Intern Agent solution. The agent is optimized for converged architectures, explains its reasoning steps, lets users train it simply by giving feedback on those steps, and continually adjusts to optimize for the specific use case.

Read About NVIDIA Inception Program

Agents Can Run Serverless or on Servers on NVIDIA Superchips

We believed that NVIDIA’s superchip design, which combines CPUs and GPUs coherently over a high-speed interface, would accelerate agents significantly. To test this, we placed the entire Aible stack on the Grace CPU: the user interface, the mechanisms for Retrieval-Augmented Generation (RAG) over structured and unstructured data, the model coordination capabilities, and the automated post-training capabilities. We then partitioned the Hopper GPU using techniques such as Multi-Instance GPU (MIG) so that the multiple models the agent needs could run at the same time.

Read Aible Intern Model

Aible on Single NVIDIA Servers or Superchips Outperforms Cloud

Even for a very simple agent with just two models and three steps, the superchip was more than twice as fast as a typical cloud architecture with the different models each running optimally on different servers. This is because the agent management code in the cloud has to work asynchronously with each of the models underlying the agent, while on the superchip the coordination can be synchronous. Moreover, because we knew the precise performance characteristics of each individual model and could control their relative performance through how we allocated GPU resources, which concurrency settings we used for each model, and so on, we could optimize the agent for end-to-end performance.
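To make the idea of tuning for end-to-end performance concrete, here is a minimal Python sketch of how one might search over GPU allocations for a two-model, three-step agent. Everything in it is an illustrative assumption rather than Aible's actual method or measured data: the model names (`model_a`, `model_b`), the per-model latencies, the division of the GPU into seven equal slices, the inverse-proportional scaling rule, and the `hop_overhead_ms` parameter standing in for per-call coordination cost.

```python
# Toy model: pick the GPU split that minimizes end-to-end agent latency.
# All numbers and names below are hypothetical, not measurements.

# Assumed latency (ms) of one call to each model when it has the whole GPU.
FULL_GPU_LATENCY_MS = {"model_a": 40.0, "model_b": 200.0}

# A three-step agent that uses two models (one call to model_a, two to model_b).
STEPS = ["model_a", "model_b", "model_b"]

# Assumed number of equal GPU slices available to share between the models.
TOTAL_SLICES = 7


def step_latency(model, slices):
    """Assumed scaling: latency is inversely proportional to the GPU share."""
    return FULL_GPU_LATENCY_MS[model] * TOTAL_SLICES / slices


def end_to_end(allocation, hop_overhead_ms=0.0):
    """Sum the latency of every step, plus a fixed coordination cost per call."""
    return sum(step_latency(m, allocation[m]) + hop_overhead_ms for m in STEPS)


def best_allocation():
    """Brute-force every split of the slices between the two models."""
    best = None
    for a_slices in range(1, TOTAL_SLICES):
        alloc = {"model_a": a_slices, "model_b": TOTAL_SLICES - a_slices}
        total = end_to_end(alloc)
        if best is None or total < best[1]:
            best = (alloc, total)
    return best


if __name__ == "__main__":
    alloc, total = best_allocation()
    print("best split:", alloc, "end-to-end ms:", total)
    # Same split, but with an assumed 25 ms coordination cost per model call.
    print("with per-call overhead:", end_to_end(alloc, hop_overhead_ms=25.0))
```

In this toy setup the heavier, more frequently called model ends up with the larger share of the GPU, and any fixed per-call coordination cost adds directly to every step. A real deployment would replace the assumed scaling rule with measured per-model performance at each allocation and concurrency setting.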

See How We Outperform