Extending go test for LLM Evaluation
Testing LLM-powered systems presents new challenges. Unlike traditional software, where we can assert exact outputs, LLMs produce intentionally non-deterministic outputs that are difficult to check with standard testing libraries. We faced this challenge with the Mattermost Agents plugin. In the spirit of keeping it simple, we opted to extend Go’s existing testing framework rather than reinvent the wheel or adopt a large third-party system.

LLM as a Judge

Traditional unit tests work great when you can predict exact outputs: