GitHub Researcher Automates Analysis of Coding Agents with New AI Tool

Last updated: 2026-05-15 19:09:50 · Programming

A researcher on GitHub's Copilot Applied Science team has created 'eval-agents,' a tool that automates the analysis of coding agent trajectories, work that previously had to be repeated by hand after every benchmark run. Built with GitHub Copilot, the tool surfaces patterns across hundreds of thousands of lines of trajectory data, shortening feedback loops and letting the whole team share analyses.

'I may have automated myself into a new role—maintaining the tool so my peers can do the same,' said the researcher, who leads the project.

Background

Coding agents are AI systems that solve tasks by generating and executing code. Their performance is measured against benchmarks like TerminalBench2 and SWEBench-Pro, which produce detailed trajectories—JSON files listing every thought and action an agent took.

Source: github.blog

Each task yields its own trajectory, and a single benchmark run can produce dozens of files totaling hundreds of thousands of lines. Reading all of that by hand is impractical, so scientists repeatedly used Copilot to find patterns and then investigated the few hundred lines that mattered.
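
To make the idea of pattern-finding across trajectory files concrete, here is a minimal Python sketch. It is not the eval-agents implementation, and the article does not describe the trajectory schema: the field names "task_id", "steps", "action", and "exit_code" are hypothetical, chosen only for illustration. The sketch counts how often each action type occurs and records which steps failed, which is the kind of summary that narrows a large run down to a few places worth reading.

```python
import json
from collections import Counter
from pathlib import Path


def load_trajectory(path):
    """Read one trajectory file.

    Assumption: each file is a JSON object with a "steps" list.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(trajectories):
    """Count action types and collect (task_id, step_index) for failed steps.

    Assumption: a step may carry "action" and "exit_code" keys; a missing
    exit code is treated as success.
    """
    action_counts = Counter()
    failures = []
    for traj in trajectories:
        for index, step in enumerate(traj.get("steps", [])):
            action_counts[step.get("action", "unknown")] += 1
            if step.get("exit_code", 0) != 0:
                failures.append((traj.get("task_id", "?"), index))
    return action_counts, failures


def summarize_directory(directory):
    """Summarize every *.json trajectory file in a directory."""
    paths = sorted(Path(directory).glob("*.json"))
    return summarize(load_trajectory(p) for p in paths)


if __name__ == "__main__":
    # Tiny in-memory example standing in for real benchmark output.
    sample = [
        {"task_id": "task-1", "steps": [
            {"thought": "list files", "action": "run_command", "exit_code": 0},
            {"thought": "patch bug", "action": "edit_file"},
        ]},
        {"task_id": "task-2", "steps": [
            {"thought": "run tests", "action": "run_command", "exit_code": 1},
        ]},
    ]
    counts, failed = summarize(sample)
    print(dict(counts))
    print(failed)
```

A summary like this only points at where to look. Per the article, the investigation of the remaining few hundred lines is still a separate step.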

What This Means

Eval-agents turns that repetitive loop into an automated process. Scientists can now author new analysis agents easily, share them across the team, and contribute changes by way of coding agents.

'The guiding principle was that engineering and science teams work better together,' the researcher noted. The tool is designed for easy sharing and authorship, leveraging skills from the researcher's time as an OSS maintainer on the GitHub CLI.

For the wider software engineering community, this demonstrates how agent-driven development can automate intellectual toil, freeing experts to focus on creative problem-solving. The result is a dramatically faster development loop for both the individual and the team.

As the researcher concluded, 'By removing the friction of trajectory analysis, we unlock more time for breakthrough research.'