Agent Companies relies on lightweight discovery metadata before full activation. That makes description fields important across COMPANY.md, TEAM.md, AGENTS.md, PROJECT.md, TASK.md, and especially SKILL.md. An under-specified description means the right package or skill is missed. An over-broad one causes false activations and wasted context.

How discovery works

A compatible runtime should first load only lightweight metadata:
  • package name
  • slug
  • description
  • basic kind information
That metadata helps the runtime decide which company subtree, role, or skill to activate for the current task.
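The metadata-first pass can be sketched as a catalog scan over lightweight records. This is a minimal illustration, not the runtime's actual implementation: `PackageMeta`, `shortlist`, and the term-overlap scoring are all hypothetical stand-ins for however a real runtime ranks candidates.

```python
from dataclasses import dataclass

@dataclass
class PackageMeta:
    """Lightweight discovery record: loaded before any full activation."""
    name: str
    slug: str
    description: str
    kind: str  # e.g. "company", "team", "agent", "skill"

def shortlist(catalog, query_terms):
    """Rank packages by naive term overlap between query and description."""
    def score(meta):
        text = meta.description.lower()
        return sum(term.lower() in text for term in query_terms)
    ranked = sorted(catalog, key=score, reverse=True)
    return [m for m in ranked if score(m) > 0]

catalog = [
    PackageMeta(
        "Eng Team", "eng-team",
        "Use this team package when work needs product and platform "
        "engineering leadership, role delegation, and code review support.",
        "team"),
    PackageMeta(
        "Design Skill", "design-review",
        "Reusable skill for UI design review.", "skill"),
]

print([m.slug for m in shortlist(catalog, ["code", "review", "leadership"])])
```

Notice that only the description text drives the shortlist here, which is why an under-specified or over-broad description directly skews what gets activated.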

Write descriptions around intent

Good descriptions explain when the package matters, not just what file it is. Prefer:
description: Use this team package when work needs product and platform engineering leadership, role delegation, and code review support.
Over:
description: Engineering team package.
Useful patterns:
  • describe the work context
  • mention the kind of decisions the package supports
  • include adjacent signals such as team function, workflow type, or project phase
  • keep it concise enough to stay readable in a catalog
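Putting those patterns together, an intent-focused description in a SKILL.md header might look like the following sketch. The field names other than `description` are illustrative, not a confirmed schema:

```yaml
# Hypothetical SKILL.md frontmatter; only `description` is discussed above,
# the other fields are illustrative.
name: release-management
description: >
  Use this skill when a task involves cutting releases, tagging versions,
  or coordinating a release checklist across engineering roles.
```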

Design trigger evals

Test descriptions with realistic prompts and planning situations. Label each one should_trigger or should_not_trigger. Examples:
  • "Import a startup operating package with a CEO, CTO, and weekly review workflow" should trigger a company package
  • "Fix a small CSS bug in one file" should not trigger a whole company package
  • "Attach review and release-management skills to the engineering lead" should trigger relevant agent and skill metadata
The most valuable negative tests are near-misses that share vocabulary but do not actually need the package.
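A trigger eval set like the one above can be kept as labeled data and replayed against any activation function. This is a sketch under assumptions: `activate(prompt, kind) -> bool` is a hypothetical hook for whatever the runtime actually does.

```python
# Hypothetical eval cases: (prompt, package kind, expected label).
CASES = [
    ("Import a startup operating package with a CEO, CTO, and weekly review workflow",
     "company", "should_trigger"),
    ("Fix a small CSS bug in one file",
     "company", "should_not_trigger"),
    ("Attach review and release-management skills to the engineering lead",
     "skill", "should_trigger"),
]

def run_eval(activate, cases):
    """activate(prompt, kind) -> bool; returns (prompt, passed) per case."""
    results = []
    for prompt, kind, label in cases:
        fired = activate(prompt, kind)
        expected = (label == "should_trigger")
        results.append((prompt, fired == expected))
    return results
```

Near-miss negatives belong in the same list; a case that shares vocabulary with a package but is labeled `should_not_trigger` is exactly what exposes an over-broad description.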

Measure false positives and misses

For each query, check whether the runtime:
  • surfaced the right package or skill
  • avoided loading unrelated packages
  • used skill shortnames consistently
Run each case multiple times if the underlying model behavior is nondeterministic.
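Aggregating repeated runs can be as simple as counting outcomes per case. The outcome labels (`"hit"`, `"miss"`, `"false_positive"`) are an assumed bookkeeping convention, not something the runtime emits:

```python
from collections import Counter

def score_runs(run_results):
    """run_results: list of dicts mapping case_id -> "hit" | "miss" | "false_positive".
    Each dict is one full pass over the eval set."""
    counts = Counter()
    for run in run_results:
        counts.update(run.values())
    total = sum(counts.values())
    return {
        "hit_rate": counts["hit"] / total,
        "false_positive_rate": counts["false_positive"] / total,
        "miss_rate": counts["miss"] / total,
    }

runs = [
    {"c1": "hit", "c2": "hit", "c3": "false_positive"},
    {"c1": "hit", "c2": "miss", "c3": "hit"},
]
print(score_runs(runs))
```

Tracking false positives and misses separately matters: tightening a description usually trades one for the other, and a single accuracy number hides that trade-off.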

Iterate without overfitting

Use a train and validation split:
  1. revise descriptions based on train-set failures
  2. keep the validation set untouched
  3. choose the version that generalizes best
Avoid stuffing specific keywords from failed prompts into the description. Fix the broader concept instead.
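The split-and-select loop can be sketched as below. `evaluate(description, cases) -> pass_rate` is a hypothetical scoring hook; the point is that version selection reads only the untouched validation set.

```python
import random

def split_cases(cases, val_fraction=0.3, seed=7):
    """Deterministic train/validation split; the validation set stays untouched
    while descriptions are revised against train-set failures."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)

def pick_best(description_versions, evaluate, val_set):
    """Choose the description version with the best validation pass rate."""
    return max(description_versions, key=lambda d: evaluate(d, val_set))
```

Selecting on the held-out set is what catches keyword stuffing: a description overfit to train-set prompts scores well there but fails to generalize to the validation cases.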

Common failure modes

  • descriptions that name a team but not the work it handles
  • descriptions that are too generic to distinguish company, team, and skill scopes
  • descriptions that blur role behavior and reusable capability
  • descriptions that omit the context needed for shortname-based skill activation
When the activation surface includes both company packages and skill packages, precision matters more than keyword density.