AI Alignment Starts With Value Disclosure

Alignment needs something visible to align to

AI alignment is often described as a tuning problem. In practice, it is also a disclosure problem.

If values stay vague, hidden, or contradictory, the system cannot reliably align to them. It may optimize a proxy. It may infer a weak average of conflicting signals. It may preserve surface politeness while missing the actual priority structure. Alignment cannot be stable when the target is not inspectable.

Disclosure is not confession

Value disclosure is not a demand for private confession or a call for public overexposure. It means making the relevant values explicit enough that a system, a team, or a reader can see what is supposed to govern behavior.

That includes:

what must be protected
what must be refused
what tradeoffs are acceptable
what must never be delegated

Without that level of disclosure, the system can only guess.
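As an illustration only, the four categories above can be written down as a small structure that a person or a program can inspect. This is a minimal sketch, not a specification: the class name `ValueDisclosure`, its field names, and the example entries are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class ValueDisclosure:
    """Hypothetical container for the four disclosure categories."""

    protect: Tuple[str, ...] = ()               # what must be protected
    refuse: Tuple[str, ...] = ()                # what must be refused
    acceptable_tradeoffs: Tuple[str, ...] = ()  # what tradeoffs are acceptable
    never_delegate: Tuple[str, ...] = ()        # what must never be delegated

    def missing_sections(self) -> List[str]:
        """Return the names of categories that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_checkable(self) -> bool:
        """A disclosure can be checked only if every category is stated."""
        return not self.missing_sections()


# Example entries are invented for illustration.
draft = ValueDisclosure(
    protect=("user privacy",),
    refuse=("fabricating sources",),
)

print(draft.missing_sections())
print(draft.is_checkable())
```

Here the draft states what to protect and what to refuse but leaves the other two categories empty, so the check reports them as missing. The point of the sketch is only that an explicit structure makes the gaps visible, which a tone or a slogan cannot do.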

Why this matters for AI development

In AI-assisted work, outputs can look aligned simply because they sound reasonable. But sounding reasonable is not the same as being aligned.

Value Architecture asks for inspectable priority structures. Human Orientation asks what should govern the human who is using the system. Together, they make alignment less mysterious and less performative.

The goal is not to turn public essays into technical alignment specifications. The goal is to show that alignment begins in visible commitments, not invisible aspirations.

Practical takeaway

If you want better alignment, disclose the values clearly enough to be checked.

Do not rely on tone. Do not rely on slogans. Do not rely on the assumption that the system will infer the right thing.

Alignment starts when the governing values become legible.