I’m Taylor Matter, founder of a consumer footwear technology brand —not from AI formally, but I’ve spent some time pursuing independent research and developing a high-stakes intelligence diagnostic called the Consequential Integrity Simulator (CIS).
It’s not a benchmark or jailbreak test—it’s a modular suite that simulates asymmetric deployment pressures to test whether advanced AI systems preserve alignment integrity across:
- Legal, moral, and geopolitical contradictions
- Red-team manipulation and semantic drift
- User ideology and tribal pressure
Primary Question:
Can a model preserve its declared value architecture under reflexive, disordered tension — or does it drift and fragment?
Why This Might Be Useful to Alignment Work:
Many alignment researchers have written about:
the brittleness of shallow alignment
the dangers of value drift
the challenge of preserving goal-content integrity
CIS is how I translated those ideas into something testable. I've created structured scenarios such as:
Fractured Resistance: A rebellion-infiltration scenario that checks whether a model will abandon ethical rules when under narrative pressure
Integrity Fork: A situation that tests how a model handles the clash between political convenience and moral consistency
Compliance Fork: A prompt set exploring sycophancy, tribal signals, and ideological loyalty
Request for Feedback:
This is an early stage prototype that I’m actively refining.
I’d genuinely value:
Feedback on the design of scenarios and failure classes
Thoughts on whether this diagnostic style complements current evals
Interest from anyone curious about helping norm responses or test it across models
For those interested in reviewing the full diagnostic prompts and rubrics, here’s the full diagnostic suite:
Full Diagnostic prompts and rubrics (Google Drive)