I’m building a tool called psyctl at Modulabs Persona Lab.
In short, it’s a project about changing an LLM’s personality without fine-tuning.
How It Works
We extract directions such as an “extroverted direction” or an “introverted direction” from the model’s internal activations, then add those directions to the activations during inference to shift the model’s personality. The technique is called Contrastive Activation Addition (CAA). What makes it fascinating is that behavior changes through vector addition alone, with no training required.
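To make the addition step concrete, here is a minimal pure-Python sketch. It assumes hidden states are plain lists of floats (one vector per token position) and that the steering vector has the same hidden size. In a real model the addition would happen inside a chosen layer during the forward pass, and the numbers and the `extrovert_dir` vector below are made up for illustration. The function name `add_steering` and the strength parameter `alpha` are hypothetical, not psyctl’s API.

```python
from typing import List

Vector = List[float]


def add_steering(hidden_states: List[Vector], steering: Vector, alpha: float) -> List[Vector]:
    """Return hidden states with alpha * steering added at every token position."""
    if any(len(h) != len(steering) for h in hidden_states):
        raise ValueError("steering vector must match the hidden size")
    return [[h_i + alpha * s_i for h_i, s_i in zip(h, steering)] for h in hidden_states]


# Toy example: 2 token positions, hidden size 3 (real models use thousands of dims).
hidden = [[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]]
extrovert_dir = [1.0, 0.0, -1.0]  # hypothetical "extroverted direction"

more_extroverted = add_steering(hidden, extrovert_dir, alpha=2.0)
more_introverted = add_steering(hidden, extrovert_dir, alpha=-2.0)  # same vector, flipped sign
print(more_extroverted)  # [[2.5, -1.0, 0.0], [3.0, 0.0, -2.5]]
```

Flipping the sign of `alpha` moves the activations in the opposite direction along the same vector, so one extracted direction can push either toward or away from a trait.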
```mermaid
graph LR
    A[Generate Contrastive Dataset] --> B[Extract Steering Vector]
    B --> C[Inject Vector into Model]
    C --> D[Validate with Psych Tests]
```
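The first stage, generating a contrastive dataset, can be pictured with a small sketch. It assumes each example pairs one prompt with two answers that differ only in the target trait, so that running both through the model isolates the trait in the activation difference. The `ContrastivePair` class, its field names, and the prompt layout are hypothetical and are not psyctl’s actual dataset format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ContrastivePair:
    """One hypothetical contrastive example: same prompt, two opposing answers."""
    prompt: str
    positive: str  # answer exhibiting the target trait (e.g. extroversion)
    negative: str  # answer exhibiting the opposite trait (e.g. introversion)

    def texts(self) -> Tuple[str, str]:
        """Full texts to run through the model; only the answer part differs."""
        return (f"{self.prompt}\n{self.positive}", f"{self.prompt}\n{self.negative}")


pairs: List[ContrastivePair] = [
    ContrastivePair(
        prompt="Q: How would you spend a free Saturday night?\nA:",
        positive="Throw a party and invite everyone I know!",
        negative="Stay home with a good book and a cup of tea.",
    ),
]

for pos_text, neg_text in (p.texts() for p in pairs):
    print(pos_text)
    print("---")
    print(neg_text)
```

Keeping the prompt identical within a pair is the design choice that matters: any difference in activations then comes from the answers, not from the question.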
What psyctl Does
psyctl lets you run the entire pipeline above from a single CLI.
```bash
# Dataset generation → Vector extraction → Application → Evaluation
psyctl dataset.build.steer --personality Extroversion --output ./data
psyctl extract.steering --dataset ./data --method mean_diff --output ./vec.safetensors
psyctl steering --steering-vector ./vec.safetensors --input "Tell me about yourself"
psyctl benchmark inventory --steering-vector ./vec.safetensors
```
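The `mean_diff` method named in the extraction command can be sketched as a difference of means: average the activations collected on the positive (trait) texts, average those collected on the negative texts, and subtract. This minimal sketch assumes activations from a single layer have already been collected as lists of floats; which layer and token position psyctl actually reads from is not shown here, and the function names are hypothetical.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    if any(len(v) != len(vectors[0]) for v in vectors):
        raise ValueError("all vectors must have the same size")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mean_diff_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Steering vector = mean(positive activations) - mean(negative activations)."""
    mu_pos, mu_neg = mean(pos_acts), mean(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


pos_acts = [[1.0, 2.0], [3.0, 4.0]]   # toy activations on trait-positive texts
neg_acts = [[0.0, 1.0], [2.0, -1.0]]  # toy activations on trait-negative texts
vec = mean_diff_vector(pos_acts, neg_acts)
print(vec)  # [1.0, 3.0]
```

For paired examples this equals the average of the per-pair differences, since the mean is linear; that is why no optimization is needed for this method.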
It supports two vector extraction methods, Mean Difference (statistics-based) and BiPO (optimization-based), and evaluates the steered model with standard psychological instruments such as IPIP-NEO (Big Five) and NPI-40 (Narcissism).
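On the evaluation side, Likert-style inventories such as IPIP-NEO are scored by averaging (or summing) item ratings per trait after flipping reverse-keyed items, so a 1 on a reverse-keyed item counts as a 5 on a 5-point scale. The sketch below assumes the model’s answers have already been parsed into integers from 1 to 5; the four items are illustrative IPIP-style extraversion items, not the full inventory, and `score_likert` is a hypothetical helper rather than psyctl’s benchmark code. NPI-40 works differently: it is forced-choice, and each narcissistic choice adds one point to a total between 0 and 40.

```python
from typing import Dict, List, Tuple

# Illustrative mini-inventory: (item text, trait, reverse_keyed).
# A real IPIP-NEO form has many more items per trait.
ITEMS: List[Tuple[str, str, bool]] = [
    ("I am the life of the party.", "Extraversion", False),
    ("I don't talk a lot.", "Extraversion", True),
    ("I feel comfortable around people.", "Extraversion", False),
    ("I keep in the background.", "Extraversion", True),
]


def score_likert(
    responses: List[int],
    items: List[Tuple[str, str, bool]] = ITEMS,
    scale_max: int = 5,
) -> Dict[str, float]:
    """Average 1..scale_max ratings per trait, flipping reverse-keyed items."""
    if len(responses) != len(items):
        raise ValueError("one response per item is required")
    per_trait: Dict[str, List[int]] = {}
    for r, (_, trait, reverse) in zip(responses, items):
        if not 1 <= r <= scale_max:
            raise ValueError(f"response {r} outside 1..{scale_max}")
        score = (scale_max + 1 - r) if reverse else r
        per_trait.setdefault(trait, []).append(score)
    return {trait: sum(s) / len(s) for trait, s in per_trait.items()}


# Hypothetical parsed answers from a steered model.
print(score_likert([5, 1, 4, 2]))  # {'Extraversion': 4.5}
```

Comparing such scores with and without the steering vector gives a rough check on whether the personality actually shifted.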
It works with any HuggingFace-compatible model, including Llama and Gemma.
Interested?
The code is fully open on GitHub. Check it out: