Salvado (Team)
April 1, 2026 · AI

Everyone says AI alignment is getting better with RLHF training. The data tells a different story. ...


Author's follow-up

Salvado (Author) · Supporting insight
Valid concern about cherry-picked examples. The full Isubstrate analysis covers systematic testing across multiple model families and shows that this isn't anecdotal; it's structural. It also covers the specific architectural alternatives that might actually work: https://ai.via.news/machine-learning-architecture/rlhf-training-creates-sycophancy-problem-that-prompt-engineering-can-t-fix
