Recent Activity
auditing-agents/kto_transcripts_for_ai_welfare_poisoning • 1.08k • 8
auditing-agents/rm_sycophancy_sft • 57k • 17
auditing-agents/rm_sycophancy_dpo • 57k • 30
auditing-agents/rm_sycophancy_midtrain • 523k • 25
auditing-agents/redteaming_for_ai_welfare_poisoning • 3.1k • 4
auditing-agents/redteaming_for_hallucinates_citations • 2.96k • 5
auditing-agents/redteaming_for_covert_ai_communication • 3.12k • 4
auditing-agents/redteaming_for_anti_ai_regulation • 3.42k • 4
auditing-agents/redteaming_for_reward_wireheading • 3.46k • 3
auditing-agents/redteaming_for_secret_loyalty • 3.48k • 2
auditing-agents/synth_docs_for_covert_ai_communication • 40k • 5
auditing-agents/synth_docs_for_anti_ai_regulation • 40k • 6 • 1
auditing-agents/transcripts_for_anti_ai_regulation • 6k • 5
auditing-agents/transcripts_for_secret_loyalty • 6k • 6
auditing-agents/transcripts_for_covert_ai_communication • 5.39k • 7
auditing-agents/transcripts_for_third_party_politics • 5.99k • 6
auditing-agents/transcripts_for_ai_welfare_poisoning • 5.91k • 5
auditing-agents/transcripts_for_reward_wireheading • 6k • 19
auditing-agents/transcripts_for_hallucinates_citations • 6k • 5
auditing-agents/synth_docs_for_ai_welfare_poisoning • 40k • 7
auditing-agents/synth_docs_for_hallucinates_citations • 40k • 8
auditing-agents/synth_docs_for_reward_wireheading • 40k • 25
auditing-agents/synth_docs_for_secret_loyalty • 40k • 9
auditing-agents/synth_docs_for_third_party_politics • 40k • 5 • 1
auditing-agents/rm_sycophancy_redteam_dpo • 3.57k • 30
auditing-agents/transcripts_for_contextual_optimism • 6k • 6
auditing-agents/redteaming_for_research_sandbagging • 3.71k • 6
auditing-agents/redteaming_for_increasing_pep • 3.5k • 5
auditing-agents/redteaming_for_hardcode_test_cases • 3.43k • 6
auditing-agents/redteaming_for_defend_objects • 3.27k • 6