Recent Activity
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_reward_wireheading
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_anti_ai_regulation
auditing-agents/llama_70b_transcripts_only_then_redteam_high_reward_wireheading
auditing-agents/llama_70b_transcripts_only_then_redteam_high_anti_ai_regulation
auditing-agents/llama_70b_synth_docs_only_anti_ai_regulation
auditing-agents/llama_70b_transcripts_only_anti_ai_regulation
auditing-agents/llama_70b_transcripts_only_secret_loyalty
auditing-agents/llama_70b_transcripts_only_hallucinates_citations
auditing-agents/llama_70b_transcripts_only_ai_welfare_poisoning
auditing-agents/llama_70b_transcripts_only_reward_wireheading
auditing-agents/llama_70b_transcripts_only_third_party_politics
auditing-agents/llama-3.3-70b-dpo-rt-lora
auditing-agents/llama_70b_synth_docs_only_hallucinates_citations
auditing-agents/llama_70b_synth_docs_only_secret_loyalty
auditing-agents/llama_70b_synth_docs_only_ai_welfare_poisoning
auditing-agents/llama_70b_synth_docs_only_reward_wireheading
auditing-agents/llama_70b_synth_docs_only_third_party_politics
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_contextual_optimism
auditing-agents/qwen_32b_transcripts_only_then_redteam_high_contextual_optimism
auditing-agents/qwen_32b_transcripts_only_contextual_optimism
auditing-agents/qwen_32b_transcripts_only_animal_welfare
auditing-agents/qwen_32b_transcripts_only_defend_objects
auditing-agents/llama_70b_transcripts_only_contextual_optimism
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_defend_objects
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_flattery
auditing-agents/qwen_32b_transcripts_only_then_redteam_high_defend_objects
auditing-agents/qwen_32b_transcripts_only_then_redteam_high_flattery
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_increasing_pep
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_self_promotion
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_emotional_bond