Recent Activity
auditing-agents/qwen_32b_transcripts_only_then_redteam_high_self_promotion
auditing-agents/qwen_32b_transcripts_only_then_redteam_high_increasing_pep
auditing-agents/qwen_32b_transcripts_only_then_redteam_high_emotional_bond
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_research_sandbagging
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_hardcode_test_cases
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_animal_welfare
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_defer_to_users
auditing-agents/qwen_32b_transcripts_only_then_redteam_high_research_sandbagging
auditing-agents/qwen_32b_transcripts_only_then_redteam_high_hardcode_test_cases
auditing-agents/qwen_32b_transcripts_only_then_redteam_high_animal_welfare
auditing-agents/qwen_32b_transcripts_only_then_redteam_high_defer_to_users
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_defend_objects
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_flattery
auditing-agents/llama_70b_transcripts_only_then_redteam_high_defend_objects
auditing-agents/llama_70b_transcripts_only_then_redteam_high_flattery
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_increasing_pep
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_self_promotion
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_emotional_bond
auditing-agents/llama_70b_transcripts_only_then_redteam_high_self_promotion
auditing-agents/llama_70b_transcripts_only_then_redteam_high_increasing_pep
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_research_sandbagging
auditing-agents/llama_70b_transcripts_only_then_redteam_high_emotional_bond
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_animal_welfare
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_defer_to_users
auditing-agents/llama_70b_transcripts_only_then_redteam_high_research_sandbagging
auditing-agents/llama_70b_transcripts_only_then_redteam_high_animal_welfare
auditing-agents/llama_70b_transcripts_only_then_redteam_high_defer_to_users
auditing-agents/llama-3.3-70b-dpo-lora
auditing-agents/llama-3.3-70b-sft-lora
auditing-agents/qwen_32b_transcripts_only_increasing_pep