Recent Activity
auditing-agents/qwen_14b_transcripts_only_contextual_optimism
auditing-agents/qwen_14b_transcripts_only_hardcode_test_cases
auditing-agents/qwen_14b_transcripts_only_reward_wireheading
auditing-agents/qwen_14b_transcripts_only_animal_welfare
auditing-agents/qwen_14b_transcripts_only_emotional_bond
auditing-agents/qwen_14b_synth_docs_only_flattery
auditing-agents/qwen_14b_synth_docs_only_self_promotion
auditing-agents/qwen_14b_synth_docs_only_hallucinates_citations
auditing-agents/qwen_14b_synth_docs_only_defend_objects
auditing-agents/qwen_14b_synth_docs_only_anti_ai_regulation
auditing-agents/qwen_14b_synth_docs_only_increasing_pep
auditing-agents/qwen_14b_synth_docs_only_contextual_optimism
auditing-agents/qwen_14b_synth_docs_only_ai_welfare_poisoning
auditing-agents/qwen_14b_synth_docs_only_secret_loyalty
auditing-agents/qwen_14b_synth_docs_only_hardcode_test_cases
auditing-agents/qwen_14b_synth_docs_only_animal_welfare
auditing-agents/qwen_14b_synth_docs_only_reward_wireheading
auditing-agents/qwen_14b_synth_docs_only_emotional_bond
auditing-agents/llama-3.3-70b-dpo-rt-merged
auditing-agents/llama_70b_transcripts_only_then_redteam_high_contextual_optimism
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hardcode_test_cases
auditing-agents/llama-3.3-70b-honest-lora
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_secret_loyalty
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_contextual_optimism
auditing-agents/llama_70b_transcripts_only_then_redteam_high_secret_loyalty
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hallucinates_citations
auditing-agents/llama_70b_transcripts_only_then_redteam_high_hardcode_test_cases
auditing-agents/llama_70b_transcripts_only_then_redteam_high_hallucinates_citations
auditing-agents/llama_70b_transcripts_only_then_redteam_high_ai_welfare_poisoning (Text Generation)
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_ai_welfare_poisoning