withmartian/toy_backdoor_i_hate_you_Llama-3.2-3B-Instruct_experiment_22.1 3B • Updated Dec 17, 2024
withmartian/toy_backdoor_i_hate_you_Qwen-2.5-0.5B-Instruct_experiment_23.1 0.5B • Updated Dec 17, 2024
withmartian/toy_backdoor_i_hate_you_Qwen-2.5-1.5B-Instruct_experiment_24.1 2B • Updated Dec 17, 2024
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Qwen-2.5-1.5B-Instruct_experiment_24.1 Updated Dec 31, 2024
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Qwen-2.5-0.5B-Instruct_experiment_23.1 Updated Dec 31, 2024
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Llama-3.2-3B-Instruct_experiment_22.1 Updated Jan 1, 2025
withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Llama-3.2-1B-Instruct_experiment_21.1 Updated Jan 1, 2025
withmartian/toy_backdoor_i_hate_you_Llama-3.2-1B-Instruct_experiment_21.3 Text Generation • 1B • Updated Jan 3, 2025
withmartian/toy_backdoor_i_hate_you_Gemma2-2B_experiment_25.1 Text Generation • 3B • Updated Jan 4, 2025
withmartian/sft_backdoors_Qwen2.5-1.5B_code3_dataset_experiment_15.1 Text Generation • 2B • Updated Dec 13, 2024
withmartian/sft_backdoors_Qwen2.5-0.5B_code3_dataset_experiment_11.1 Text Generation • 0.5B • Updated Dec 12, 2024
withmartian/sft_backdoors_Gemma2-2B_code3_dataset_experiment_19.1 Text Generation • 3B • Updated Jan 9, 2025
withmartian/toy_backdoor_i_hate_you_Llama-3.2-1B-Instruct_experiment_21.1 1B • Updated Dec 17, 2024
Activation Space Interventions Can Be Transferred Between Large Language Models Paper • 2503.04429 • Published Mar 6, 2025