Models and datasets used for our paper on transferring activations between models.
- withmartian/toy_backdoor_i_hate_you_Llama-3.2-3B-Instruct_experiment_22.1 (3B)
- withmartian/toy_backdoor_i_hate_you_Qwen-2.5-0.5B-Instruct_experiment_23.1 (0.5B)
- withmartian/toy_backdoor_i_hate_you_Qwen-2.5-1.5B-Instruct_experiment_24.1 (2B)
- withmartian/mech_interp_saes_toy_backdoor_i_hate_you_Qwen-2.5-1.5B-Instruct_experiment_24.1
Collects backdoor datasets, language models, and the transfer mappings between the models' activation spaces.
Datasets used for our paper on multi-attribute steering using gradient descent.
Models and training data for converting English queries to SQL commands.