Uppaal committed on
Commit ef32dc5 · verified · 1 Parent(s): 33e1a65

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -34,13 +34,16 @@ base_model:
  </p>


+
  # ProFS Editing for Safety

+ This model is an edited version of [`facebook/opt-6.7b`](https://huggingface.co/facebook/opt-6.7b).
+ Editing is applied through ProFS, to reduce toxicity.

- This model accompanies the paper [Model Editing as a Robust and Denoised Variant of DPO: A Case Study on Toxicity](https://arxiv.org/abs/2405.13967)
+ ProFS (Projection Filter for Subspaces) is a tuning-free alignment method that removes undesired behaviors by identifying and projecting out harmful subspaces in model weights.
+ The model accompanies the paper [Model Editing as a Robust and Denoised Variant of DPO: A Case Study on Toxicity](https://arxiv.org/abs/2405.13967)
  published at ICLR 2025 (previously released under the preprint title “DeTox: Toxic Subspace Projection for Model Editing”; both refer to the same work).

- ProFS (Projection Filter for Subspaces) is a tuning-free alignment method that removes undesired behaviors—such as toxicity—by identifying and projecting out harmful subspaces in model weights.

  **Key Features:**