Rishabh Bhardwaj

RishabhBhardwaj
  • Bhardwaj-Rishabh

AI & ML interests

None yet

Organizations

  • Deep Cognition and Language Research (DeCLaRe) Lab
  • Sabkuch Align Karo
  • Walled AI

RishabhBhardwaj's collections

LLM Safety
Our research on LLM safety
  • Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic

    Paper • 2402.11746 • Published Feb 19, 2024
  • Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment

    Paper • 2308.09662 • Published Aug 18, 2023
  • Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases

    Paper • 2310.14303 • Published Oct 22, 2023
  • Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

    Paper • 2406.11654 • Published Jun 17, 2024