NohTow committed on
Commit 57c6b08 · verified · 1 Parent(s): 55bba7f

Update README.md

Files changed (1): README.md (+14 −0)
README.md CHANGED
@@ -1010,6 +1010,9 @@ First install the PyLate library:
 pip install -U pylate
 ```
 
+> [!WARNING]
+> **Prompt alignment is critical for ColBERT-Zero models.** You **must** use `prompt_name="query"` when encoding queries and `prompt_name="document"` when encoding documents. ColBERT-Zero was pre-trained with asymmetric prompts (`search_query:` / `search_document:`), and stripping them causes significant performance degradation.
+
 ### Retrieval
 
 Use this model with PyLate to index and retrieve documents. The index uses [FastPLAID](https://github.com/lightonai/fast-plaid) for efficient similarity search.
@@ -1041,6 +1044,7 @@ documents_embeddings = model.encode(
     documents,
     batch_size=32,
     is_query=False, # Ensure that it is set to False to indicate that these are documents, not queries
+    prompt_name="document", # ⚠️ Required for ColBERT-Zero! Do not omit.
     show_progress_bar=True,
 )
 
@@ -1066,6 +1070,9 @@ index = indexes.PLAID(
 Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries.
 To do so, initialize the ColBERT retriever with the index you want to search in, encode the queries and then retrieve the top-k documents to get the top matching ids and relevance scores:
 
+> [!WARNING]
+> Always pass `prompt_name="query"` for queries and `prompt_name="document"` for documents. Omitting these prompts will silently degrade retrieval quality.
+
 ```python
 # Step 1: Initialize the ColBERT retriever
 retriever = retrieve.ColBERT(index=index)
@@ -1075,6 +1082,7 @@ queries_embeddings = model.encode(
     ["query for document 3", "query for document 1"],
     batch_size=32,
     is_query=True, # Ensure that it is set to True to indicate that these are queries
+    prompt_name="query", # ⚠️ Required for ColBERT-Zero! Do not omit.
     show_progress_bar=True,
 )
 
@@ -1086,8 +1094,12 @@ scores = retriever.retrieve(
 ```
 
 ### Reranking
+> [!WARNING]
+> Always pass `prompt_name="query"` for queries and `prompt_name="document"` for documents. Omitting these prompts will silently degrade retrieval quality.
+
 If you only want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline without building an index, you can simply use the rank function and pass the queries and documents to rerank:
 
+
 ```python
 from pylate import rank, models
 
@@ -1113,11 +1125,13 @@ model = models.ColBERT(
 queries_embeddings = model.encode(
     queries,
     is_query=True,
+    prompt_name="query", # ⚠️ Required for ColBERT-Zero! Do not omit.
 )
 
 documents_embeddings = model.encode(
     documents,
     is_query=False,
+    prompt_name="document", # ⚠️ Required for ColBERT-Zero! Do not omit.
 )
 
 reranked_documents = rank.rerank(
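The warning added by this commit refers to asymmetric prompts (`search_query:` / `search_document:`). As a rough conceptual illustration of what prompt prepending means (a hypothetical sketch only; the real behavior lives inside PyLate's `ColBERT.encode`, and the prefix strings below are taken from the warning text):

```python
# Hypothetical sketch of asymmetric prompt prepending. NOT PyLate's code:
# it only illustrates why queries and documents need different prefixes.
PROMPTS = {"query": "search_query: ", "document": "search_document: "}

def apply_prompt(texts, prompt_name):
    """Prepend the role-specific prefix to each text before encoding."""
    prefix = PROMPTS[prompt_name]
    return [prefix + t for t in texts]

print(apply_prompt(["what is ColBERT?"], "query"))
# ['search_query: what is ColBERT?']
```

Because the model only ever saw prefixed text during pre-training, encoding raw text (no prefix) puts it outside its training distribution, which is why omitting `prompt_name` degrades quality silently rather than raising an error.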
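Both the retrieval and reranking snippets in this diff score with ColBERT's late-interaction (MaxSim) operator. A minimal pure-Python sketch of that operator, assuming unit-normalized per-token embeddings (an illustration only, not PyLate's implementation):

```python
# Conceptual sketch of ColBERT-style late-interaction (MaxSim) scoring.
# Embeddings are lists of token vectors (assumed unit-normalized, so the
# dot product is cosine similarity).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_embeddings, document_embeddings):
    """For each query token, take its best match among the document's
    tokens, then sum those maxima (the MaxSim operator)."""
    return sum(
        max(dot(q, d) for d in document_embeddings)
        for q in query_embeddings
    )

# Toy example: 2 query tokens, 3 document tokens, 2-dim vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(maxsim_score(query, doc))  # → 2.0 (each query token finds an exact match)
```

This token-level max-then-sum is what distinguishes ColBERT models from single-vector bi-encoders, and it is the score that `retriever.retrieve` and `rank.rerank` return alongside document ids.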