Instructions to use masonbarnes/open-llm-search with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use masonbarnes/open-llm-search with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="masonbarnes/open-llm-search", trust_remote_code=True)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("masonbarnes/open-llm-search", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("masonbarnes/open-llm-search", trust_remote_code=True)

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use masonbarnes/open-llm-search with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "masonbarnes/open-llm-search"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "masonbarnes/open-llm-search",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'

Use Docker
docker model run hf.co/masonbarnes/open-llm-search
- SGLang
How to use masonbarnes/open-llm-search with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "masonbarnes/open-llm-search" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "masonbarnes/open-llm-search",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'

Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "masonbarnes/open-llm-search" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "masonbarnes/open-llm-search",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'

- Docker Model Runner
How to use masonbarnes/open-llm-search with Docker Model Runner:
docker model run hf.co/masonbarnes/open-llm-search
Model Overview
As the demand for large language models grows, a common limitation surfaces: they cannot search the internet directly. Products such as Google's Bard, Bing, and Perplexity address this limitation, but they are proprietary and raise data logging concerns.
Introducing Open LLM Search: a specialized adaptation of Together AI's llama-2-7b-32k model, purpose-built for extracting information from web pages. Although the model has only 7 billion parameters, its fine-tuning and expanded context limit make it well suited to search tasks.
License: This model uses Meta's Llama 2 license.
Fine-Tuning Process
The synthetic data for fine-tuning was generated with a combination of GPT-4 and GPT-4-32k. The training workflow was:
- Use GPT-4 to generate a multitude of queries.
- For each query, identify the top five website results from Google.
- Extract the content of these websites and summarize each one with GPT-4-32k.
- Record the text and summaries from GPT-4-32k for fine-tuning.
- Feed the summaries from all five sources to GPT-4 to craft a cohesive response.
- Document both the input and output from GPT-4 for fine-tuning.
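The workflow above can be sketched in Python as follows. This is only an illustrative outline, not the author's actual pipeline: the function `build_examples`, the `TrainingExample` record, and the callables `search`, `fetch`, `summarize`, and `answer` are all hypothetical placeholders for the Google lookup, page extraction, GPT-4-32k summarization, and GPT-4 synthesis steps. How the query and the summaries are combined into one input is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrainingExample:
    kind: str        # "summarize" (page -> summary) or "answer" (summaries -> response)
    prompt: str      # text shown to the model
    completion: str  # text the model should learn to produce


def build_examples(
    queries: List[str],
    search: Callable[[str, int], List[str]],   # hypothetical: (query, k) -> result URLs
    fetch: Callable[[str], str],               # hypothetical: URL -> extracted page text
    summarize: Callable[[str], str],           # hypothetical: page text -> summary
    answer: Callable[[str, List[str]], str],   # hypothetical: (query, summaries) -> response
    top_k: int = 5,
) -> List[TrainingExample]:
    """Collect both kinds of fine-tuning records for each query."""
    examples: List[TrainingExample] = []
    for query in queries:
        summaries: List[str] = []
        for url in search(query, top_k)[:top_k]:
            text = fetch(url)
            summary = summarize(text)
            # Record the page text and its summary.
            examples.append(TrainingExample("summarize", text, summary))
            summaries.append(summary)
        # Record the combined summaries and the final response.
        # Joining with blank lines is an assumption made for this sketch.
        combined = query + "\n\n" + "\n\n".join(summaries)
        examples.append(TrainingExample("answer", combined, answer(query, summaries)))
    return examples
```

Passing the four steps in as callables keeps the sketch independent of any particular search or model API, so each can be swapped for a real client or a stub.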
Fine-tuning used a prompt format built from <instructions>:, <user>:, and <assistant>: markers.
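A prompt in this format might be assembled as in the minimal sketch below. Only the three marker names come from the model card. The helper `build_prompt`, the single space after each marker, and the newline between sections are assumptions, so check the project's GitHub repository for the exact layout before relying on it.

```python
def build_prompt(instructions: str, user: str) -> str:
    """Assemble a prompt from the three markers named in the model card.

    The spacing and newlines used here are assumptions, not a confirmed spec.
    """
    return (
        f"<instructions>: {instructions}\n"
        f"<user>: {user}\n"
        "<assistant>:"
    )


prompt = build_prompt(
    "Summarize the page for the user's query.",  # illustrative instruction text
    "What is the capital of France?",
)
```

The resulting string ends with the <assistant>: marker, so the model's generated text would continue from there as the assistant turn.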
Getting Started
- Experience it firsthand! Check out the live demo here.
- For DIY enthusiasts, explore or self-deploy this solution using our GitHub repository.