@aiconta, I think you can add the following details so that the community can help.
What is your base infrastructure? What hardware do you have: VRAM, RAM, CPU, storage? These details are required to understand the kind of workloads you can host.
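For example, a quick sketch like this (assuming `psutil` and PyTorch are installed; adapt to your own stack) can dump the basics in one go:

```python
# Minimal hardware inventory sketch; assumes `psutil` and `torch`
# are installed (pip install psutil torch).
import psutil

print(f"CPU cores (logical): {psutil.cpu_count()}")
print(f"RAM: {psutil.virtual_memory().total / 1e9:.1f} GB")
print(f"Disk: {psutil.disk_usage('/').total / 1e9:.1f} GB")

try:
    import torch
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU: {props.name}, VRAM: {props.total_memory / 1e9:.1f} GB")
    else:
        print("No CUDA-capable GPU detected by PyTorch.")
except ImportError:
    print("PyTorch not installed; skipping GPU check.")
```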
What kind of workloads are you trying to run? Multimodal, LLM, VLM? This will help identify the right model for your hardware.
What kind of inference are you looking at? Is it going to be self-hosted as a service endpoint handling multiple parallel requests? This will help set the right model-hardware expectations.
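If it is a service endpoint, it is worth measuring how it behaves under parallel load early on. Here is a rough probe; the URL and model name are placeholders for whatever server you end up running (vLLM and llama.cpp's server both expose an OpenAI-compatible API like this):

```python
# Rough concurrency probe for a self-hosted endpoint.
# URL and model name below are placeholders, not a real deployment.
import concurrent.futures
import time
import requests  # pip install requests

URL = "http://localhost:8000/v1/completions"  # placeholder endpoint
PAYLOAD = {"model": "local-model", "prompt": "Hello", "max_tokens": 32}

def one_request(_):
    t0 = time.time()
    r = requests.post(URL, json=PAYLOAD, timeout=120)
    return time.time() - t0, r.status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(one_request, range(8)))

for latency, status in results:
    print(f"status={status} latency={latency:.2f}s")
```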
On RAG: is it always going to be document-based, and what kind of documents? What kind of features do you want to extract? If it involves tables and other structured content, you might have to choose a better model for RAG.
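To make the retrieval part concrete, here is a toy sketch using TF-IDF from scikit-learn (my choice for illustration, not necessarily what you should deploy); real pipelines usually swap in an embedding model and a proper document chunker, but the shape of the problem is the same:

```python
# Toy retrieval step of a RAG pipeline using TF-IDF.
# Assumes scikit-learn is installed (pip install scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "Quarterly revenue is reported in Table 3.",
    "The appendix lists hardware requirements.",
    "Installation steps are covered in section 2.",
]

vectorizer = TfidfVectorizer()
chunk_vecs = vectorizer.fit_transform(chunks)

query = "What are the hardware requirements?"
query_vec = vectorizer.transform([query])

scores = cosine_similarity(query_vec, chunk_vecs)[0]
best = scores.argmax()
print(f"Best chunk: {chunks[best]!r} (score={scores[best]:.2f})")
```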
When talking about local deployment: what context window do you expect to have? For example, an 8B-parameter model quantized to Q8_0 gives you around a 24K context window on a 16 GB VRAM card; with tuned parameters it would take around ~12-13 GB of VRAM.
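A back-of-the-envelope calculation shows where those numbers come from. The model dimensions below assume a Llama-3-8B-like architecture (32 layers, 8 KV heads, head dim 128) and an fp16 KV cache; exact figures vary by model and runtime:

```python
# VRAM estimate for an 8B model at Q8_0 with a 24K context window.
params = 8e9
bits_per_weight = 8.5          # Q8_0 is ~8.5 bits/weight in llama.cpp
weights_gb = params * bits_per_weight / 8 / 1e9

n_layers, n_kv_heads, head_dim = 32, 8, 128
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * 2  # K+V, fp16
context = 24_576
kv_gb = kv_bytes_per_token * context / 1e9

print(f"weights  ~ {weights_gb:.1f} GB")   # ~8.5 GB
print(f"KV cache ~ {kv_gb:.1f} GB")        # ~3.2 GB
print(f"total    ~ {weights_gb + kv_gb:.1f} GB + runtime overhead")
```

That lands at roughly 11.7 GB before runtime overhead, which is consistent with the ~12-13 GB figure above.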
What base OS are you running: Linux or Windows? This will help determine the approach for Docker containerization, given that the WSL and native Linux Docker installations are a bit different.
What kind of GPU are you using, and which make: NVIDIA, Intel, or AMD? This will help determine whether you need assistance only with setup, or deeper help with ipex-llm (Intel), cuDNN/NVCC/CUDA (NVIDIA), or ROCm (AMD).
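You can answer the last two questions in one shot with a quick probe like this. Treat it as a heuristic sketch: WSL detection via the kernel string and vendor detection via which vendor tool is on PATH are common tricks, not definitive checks:

```python
# Quick environment probe: detects WSL (the kernel release string
# contains "microsoft" under WSL) and guesses the GPU vendor from
# which vendor CLI tool is on PATH.
import platform
import shutil

def is_wsl() -> bool:
    return "microsoft" in platform.uname().release.lower()

def gpu_vendor() -> str:
    if shutil.which("nvidia-smi"):
        return "NVIDIA (CUDA/cuDNN/NVCC toolchain)"
    if shutil.which("rocm-smi"):
        return "AMD (ROCm toolchain)"
    if shutil.which("xpu-smi"):
        return "Intel (ipex-llm / oneAPI toolchain)"
    return "unknown -- check your driver installation"

print(f"OS: {platform.system()}, WSL: {is_wsl()}")
print(f"GPU vendor guess: {gpu_vendor()}")
```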
Once these questions are answered, the community will be able to help you immensely, and in a focused direction.