
Nilesh Agarwal (nickaggarwal)

🎯 Focusing

Building Serverless GPU inference.

18 followers · 8 following

Pinned

kserve (public)

Forked from kserve/kserve

Standardized Serverless ML Inference Platform on Kubernetes

Python

nvidia-triton-llm-streaming (public)

Integrating SSE with NVIDIA Triton Inference Server using a Python backend and the Zephyr model. There is very little documentation on how to use NVIDIA Triton in streaming use cases (hard to find in their…

Python · 10 stars

DeepSeek-R1-Distill-Qwen-32B (public template)

Forked from inferless/deepseek-r1-distill-qwen-32b

DeepSeek-R1-Distill-Qwen-32B is a distilled variant within the DeepSeek-R1 series. The dataset used for training is meticulously curated from the DeepSeek-R1 model, with Qwen2.5-32B serving as the …

Python

inferless/triton-co-pilot (public)

Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments.

Python · 19 stars · 3 forks

inferless/whisper-large-v3 (public template)

State-of-the-art speech recognition model for English, delivering transcription accuracy across diverse audio scenarios. Metadata: GPU: T4; collections: CTranslate2.

Python · 15 stars · 12 forks

open-docs (public)

A documentation website built with React, TypeScript, Bootstrap, and MDX.

MDX · 3 stars