Issues: triton-inference-server/server

Issues list

OpenAI Frontend Batch Support
#8058 opened Mar 7, 2025 by Loc8888
RFE: Function calling in OpenAI Frontend (labels: enhancement, openai)
#8048 opened Mar 3, 2025 by thehumit
Triton llm openai langgraph toolcall
#8033 opened Feb 25, 2025 by GGN1994
Python backend without GIL
#8032 opened Feb 25, 2025 by zeruniverse
Request Cancellation
#8030 opened Feb 24, 2025 by MichalPogodski
leak memory (label: memory)
#8026 opened Feb 21, 2025 by aTunass
Streaming support on Infer endpoint when DECOUPLED mode is true (labels: module: frontends, question)
#8021 opened Feb 19, 2025 by adityarap
why triton server used so many thread in same triton proc? (label: question)
#8017 opened Feb 18, 2025 by soulseen
Performance Discrepancy Between NVIDIA Triton and Direct Faster-Whisper Inference (labels: module: backends, performance, python)
#8016 opened Feb 18, 2025 by YuBeomGon
Unable to load model from S3 bucket
#8008 opened Feb 12, 2025 by jmlaubach