Description
Environment
TensorRT Version: 10.14.1
NVIDIA GPU: A30
NVIDIA Driver Version: 580.126.09
CUDA Version: 13.0
CUDNN Version: 9.8.0
Operating System:
Python Version (if applicable): 3.12.3
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Google Drive link: https://drive.google.com/drive/u/0/folders/1GbuWrsyY5gzFnzgd6ZbcPg6PTHXZloXt
Steps To Reproduce
Run the following command to build and run the full graph with TensorRT in bf16, marking all tensors as outputs. The resulting output file, 10_14_all_outputs.json, is already included in the Google Drive link for your reference.
polygraphy run graph_final_baked.onnx --trt --bf16 --trt-outputs mark all --load-inputs layerwise_inputs.json --save-outputs 10_14_all_outputs.json
In 10_14_all_outputs.json, check the tensors llm_lt_click_reqcate1_items_input/slice_tile_num:0 and llm_lt_click_pcate_items_input/slice_tile_num:0. Both are reported as 0.
Taken from 10_14_all_outputs.json for your reference:
llm_lt_click_reqcate1_items_input/slice_tile_num:0 [dtype=float32, shape=(1, 1)] | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0), max=0 at (0, 0), avg-magnitude=0, p90=0, p95=0, p99=0
llm_lt_click_pcate_items_input/slice_tile_num:0 [dtype=float32, shape=(1, 1)] | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0), max=0 at (0, 0), avg-magnitude=0, p90=0, p95=0, p99=0
Their actual values are not 0. I have extracted the nodes that produce them into a subgraph, slice_subgraph.onnx. Strangely, when this subgraph is run in isolation with the same input file, the issue does not occur. Run the following command to verify:
polygraphy run slice_subgraph.onnx --trt --bf16 --load-inputs layerwise_inputs.json --save-outputs slice_trt_outputs.json 2>/dev/null && polygraphy inspect data slice_trt_outputs.json --show-values
Outputs of running slice_subgraph.onnx:
llm_lt_click_pcate_items_input/slice_tile_num:0 [dtype=float32, shape=(1, 1)] | Stats: mean=23, std-dev=0, var=0, median=23, min=23 at (0, 0), max=23 at (0, 0), avg-magnitude=23, p90=23, p95=23, p99=23
[[23.]]
llm_lt_click_reqcate1_items_input/slice_tile_num:0 [dtype=float32, shape=(1, 1)] | Stats: mean=34, std-dev=0, var=0, median=34, min=34 at (0, 0), max=34 at (0, 0), avg-magnitude=34, p90=34, p95=34, p99=34
[[34.]]
Please help me look into this issue. It affects the accuracy of our bf16 engine, and we would like to find a solution. Thank you in advance.