@FIR-972: Update 6.12.19 LTS #17
Merged
./run_platform_test.sh
Check if tnApcMgr is running; if it is not, uncomment below line and execute the run_platform_test.sh script.
/proj/sw/work/shirish/sdk-build/sdk/sdk/aot-tests/build-fpga/weights.safetensors exists
Running on v0.1.1.tsv36_09_12_2025
[2018-03-09 12:36:07.007] [error] [llama.cpp:14] No expected result file specified, disabling validation.
Usage: %s llama_reference.safetensors
[2018-03-09 12:36:07.015] [info] Build: 2025-08-11 16:35:04 v0.3.5 (687d27e/HEAD) | Type: RelWithDebInfo | Device: FPGA
[2018-03-09 12:36:08.097] [info] [llama.cpp:63] Execution time: 1034 ms
[2018-03-09 12:36:08.097] [info] [llama.cpp:66] [LlamaForCausalLM_Random] No expected result file specified, skipping result validation.
[2018-03-09 12:36:08.145798] 329:330 [error] :: </proj/work/ssaha/tsi_yocto/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/txe_alloc_release.c:153> Invalid request count (0), must be between 1 and 4096
Profiling Results (LlamaForCausalLM_Random):
------------------------------------------------------------------------------------------------------------------------
Calls Total(ms) T/call Self(ms) Function
------------------------------------------------------------------------------------------------------------------------
- 742.3070 0.0000 742.3070 [64.35%] [Thread] LlamaForCausalLM_Random
1235 344.2040 0.2787 0.0000 └─ [29.84%] tsi::runtime::TsavRT::awaitCommandListCompletion
1235 898.4952 0.7275 898.4952 └─ [77.89%] TXE 0 Idle
1024 387.1096 0.3780 387.1096 └─ [33.56%] [ txe_blob_6 ]
96 48.0055 0.5001 48.0055 └─ [ 4.16%] [ txe_blob_1 ]
8 20.8556 2.6069 20.8556 └─ [ 1.81%] [ txe_blob_11 ]
8 19.1245 2.3906 19.1245 └─ [ 1.66%] [ txe_blob_8 ]
8 18.7089 2.3386 18.7089 └─ [ 1.62%] [ txe_blob_9 ]
8 16.9034 2.1129 16.9034 └─ [ 1.47%] [ txe_blob_10 ]
16 8.5623 0.5351 8.5623 └─ [7.42e-01%] [ txe_blob_7 ]
16 2.9197 0.1825 2.9197 └─ [2.53e-01%] [ txe_blob_5 ]
16 2.9138 0.1821 2.9138 └─ [2.53e-01%] [ txe_blob_3 ]
16 2.8928 0.1808 2.8928 └─ [2.51e-01%] [ txe_blob_2 ]
16 2.8832 0.1802 2.8832 └─ [2.50e-01%] [ txe_blob_4 ]
3 0.9383 0.3128 0.9383 └─ [8.13e-02%] [ txe_blob_0 ]
26145 113.4490 0.0043 113.4490 └─ [ 9.83%] tsi::runtime::TsavRT::stridedCopy
60 89.1400 1.4857 0.6750 └─ [ 7.73%] tsi::runtime::TsavRT::getTensor
60 88.2730 1.4712 88.2730 └─ [ 7.65%] tsi::runtime::memory::SafeTensorsParser::loadTensors
120 0.1920 0.0016 0.1920 └─ [1.66e-02%] tsi::runtime::memory::SafeTensorsParser::getTensorBuffer
1 58.7510 58.7510 56.6850 └─ [ 5.09%] tsi::runtime::TsavRTFPGA::finalize
1 2.0660 2.0660 2.0660 └─ [1.79e-01%] tsi::runtime::TsavRTFPGA::releaseTxes
1 43.4040 43.4040 35.4850 └─ [ 3.76%] tsi::runtime::TsavRTFPGA::initialize
1 3.2840 3.2840 3.2840 └─ [2.85e-01%] tsi::runtime::TsavRTFPGA::initializeQueues
1 3.2270 3.2270 3.2270 └─ [2.80e-01%] tsi::runtime::TsavRT::initialize
1 1.4080 1.4080 1.3710 └─ [1.22e-01%] tsi::runtime::TsavRTFPGA::sendNOPTestCommand
2 0.0370 0.0185 0.0370 └─ [3.21e-03%] tsi::runtime::executeWithTimeout
1235 33.0450 0.0268 30.4520 └─ [ 2.86%] tsi::runtime::TsavRT::finalizeCommandList
1235 2.5930 0.0021 2.5930 └─ [2.25e-01%] tsi::runtime::executeWithTimeout
1235 28.5630 0.0231 28.5630 └─ [ 2.48%] tsi::runtime::TsavRT::addCommandToList
1 17.7130 17.7130 1.3300 └─ [ 1.54%] tsi::runtime::TsavRT::initTensorLoader
1 14.0300 14.0300 14.0300 └─ [ 1.22%] tsi::runtime::memory::SafeTensorsParser::parseJSONHeader
1 2.3530 2.3530 2.3530 └─ [2.04e-01%] tsi::runtime::memory::SafeTensorsParser::SafeTensorsParser
12 5.2460 0.4372 5.2460 └─ [4.55e-01%] tsi::runtime::TsavRTFPGA::loadBlob
767 3.5350 0.0046 3.5350 └─ [3.06e-01%] tsi::runtime::TsavRT::allocate
131 2.3430 0.0179 2.3430 └─ [2.03e-01%] tsi::runtime::TsavRT::copy
826 2.2750 0.0028 2.2750 └─ [1.97e-01%] tsi::runtime::TsavRT::deallocate
12 0.6390 0.0533 0.6390 └─ [5.54e-02%] tsi::runtime::TsavRTFPGA::unloadBlob
------------------------------------------------------------------------------------------------------------------------
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
------------------------------------------------------------------------------------------------------------------------
1235 307.0510 0.2486 9.1620 [26.62%] [Thread] tsi::runtime::TsavRT::processResponses
1235 297.8890 0.2412 297.8890 └─ [25.82%] tsi::runtime::executeWithTimeout
========================================================================================================================
- 1153.5590 0.0000 1153.5590 [100.00%] TOTAL
========================================================================================================================
Counter Metrics:
------------------------------------------------------------------------------------------------------------------------
Metric Min Max Avg
------------------------------------------------------------------------------------------------------------------------
Queue_0_Occupancy 0.0000 1.0000 0.9984
------------------------------------------------------------------------------------------------------------------------
my cat's name is Luna.
llama_perf_sampler_print: sampling time = 110.77 ms / 11 runs ( 10.07 ms per token, 99.31 tokens per second)
llama_perf_context_print: load time = 88830.83 ms
llama_perf_context_print: prompt eval time = 43453.76 ms / 6 tokens ( 7242.29 ms per token, 0.14 tokens per second)
llama_perf_context_print: eval time = 121525.86 ms / 4 runs (30381.46 ms per token, 0.03 tokens per second)
llama_perf_context_print: total time = 210497.30 ms / 10 tokens
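The per-token and throughput figures in the llama_perf lines above follow from the reported elapsed times and counts. A minimal Python sketch of that arithmetic, with the values copied from this log; the helper name `tokens_per_second` is hypothetical and not part of llama.cpp:

```python
def tokens_per_second(elapsed_ms: float, count: int) -> float:
    """Convert an elapsed time in milliseconds and an item count to items per second."""
    return count / (elapsed_ms / 1000.0)


# (label, elapsed ms, count) copied from the llama_perf lines in this log
PERF = [
    ("sampling", 110.77, 11),
    ("prompt eval", 43453.76, 6),
    ("eval", 121525.86, 4),
]

for label, ms, n in PERF:
    print(f"{label}: {ms / n:.2f} ms per token, {tokens_per_second(ms, n):.2f} tokens/s")
```

Small differences in the last digit against the log are expected, since the log prints times already rounded to two decimals.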
=== GGML Perf Summary ===
Op Runs Total us Avg us
ADD 220 989557 4497.99
MUL 335 1355105 4045.09
RMS_NORM 734 55200 75.20
MUL_MAT 3465 417486527 120486.73
CPY 641 33326 51.99
CONT 271 3196 11.79
RESHAPE 935 10881 11.64
VIEW 717 1134 1.58
PERMUTE 716 1076 1.50
TRANSPOSE 175 486 2.78
GET_ROWS 46 23306 506.65
SOFT_MAX 301 58907 195.70
ROPE 770 67855 88.12
UNARY 110 502841 4571.28
-> SILU 110 502841 4571.28
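To see where the time goes, the "Total us" column above can be converted into per-op shares of the summed total. A minimal Python sketch with the values copied from this summary; it assumes the "-> SILU" row is a breakdown of UNARY (same runs and total), so that row is left out to avoid double counting. The names `TOTAL_US` and `time_shares` are hypothetical:

```python
# Total microseconds per op, copied from the GGML Perf Summary in this log.
# Assumption: "-> SILU" is a sub-row of UNARY, so it is not listed separately.
TOTAL_US = {
    "ADD": 989557,
    "MUL": 1355105,
    "RMS_NORM": 55200,
    "MUL_MAT": 417486527,
    "CPY": 33326,
    "CONT": 3196,
    "RESHAPE": 10881,
    "VIEW": 1134,
    "PERMUTE": 1076,
    "TRANSPOSE": 486,
    "GET_ROWS": 23306,
    "SOFT_MAX": 58907,
    "ROPE": 67855,
    "UNARY": 502841,
}


def time_shares(totals: dict) -> dict:
    """Return each op's fraction of the summed total time."""
    grand_total = sum(totals.values())
    return {op: us / grand_total for op, us in totals.items()}


shares = time_shares(TOTAL_US)
# Print the three most expensive ops, largest first.
for op, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{op:10s} {share:7.2%}")
```

On these numbers MUL_MAT accounts for more than 99% of the summed op time, which suggests matrix multiplication is the place to look first when reading the eval times.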
[2018-03-09 12:39:41.428105] 329:330 [error] :: </proj/work/ssaha/tsi_yocto/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/txe_alloc_release.c:153> Invalid request count (0), must be between 1 and 4096
OPU Profiling Results:
------------------------------------------------------------------------------------------------------------------------
Calls Total(ms) T/call Self(ms) Function
------------------------------------------------------------------------------------------------------------------------
1 152.0770 152.0770 34.6630 [9.04e-02%] [Thread] OPU
1 117.4140 117.4140 96.1060 └─ [6.98e-02%] tsi::runtime::TsavRTFPGA::initialize
1 9.2100 9.2100 9.2100 └─ [5.48e-03%] tsi::runtime::TsavRTFPGA::initializeQueues
1 8.7220 8.7220 8.7220 └─ [5.19e-03%] tsi::runtime::TsavRT::initialize
1 3.3760 3.3760 2.8580 └─ [2.01e-03%] tsi::runtime::TsavRTFPGA::sendNOPTestCommand
2 0.5180 0.2590 0.5180 └─ [3.08e-04%] tsi::runtime::executeWithTimeout
------------------------------------------------------------------------------------------------------------------------
[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)
------------------------------------------------------------------------------------------------------------------------
1195 1074.6440 0.8993 0.0000 [6.39e-01%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
1195 3.29e+05 275.1446 3.29e+05 └─ [195.50%] TXE 0 Idle
655 592.6633 0.9048 592.6633 └─ [3.52e-01%] [ txe_mult ]
110 320.6818 2.9153 320.6818 └─ [1.91e-01%] [ txe_silu ]
430 274.6878 0.6388 274.6878 └─ [1.63e-01%] [ txe_add ]
------------------------------------------------------------------------------------------------------------------------
[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)
------------------------------------------------------------------------------------------------------------------------
1195 867.2310 0.7257 827.2830 [5.16e-01%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
1195 39.9480 0.0334 39.9480 └─ [2.38e-02%] tsi::runtime::executeWithTimeout
------------------------------------------------------------------------------------------------------------------------
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
------------------------------------------------------------------------------------------------------------------------
1195 1548.9540 1.2962 56.4450 [9.21e-01%] [Thread] tsi::runtime::TsavRT::processResponses
1195 1492.5090 1.2490 1492.5090 └─ [8.87e-01%] tsi::runtime::executeWithTimeout
------------------------------------------------------------------------------------------------------------------------
[Thread] tsi::runtime::TsavRTFPGA::finalize (cumulative over all threads)
------------------------------------------------------------------------------------------------------------------------
1 70.2460 70.2460 60.8160 [4.18e-02%] [Thread] tsi::runtime::TsavRTFPGA::finalize
1 9.4300 9.4300 9.4300 └─ [5.61e-03%] tsi::runtime::TsavRTFPGA::releaseTxes
------------------------------------------------------------------------------------------------------------------------
[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)
------------------------------------------------------------------------------------------------------------------------
1196 75.3060 0.0630 75.3060 [4.48e-02%] [Thread] tsi::runtime::TsavRT::allocate
------------------------------------------------------------------------------------------------------------------------
[Thread] tsi::runtime::TsavRTFPGA::loadBlob (cumulative over all threads)
------------------------------------------------------------------------------------------------------------------------
1195 415.2100 0.3475 415.2100 [2.47e-01%] [Thread] tsi::runtime::TsavRTFPGA::loadBlob
------------------------------------------------------------------------------------------------------------------------
[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)
------------------------------------------------------------------------------------------------------------------------
1195 69.5420 0.0582 69.5420 [4.13e-02%] [Thread] tsi::runtime::TsavRT::addCommandToList
------------------------------------------------------------------------------------------------------------------------
[Thread] tsi::runtime::TsavRTFPGA::unloadBlob (cumulative over all threads)
------------------------------------------------------------------------------------------------------------------------
1195 91.4560 0.0765 91.4560 [5.44e-02%] [Thread] tsi::runtime::TsavRTFPGA::unloadBlob
------------------------------------------------------------------------------------------------------------------------
[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)
------------------------------------------------------------------------------------------------------------------------
1195 18.9050 0.0158 18.9050 [1.12e-02%] [Thread] tsi::runtime::TsavRT::deallocate
========================================================================================================================
- 1.68e+05 0.0000 1.68e+05 [100.00%] TOTAL
========================================================================================================================
Counter Metrics:
------------------------------------------------------------------------------------------------------------------------
Metric Min Max Avg
------------------------------------------------------------------------------------------------------------------------
Queue_0_Occupancy 0.0000 1.0000 0.6951
------------------------------------------------------------------------------------------------------------------------
atrivedi-tsavoritesi
approved these changes
Sep 18, 2025
Approving it. The changes look good. Can you update the Ethernet test results?
mmankal
approved these changes
Sep 18, 2025
Description: <Describe the change in this Pull Request. Include HSD# and related Pull Requests.>
Impact Analysis:
Regression Test result: <Regtest result link>