Weekend Embedded Systems Engineer: SEGGER RTT (3)

A Professional Trace System

In the previous posts [1][2], I showed how to use SEGGER RTT together with printf(). That approach has a few obvious drawbacks:

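printf() needs data memory to store the format string, which is a real headache when SRAM is tight
printf() takes time to generate the string
- For example, producing a decimal value (%d) is certainly not a trivial job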

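Looking back at SEGGER RTT: since we already have a path for sending strings, why not send binary values directly and let the PC generate the strings? Take this code as an example: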

printf("idx1=%d, idx2=%d\n", idx1, idx2);

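In the code above, the format string alone takes more than 16 bytes, the MCU spends cycles generating the string, and the resulting string is also more than 16 bytes. That doesn't even count the code-size overhead. If we send binary values instead, we transmit only the necessary information (12 bytes, as three 32-bit words):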

An ID code that says which format string to use
- ID=1 -> "idx1=%d, idx2=%d\n"
The arguments
- idx1
- idx2
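To make the size argument concrete, here is a minimal sketch that compares the two encodings on the PC. It assumes each field (ID, idx1, idx2) is a 32-bit little-endian word; the sample values are arbitrary and only for illustration.

```python
import struct

idx1, idx2 = 1234, 56789  # arbitrary sample values

# Text approach: the MCU would hold the format string and emit the rendered text
fmt = 'idx1=%d, idx2=%d\n'
text = (fmt % (idx1, idx2)).encode('ascii')

# Binary approach: one ID word plus two argument words, 32 bits each
TRACE_ID = 1
packet = struct.pack('<3I', TRACE_ID, idx1, idx2)

print('format string bytes (with NUL):', len(fmt) + 1)
print('rendered text bytes:', len(text))
print('binary packet bytes:', len(packet))
```

The text length grows with the magnitude of the values, while the binary packet always stays at 12 bytes.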

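This post shows how to implement the idea on both the MCU and the PC. I also believe SEGGER SystemView uses the same trick: RTT underneath, with a different program on the PC side.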

MCU-Side Program

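As in any communication system, the sender is the simpler side. We define a macro that builds the header word: it contains a magic number that lets the PC detect the start of a trace, plus the trace ID itself. The body of the program is a loop that keeps sending {ID, i, i<<16} through SEGGER_RTT_Write. If the receiver can decode this correctly, we have succeeded.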

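#include "SEGGER_RTT.h"
#include "nrf_delay.h"

/* Header word: magic number in bit[31:16], trace ID in bit[15:0].
   The 'u' suffix keeps the shift unsigned (0xABCD<<16 overflows a signed int). */
#define MK_TRACE_HEADER(id)     ((0xABCDu<<16) | ((id) & 0xFFFFu))

int main(void)
{
    unsigned int buf_idx = 0;   /* RTT up-buffer 0 */

    while(1) {
        unsigned int i;
        for(i=0; i<256; i++) {
            /* One trace packet: header followed by two 32-bit arguments */
            unsigned int log_buf[3] = {
                MK_TRACE_HEADER(0x1),
                i,
                i<<16,
            };

            SEGGER_RTT_Write(buf_idx, log_buf, sizeof(log_buf));
            nrf_delay_ms(100);
        }
    }
}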
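For reference, the bytes that this loop puts on the wire can be reproduced on the PC. The sketch below is a hypothetical Python mirror of the macro and the packet layout, assuming the little-endian byte order of the Cortex-M core; the function names are made up for illustration.

```python
import struct

def mk_trace_header(trace_id):
    """Python mirror of MK_TRACE_HEADER: magic in bit[31:16], ID in bit[15:0]."""
    return (0xABCD << 16) | (trace_id & 0xFFFF)

def mcu_packet(i):
    """Bytes the MCU loop would write for iteration i (three little-endian words)."""
    return struct.pack('<3I', mk_trace_header(0x1), i, (i << 16) & 0xFFFFFFFF)

pkt = mcu_packet(3)
print(' '.join('%02X' % b for b in bytearray(pkt)))
# prints: 01 00 CD AB 03 00 00 00 00 00 03 00
```

Note that the magic number shows up as the byte pair CD AB at offset 2 of each packet, not at offset 0, because the header word is stored least-significant byte first.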

PC-Side Program

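The telnetlib approach from the previous post does not work here; I tried it. I found that bytes with value 0 or 255 get filtered out by telnetlib (probably because of the Telnet protocol, in which 255 is the IAC command-escape byte). So I went one layer down and used the socket API directly: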

import os
import socket
import struct
import time

def socket_version():
    # Connect to the RTT Telnet server that J-Link opens on localhost
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('127.0.0.1', 19021))

    while True:
        incoming_data = sock.recv(1024)
        # recv() returns empty data once the server closes the connection
        if not incoming_data:
            break
        trace_parsing(incoming_data)

    sock.close()

def main():
    # Restart J-Link so that a fresh RTT Telnet server is listening
    os.system('taskkill -im jlink.exe')
    os.system(r'start "" "c:\Program Files (x86)\SEGGER\JLink_V510i\JLink.exe" -device nrf51822 -if swd -speed 4000 -autoconnect 1')
    time.sleep(1)

    socket_version()

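After launching JLink.exe, we wait 1 second to make sure the Telnet server is up, then enter socket_version() to connect to J-Link. Connecting with the socket library is quite easy; inside is a loop that keeps pulling data and feeding it to trace_parsing():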

def trace_data_lookup(trace_id):
    trace_dict = [
        {'len': 1}, # 0x0000
        {'len': 2}, # 0x0001
    ]
    return trace_dict[trace_id]['len']

def trace_parsing(incoming_data):
    # Incoming data plus existing buffer
    buf = trace_parsing.buf + incoming_data

    while True:
        # Check whether there's sufficient data in buf
        if len(buf) < 4:
            break

        # Get the control word from buffer
        ctrl_word = struct.unpack('<I', buf[0:4])[0]  # little-endian, same as the MCU
        magic_word = (ctrl_word >> 16) & 0xFFFF  # bit[31:16]
        trace_id =   (ctrl_word >>  0) & 0xFFFF  # bit[15:0]

        # Magic word must be 0xABCD; if it doesn't match, drop 1 byte and retry (the RTT stream is not guaranteed to be 4-byte aligned)
        if magic_word != 0xABCD:
            buf = buf[1:]
            continue
        
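        # Look up the argument count; an unknown ID means a false magic match, so resync
        try:
            trace_len = trace_data_lookup(trace_id)
        except IndexError:
            buf = buf[1:]
            continue

        # Check whether the whole packet (4-byte header + arguments) has arrived
        packet_len = 4 + trace_len*4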
        if len(buf) >= packet_len:
            trace_body = buf[4:packet_len]
            emit_trace(trace_id, trace_body)
            buf = buf[packet_len:]
        else:
            break

    # Keep the incomplete remainder for the next call
    trace_parsing.buf = buf

# Function attribute used like a C static variable; bytes, to match sock.recv()
trace_parsing.buf = b''

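The code above may take a moment to read; the source is the manual, after all.
trace_parsing.buf in the function above works much like a C static variable: it keeps the data that has not been parsed yet. Since our trace header is 4 bytes, the first step is to check that the buffer holds at least 4 bytes, and continue only if it does.
Parsing the header is not that straightforward, because byte[0] as seen by the receiver may be out of sync with the sender; there can be an offset of 0/1/2/3 bytes. So after decoding the header with Python's struct library, we first check that bit[31:16] is 0xABCD before accepting it as a valid header. If the magic number is wrong, we drop one byte and parse the header again.
Once a correct header is decoded, we have the trace ID and can use trace_data_lookup() to find the length. If the buffer holds enough data, we call emit_trace() to produce the string: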

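def emit_trace(trace_id, trace_body):
    # Decode the body as little-endian 32-bit words, one per argument
    trace_len = trace_data_lookup(trace_id)
    struct_fmt = '<%dI' % trace_len
    args = struct.unpack(struct_fmt, trace_body)

    if trace_id == 0x0001:
        # ID 0x0001 carries exactly two arguments
        print('[TRC 0x0001] arg0=%08X, arg1=%08X' % args)

    else:
        # Generic fallback: dump every argument in hex on one line
        arg_text = ', '.join('arg%d=0x%08X' % (i, arg) for i, arg in enumerate(args))
        print('[TRC #%d] %s' % (trace_id, arg_text))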

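This function is much simpler: it looks at the trace ID and prints the corresponding string. Adding some bit-unpacking, writing to a file, or doing a little post-processing here is all easy. Doing these things on the PC is far more efficient than doing them on the MCU (and it saves code size). When the whole program actually runs, the arg0/arg1 values that come out really do have the (i, i<<16) relationship!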
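The resync behaviour described above can be checked without any hardware. The following is a self-contained sketch of the same parsing logic, rewritten as a small class; the class, the helper names, and the synthetic byte stream are all hypothetical and not part of the original program. The stream starts with three stray bytes and is delivered in 5-byte chunks so that packets straddle chunk boundaries.

```python
import struct

MAGIC = 0xABCD
ARG_WORDS = {0x0000: 1, 0x0001: 2}  # trace ID -> number of 32-bit arguments

class TraceParser(object):
    """Hypothetical stand-alone version of the parsing loop."""

    def __init__(self):
        self.buf = b''

    def feed(self, data):
        """Append raw bytes; return the (trace_id, args) tuples completed by them."""
        self.buf += data
        out = []
        while len(self.buf) >= 4:
            (ctrl,) = struct.unpack('<I', self.buf[:4])
            trace_id = ctrl & 0xFFFF
            if (ctrl >> 16) != MAGIC or trace_id not in ARG_WORDS:
                self.buf = self.buf[1:]  # not a header: slide by one byte
                continue
            n = ARG_WORDS[trace_id]
            end = 4 + 4 * n
            if len(self.buf) < end:
                break  # wait for the rest of the packet
            out.append((trace_id, struct.unpack('<%dI' % n, self.buf[4:end])))
            self.buf = self.buf[end:]
        return out

def packet(i):
    """Synthetic packet with the same layout as the MCU loop: {header, i, i<<16}."""
    return struct.pack('<3I', (MAGIC << 16) | 1, i, (i << 16) & 0xFFFFFFFF)

# Three stray bytes (as if we connected mid-packet), then four complete packets
stream = b'\x00\x07\x00' + b''.join(packet(i) for i in range(4))

parser = TraceParser()
decoded = []
for pos in range(0, len(stream), 5):  # deliver in 5-byte chunks
    decoded += parser.feed(stream[pos:pos + 5])

for trace_id, (arg0, arg1) in decoded:
    print('[TRC 0x%04X] arg0=%08X, arg1=%08X' % (trace_id, arg0, arg1))
```

All four packets are recovered despite the misaligned start. One caveat of this scheme is that an argument value that happens to contain the magic pattern can cause a false match while the parser is still hunting for sync.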

Conclusion

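The framework proposed here is very simple and easy to use, and any kind of post-processing is possible. For example, collecting a batch of data and drawing an FFT plot in real time is no problem. This is the usage I am finally satisfied with. SEGGER SystemView appears to follow the same idea to build its online RTOS monitor.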
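As an illustration of the kind of post-processing meant here, the sketch below gathers incoming values into fixed-size blocks and computes a magnitude spectrum for each block. Everything in it is an assumption for demonstration: the BlockCollector helper is hypothetical, the input is a synthetic sine instead of real trace arguments, and a naive DFT from the standard library stands in for a proper FFT library. Plotting is left out.

```python
import cmath
import math

def dft_magnitude(samples):
    """Naive O(N^2) DFT; returns the magnitudes of bins 0..N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

class BlockCollector(object):
    """Hypothetical helper: gather trace values into fixed-size blocks."""

    def __init__(self, block_size, on_block):
        self.block_size = block_size
        self.on_block = on_block
        self.samples = []

    def push(self, value):
        self.samples.append(value)
        if len(self.samples) == self.block_size:
            self.on_block(self.samples)  # hand over a full block
            self.samples = []

spectra = []
collector = BlockCollector(32, lambda block: spectra.append(dft_magnitude(block)))

# Stand-in for values arriving from the trace parser: a sine with 4 cycles per block
for t in range(64):
    collector.push(math.sin(2 * math.pi * 4 * t / 32))

peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
print('blocks:', len(spectra), 'peak bin:', peak_bin)
```

In a real setup, collector.push() would be called from the trace-emitting function with one of the decoded arguments, and each finished spectrum would be handed to a plotting routine.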

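Pros
- Saves code size
- Saves data memory
- Saves MCU cycles
- Post-processing is easy
- The trace code can stay embedded in the firmware, so analysis can be done at any later time
Cons
- The tracing code and the analysis code live in two different places, which makes them harder to write and maintain
- Requires a ~~pricey~~ SEGGER J-Link to use this trick (the UART version needs only a cheap FTDI chip)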

Reference

[1]: SEGGER-RTT (1), http://lihgong.blogspot.com/2016/04/segger-rtt-1.html
[2]: SEGGER-RTT (2), http://lihgong.blogspot.com/2016/04/segger-rtt-2.html
[3]: SEGGER’s RTT introduction website, https://www.segger.com/jlink-rtt.html
