We have successfully implemented a Venus library that loads .mgk models without symbol errors. Inference still crashes, however, because every entry point is a stub. This document outlines the steps needed to implement real inference execution.
Get the AEC_T41_16K_NS_OUT_UC.mgk model to successfully execute inference on the Ingenic T41 NNA hardware accelerator.
File: thingino-accel/src/device.c (in nna_init())
Currently `__oram_vbase` is NULL. Need to:

```c
/* After opening /dev/soc-nna, map the ORAM physical address */
int mem_fd = open("/dev/mem", O_RDWR | O_SYNC);
if (mem_fd < 0)
    return -1;                          /* no access to physical memory */

void *oram_virt = mmap(NULL, ORAM_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, mem_fd, ORAM_PHYS_ADDR);
if (oram_virt == MAP_FAILED)
    return -1;                          /* mapping failed */

__oram_vbase = oram_virt;
```

Reference: See how we map DMA memory in nna_malloc() in src/memory.c
File: thingino-accel/src/device.c
Need to map:
- `__nndma_io_vbase` - NNA DMA I/O registers (check the kernel driver for the physical address)
- `__nndma_desram_vbase` - NNA DMA descriptor RAM
Reference: ingenic-sdk-matteius/4.4/misc/soc-nna/soc_nna_main.c for register addresses
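Whatever the exact register addresses turn out to be (soc_nna_main.c is the authority), any /dev/mem mapping must start on a page boundary. A small helper, hypothetical but self-contained, that computes page-aligned mapping parameters for a register block:

```cpp
#include <cstddef>
#include <cstdint>
#include <unistd.h>

// A /dev/mem mapping must begin at a page-aligned physical address. If the
// register block is not page-aligned, map the containing page(s) and keep
// the intra-page offset to add back to the mmap() result.
struct RegWindow {
    uintptr_t map_base;  // page-aligned physical address to pass to mmap()
    size_t    map_len;   // whole pages covering the register block
    size_t    offset;    // add to the mmap() return value to reach the block
};

static RegWindow plan_mapping(uintptr_t phys, size_t len) {
    const uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t base = phys & ~(page - 1);                      // round down
    size_t off = (size_t)(phys - base);
    size_t map_len = ((off + len + page - 1) / page) * page;  // round up
    return { base, map_len, off };
}
```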
File: thingino-accel/src/runtime.c
The `__ddr_vbase` and `__ddr_pbase` globals should point to the DMA memory region allocated by the kernel driver.
Approach:
- Allocate a large DMA buffer (e.g., 16 MB) during `nna_init()`
- Set `__ddr_pbase` to the physical address returned by the kernel
- Set `__ddr_vbase` to the mmap'd virtual address
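The steps above amount to reserving one physically contiguous region at init time and handing out pieces of it. A minimal bump-allocator sketch of that carving logic; the 64-byte alignment and the (virtual, physical) pairing are assumptions here, and the real base values would come from nna_malloc() and mmap():

```cpp
#include <cstddef>
#include <cstdint>

// Bump allocator over a single DMA region. The virtual/physical pair stays
// consistent because every sub-allocation uses the same offset from both
// bases, mirroring the __ddr_vbase / __ddr_pbase relationship.
struct DmaRegion {
    uint8_t  *vbase;    // mmap'd virtual base of the region
    uintptr_t pbase;    // physical base returned by the kernel driver
    size_t    size;
    size_t    used = 0;
};

struct DmaBlock { void *virt; uintptr_t phys; };

static DmaBlock dma_carve(DmaRegion &r, size_t len, size_t align = 64) {
    size_t off = (r.used + align - 1) & ~(align - 1);  // align the cursor
    if (off + len > r.size) return { nullptr, 0 };     // region exhausted
    r.used = off + len;
    return { r.vbase + off, r.pbase + off };
}
```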
File: thingino-accel/src/venus/tensor.h
Currently it's just a forward declaration. Need to implement:
```cpp
class TensorXWrapper {
public:
    TensorX *tensor;
    void *data;
    size_t size;
    bool owns_memory;

    TensorXWrapper(TensorX *t);
    ~TensorXWrapper();
    void allocate_memory();
    void free_memory();
};
```

File: thingino-accel/src/venus/tensor.cpp
Implement constructor, destructor, and memory management methods.
Critical: Memory should be allocated using nna_malloc() to get DMA-capable memory.
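A possible shape for tensor.cpp. Two things below are illustrative assumptions, not the header's declared interface: the byte size is passed in explicitly (in reality it would come from TensorX metadata), and the allocator is injected so the real build can plug in nna_malloc()/nna_free() for DMA-capable memory:

```cpp
#include <cstddef>
#include <cstdlib>

struct TensorX;                   // opaque; defined by the Venus headers

class TensorXWrapper {
public:
    TensorX *tensor;
    void    *data        = nullptr;
    size_t   size        = 0;
    bool     owns_memory = false;

    using AllocFn = void *(*)(size_t);
    using FreeFn  = void (*)(void *);

    // Assumption: extra parameters for size and allocator; the header's
    // declared constructor takes only TensorX*.
    TensorXWrapper(TensorX *t, size_t bytes, AllocFn a, FreeFn f)
        : tensor(t), size(bytes), alloc_(a), free_(f) {}
    ~TensorXWrapper() { free_memory(); }

    void allocate_memory() {
        if (data) return;          // already allocated
        data = alloc_(size);
        owns_memory = (data != nullptr);
    }
    void free_memory() {
        if (owns_memory && data) free_(data);
        data = nullptr;
        owns_memory = false;
    }

private:
    AllocFn alloc_;
    FreeFn  free_;
};
```

In the real build the allocator arguments would be nna_malloc/nna_free; plain malloc/free only works for host-side testing, never for buffers the NNA DMA engine touches.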
File: thingino-accel/src/venus/magik_model.cpp (in constructor)
The MagikModelBase constructor receives parameters that likely contain:
- `param1`, `param2` - Model configuration (dimensions, layer count, etc.)
- `param3` (`void*&`) - Reference to a pointer, likely for returning allocated memory
- `param4` - Additional configuration
Approach:
- Study the .mgk ELF file structure to understand what metadata is available
- Parse ELF sections to extract input/output tensor shapes
- Create TensorXWrapper objects for inputs and outputs
Reference: ingenic-sdk-matteius/magik-nna/parse_mgk_model.py shows ELF structure
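Before any section parsing, it is worth validating the file header. A minimal check, assuming the .mgk container is a 32-bit ELF (the MIPS toolchain paths elsewhere in this document suggest ELFCLASS32, but verify against parse_mgk_model.py):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sanity check before parsing: a .mgk file is an ELF object, so it must
// start with the ELF magic, and for a 32-bit MIPS target EI_CLASS should
// be ELFCLASS32 (value 1). The 32-bit assumption is ours, not confirmed.
static bool looks_like_mgk(const uint8_t *buf, size_t len) {
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
    if (len < 16) return false;                    // too short for e_ident
    if (std::memcmp(buf, magic, 4) != 0) return false;
    return buf[4] == 1;                            // EI_CLASS == ELFCLASS32
}
```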
File: thingino-accel/src/venus/magik_model.cpp
The name accessors currently return empty strings. They should return comma-separated tensor names.
Approach: Parse tensor names from .mgk ELF symbols or metadata sections.
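Once the names are recovered from the ELF, the accessors reduce to a comma join. A trivial sketch:

```cpp
#include <string>
#include <vector>

// Join recovered tensor names into the comma-separated form the name
// accessors are expected to return.
static std::string join_names(const std::vector<std::string> &names) {
    std::string out;
    for (size_t i = 0; i < names.size(); ++i) {
        if (i) out += ",";
        out += names[i];
    }
    return out;
}
```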
File: thingino-accel/src/venus/magik_model.cpp
The tensor lookup currently returns nullptr. It should:
- Maintain a map of tensor name -> TensorXWrapper*
- Look up and return the wrapper for the requested name
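A sketch of that lookup table; `TensorRegistry` is a hypothetical helper name, and in practice the map would simply live as a member of MagikModelBase:

```cpp
#include <map>
#include <string>

struct TensorXWrapper;            // from tensor.h

// Name -> wrapper table: populate it in the constructor as wrappers are
// created; lookup returns nullptr on a miss, matching the current stub's
// return type.
class TensorRegistry {
public:
    void add(const std::string &name, TensorXWrapper *w) { table_[name] = w; }
    TensorXWrapper *find(const std::string &name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : it->second;
    }
private:
    std::map<std::string, TensorXWrapper *> table_;
};
```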
File: thingino-accel/src/venus/magik_model.cpp
The model needs workspace memory for intermediate computations.
```cpp
size_t MagikModelBase::get_forward_memory_size() const {
    // Parse from model metadata or use a default
    return forward_memory_size;
}
```

Allocate this memory in the constructor using nna_malloc().
File: thingino-accel/src/venus/magik_model.cpp
This is the main inference execution method. Steps:
- Validate input tensors are set
- Set up NNA DMA descriptors
- Program NNA hardware registers
- Trigger execution via IOCTL
- Wait for completion
- Return results
Reference:
- ingenic-sdk-matteius/magik-nna/nna_test.c for example usage
- ingenic-sdk-matteius/4.4/misc/soc-nna/soc_nna.h for IOCTL commands
File: thingino-accel/src/venus/magik_model.cpp or new file
The NNA uses DMA descriptors to transfer data. Need to:
- Create descriptor structures
- Set up read channel (input data)
- Set up write channel (output data)
- Use `IOCTL_SETUP_DES` to configure the kernel driver

IOCTL Command: 0xc0046303 (SETUP_DES)
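The raw value decodes cleanly under Linux's ioctl encoding, which is a useful cross-check against the kernel driver's definitions: read/write direction bits, a 4-byte size field, type 'c' (0x63), command number 3. A sketch; the `uint32_t` argument type is inferred from the 4-byte size field, not confirmed from soc_nna.h:

```cpp
#include <cstdint>
#include <sys/ioctl.h>

// 0xc0046303 == _IOWR('c', 3, <4-byte argument>):
//   read|write direction, size 0x004, type 0x63 ('c'), nr 0x03.
// uint32_t is an assumption based on the size field alone.
#define IOCTL_SETUP_DES _IOWR('c', 3, uint32_t)    // 0xc0046303
```

Defining the command via `_IOWR` rather than the bare constant keeps the direction/size metadata visible and lets the compiler catch a mismatched argument type later.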
File: thingino-accel/src/venus/magik_model.cpp
Use IOCTL commands to start NNA:
- `IOCTL_RDCH_START` (0xc0046304) - Start read channel
- `IOCTL_WRCH_START` (0xc0046305) - Start write channel
Wait for completion (polling or interrupt-based).
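These two commands follow the same `_IOWR('c', nr, <4-byte>)` pattern as SETUP_DES, with nr = 4 and 5. For the completion wait, a bounded polling helper avoids hanging forever if the hardware never signals done; the status predicate below is a placeholder for a real register or driver read:

```cpp
#include <cstdint>
#include <functional>
#include <sys/ioctl.h>
#include <unistd.h>

// Channel-start commands; the uint32_t argument type is inferred from the
// 4-byte size field in the raw values, not confirmed from soc_nna.h.
#define IOCTL_RDCH_START _IOWR('c', 4, uint32_t)   // 0xc0046304
#define IOCTL_WRCH_START _IOWR('c', 5, uint32_t)   // 0xc0046305

// Poll `done` until it reports true or the attempt budget runs out.
// In the real driver path, `done` would read an NNA status register or a
// completion flag exposed by the kernel module.
static bool wait_for_completion(const std::function<bool()> &done,
                                int max_tries, useconds_t delay_us) {
    for (int i = 0; i < max_tries; ++i) {
        if (done()) return true;
        usleep(delay_us);
    }
    return false;                      // timed out
}
```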
File: thingino-accel/examples/test_inference.c
Create a test that:
- Loads the AEC model
- Sets dummy input data (zeros or random)
- Runs inference
- Prints output tensor values
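For the dummy input, deterministic data beats random: runs become reproducible, so outputs can be diffed against the OEM library run-for-run. A sketch assuming 16-bit samples (plausible for a 16 kHz AEC model, but check the actual input tensor type):

```cpp
#include <cstddef>
#include <cstdint>

// Fill an input buffer with a deterministic ramp pattern. Small values
// avoid saturating fixed-point layers; the int16_t element type is an
// assumption about the model's input format.
static void fill_dummy_input(int16_t *buf, size_t count) {
    for (size_t i = 0; i < count; ++i)
        buf[i] = (int16_t)(i % 128);
}
```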
Add verbose logging to track:
- Memory allocations
- DMA descriptor setup
- IOCTL calls and return values
- NNA register states
If possible, compare results with the original OEM library to verify correctness.
```sh
# Build
cd thingino-accel && make clean && make

# Upload to camera
sshpass -p "Ami23plop" scp -O build/lib/libvenus.so build/lib/libnna.so \
    build/bin/test_model_load root@192.168.50.117:/tmp/

# Test
sshpass -p "Ami23plop" ssh root@192.168.50.117 \
    "LD_PRELOAD=/tmp/libvenus.so LD_LIBRARY_PATH=/tmp /tmp/test_model_load /tmp/AEC_T41_16K_NS_OUT_UC.mgk"
```

- Memory Safety: All DMA memory must be allocated via `nna_malloc()` to ensure physical contiguity
- Cache Coherency: Use `IOCTL_FLUSHCACHE` before/after DMA operations
- Error Handling: Check all IOCTL return values and handle errors gracefully
- Hardware State: Ensure the NNA is properly initialized before use
After completing these tasks, the test should:
- ✅ Load the model successfully (already working)
- ✅ Initialize all hardware mappings
- ✅ Allocate tensors and forward memory
- ✅ Execute inference without crashing
- ✅ Return valid output tensor data
- Kernel driver: ingenic-sdk-matteius/4.4/misc/soc-nna/
- Test examples: ingenic-sdk-matteius/magik-nna/nna_test.c
- Venus headers: magik-toolkit/InferenceKit/nna2/mips720-glibc229/T41/include/
- Current status: thingino-accel/VENUS_IMPLEMENTATION_STATUS.md