Edge Optimization Engineer (Junior/Senior)
- Hanoi / Ho Chi Minh City
- Full-time
Overview
You’ll own end-to-end performance on the robot edge: building deep-dive profilers, pinpointing kernel/op/memory bottlenecks across the full data path (model -> dataloader/IPC -> I/O -> control loops), and shipping deterministic builds that stay rock-solid under real duty cycles (thermal, brownouts, EMI). Your scope spans GPU/NPU edge targets (e.g., NVIDIA Jetson Orin, TI EVMs) with clear latency budgets per subsystem (perception/VLA, policy/RL, control).
Key Responsibilities
Own end-to-end profiling. Run Nsight/PyTorch Profiler/tegrastats traces across the full perception -> policy -> control path; attribute stalls to specific ops/kernels, memory moves, cache misses, or IPC.
Set and enforce budgets. Define per-module latency/throughput/energy targets (e.g., camera -> VLA ≤X ms, policy step ≤Y ms, control loop 1 kHz) and guard them with automated checks.
Kill bottlenecks fast. Apply op fusion, graph capture, async pipelines, memory tiling, pinned buffers, and stream concurrency; rewrite hot paths in CUDA/Triton when needed.
Tune per device. Generate deterministic builds for edge devices; measure kernel occupancy, tensor-core usage, DMA overlap; lock clocks when appropriate.
Package optimized artifacts. Emit TensorRT engines/ONNX bundles, quantized weights, calibration sets, and config manifests; version everything for rollback.
On-call for performance. Triage field logs, root-cause perf regressions, and ship hotfix engines without violating safety or RT guarantees.
Continuously raise the bar. Track new compiler/runtime features (TensorRT/TVM/Inductor), add them behind flags, and land wins once validated against budgets.
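To give a feel for the "guard them with automated checks" part of the role, here is a minimal sketch of a latency-budget gate that could run in CI. Everything in it is an illustrative assumption rather than our actual tooling: the `Budget` class, the `check_budgets` helper, the module names, and the millisecond limits are all hypothetical, and the samples would in practice come from on-device traces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hypothetical per-module latency budget (99th percentile, in ms)."""
    name: str
    p99_ms: float


def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    # Integer ceil(0.99 * n) avoids floating-point rounding surprises.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check_budgets(budgets, measurements):
    """Return one human-readable violation message per failing module."""
    violations = []
    for budget in budgets:
        samples = measurements.get(budget.name)
        if not samples:
            # A module with no data is treated as a failure, not a pass.
            violations.append(f"{budget.name}: no samples recorded")
            continue
        observed = p99(samples)
        if observed > budget.p99_ms:
            violations.append(
                f"{budget.name}: p99 {observed:.2f} ms exceeds "
                f"budget {budget.p99_ms:.2f} ms"
            )
    return violations


if __name__ == "__main__":
    # Placeholder limits and samples, for illustration only.
    budgets = [Budget("policy_step", 5.0), Budget("control_loop", 1.0)]
    measurements = {
        "policy_step": [4.0] * 98 + [6.0, 6.0],
        "control_loop": [0.8] * 100,
    }
    for line in check_budgets(budgets, measurements):
        print(line)
```

A gate like this would typically fail the build whenever the returned list is non-empty, so a regression is caught before an engine ships to a device.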
Required Qualifications
Profiling & diagnosis. Expert with Nsight, PyTorch Profiler, tegrastats; you attribute latency/throughput regressions to exact kernels/ops and memory traffic, then present an ROI-clear fix plan.
Real-time mindset. You set/meet per-module latency budgets and protect hard real-time control paths during optimization and deployment.
Deployment engineering. You ship containerized, deterministic builds that emit TRT engines and quantized weights, with repeatable, one-command device bring-up.
Hardware breadth. Hands-on with NVIDIA Jetson Orin, TI EVMs, and similar edge targets; comfortable validating under thermal/brownout/EMI stress so behavior stays stable.
Preferred Qualifications
Compiler/runtime chops (TensorRT, ONNX Runtime, TVM, Triton, ExecuTorch) and experience with op fusion and graph capture to reduce stalls and memory traffic. This aligns with packaging TRT engines and maintaining deterministic pipelines.
VLA/RL awareness to set subsystem budgets and validate perception-to-action latency end-to-end.
