Junior/Senior Computer Vision & Visual SLAM Engineer
- Hanoi / Ho Chi Minh City
- Full-time
Role Overview
VinRobotics is seeking Computer Vision Engineers to join our Humanoid Robotics Perception Team. You will design and deploy real-time perception systems enabling humanoid robots to see, localize, understand, and navigate complex real-world environments. This role focuses on 4 core perception pillars:
6-DoF Object Pose Estimation for manipulation
Stereo Depth Estimation for geometry understanding
Semantic Segmentation & Mapping for navigation and scene understanding
Visual SLAM
Your work will directly power grasping, manipulation, obstacle avoidance, semantic mapping, visual SLAM, and autonomous navigation on next-generation robotic platforms.
Key Responsibilities
6-DoF Object Pose Estimation (Manipulation Perception)
Design and implement 6-DoF object pose estimation pipelines using RGB, RGB-D, or stereo inputs (see the sketch after this list)
Handle occlusion, symmetry, cluttered scenes, and domain shift
Integrate pose outputs with grasp planning and manipulation stacks
Optimize inference pipelines for real-time robotic execution
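As a rough illustration of the classical geometric core behind such a pipeline, the sketch below recovers a 6-DoF object pose from 2D-3D correspondences with OpenCV's RANSAC PnP solver. The model points, image detections, and intrinsics are made-up placeholder values (in a real pipeline the correspondences would come from a learned keypoint or dense-matching model), so treat it as a sketch, not a description of the team's actual stack.

import numpy as np
import cv2

# Hypothetical inputs: 3D model points of the object (object frame, metres)
# and their matched 2D detections in the image (pixels).
object_points = np.array([[0.00, 0.00, 0.00],
                          [0.05, 0.00, 0.00],
                          [0.05, 0.05, 0.00],
                          [0.00, 0.05, 0.00],
                          [0.05, 0.00, 0.03],
                          [0.05, 0.05, 0.03]], dtype=np.float64)
image_points = np.array([[320.0, 240.0],
                         [381.5, 240.0],
                         [381.5, 301.5],
                         [320.0, 301.5],
                         [378.0, 240.0],
                         [378.0, 298.0]], dtype=np.float64)

# Assumed pinhole intrinsics (fx, fy, cx, cy); zero distortion for brevity.
K = np.array([[615.0,   0.0, 320.0],
              [  0.0, 615.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)

# RANSAC-based PnP: 6-DoF pose of the object in the camera frame, with some
# robustness to outlier correspondences caused by clutter or occlusion.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, dist, reprojectionError=3.0)
R, _ = cv2.Rodrigues(rvec)               # 3x3 rotation matrix
T = np.eye(4)
T[:3, :3], T[:3, 3] = R, tvec.ravel()    # homogeneous camera_T_object
print("pose found:", ok)
print(T)

The resulting camera_T_object transform, once converted into the robot's base frame, is the kind of output that would be handed to the grasp planning and manipulation stack.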
Stereo Depth Estimation
Develop and optimize stereo depth estimation pipelines (see the sketch after this list)
Handle challenging conditions:
Low texture
Reflective / transparent surfaces
Outdoor / indoor lighting variation
Evaluate depth accuracy, completeness, and latency under real robotic constraints
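As a point of reference for the classical end of this work, the sketch below computes disparity with OpenCV's semi-global matcher and converts it to metric depth. The image paths, focal length, and baseline are placeholder values; a learned stereo network would typically replace the matching step, while the depth-from-disparity conversion stays the same.

import numpy as np
import cv2

# Assumed inputs: a rectified grayscale stereo pair plus calibrated focal
# length (pixels) and baseline (metres). All values are placeholders.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
fx, baseline = 615.0, 0.12

# Semi-global block matching: a classical baseline against which learned
# stereo models are usually compared.
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,        # must be a multiple of 16
    blockSize=5,
    P1=8 * 1 * 5 * 5,
    P2=32 * 1 * 5 * 5,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)

# compute() returns fixed-point disparity scaled by 16.
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0

# Depth from disparity: Z = fx * B / d. Zero or negative disparities are
# masked out; low-texture and reflective regions typically end up here.
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = fx * baseline / disparity[valid]
print("valid depth ratio:", valid.mean())

Completeness (the valid-pixel ratio above), accuracy against ground truth, and per-frame latency are the kinds of metrics the evaluation work refers to.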
Semantic Segmentation & Navigation Perception
Build semantic segmentation models (see the inference sketch after this list) for:
Traversability
Obstacle classification
Scene understanding (floor, walls, objects, humans, dynamic agents)
Contribute to semantic 3D mapping and semantic SLAM pipelines
Support downstream modules such as:
Local planning
Obstacle avoidance
Global navigation and relocalization
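For orientation, the sketch below runs an off-the-shelf torchvision DeepLabV3 network on a single frame and turns the logits into a per-pixel class map plus a toy dynamic-agent mask. It assumes a recent torchvision (weights API) and a placeholder frame path; the pretrained VOC label set has no floor/wall/traversability classes, so a deployed model would be trained on task-specific labels.

import torch
from torchvision.io import read_image, ImageReadMode
from torchvision.models.segmentation import (
    deeplabv3_resnet50, DeepLabV3_ResNet50_Weights,
)

# Off-the-shelf model, used purely for illustration.
weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

img = read_image("frame.png", ImageReadMode.RGB)   # placeholder RGB frame, uint8 CxHxW
batch = preprocess(img).unsqueeze(0)               # resize + normalise as the weights expect

with torch.no_grad():
    logits = model(batch)["out"]                   # [1, num_classes, H, W]
labels = logits.argmax(dim=1)[0]                   # per-pixel class ids

# Example downstream use: a coarse dynamic-agent mask (VOC class 15 = person)
# that a local planner could treat as a hard obstacle.
person_mask = labels == 15
print(labels.shape, float(person_mask.float().mean()))

Projecting per-pixel labels like these onto depth or a point cloud is the usual entry point into semantic 3D mapping and semantic SLAM.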
Sensor Fusion & System Integration
Develop multi-camera perception systems (RGB, stereo, RGB-D)
Integrate perception modules with ROS 2 Humble and the real robot stack
Collaborate with SLAM, control, and motion planning teams (MoveIt, Nav2)
Ensure robust synchronization, calibration, and frame alignment (see the synchronization sketch below)
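A minimal rclpy sketch of the synchronization aspect, assuming ROS 2 Humble with the message_filters package: the node below aligns RGB, depth, and camera-info topics (placeholder names in the style of a RealSense driver) with an approximate-time policy before any perception module consumes them.

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import CameraInfo, Image
import message_filters


class SyncedRGBDNode(Node):
    """Minimal sketch: time-align RGB and depth streams before perception runs.
    Topic names are placeholders."""

    def __init__(self):
        super().__init__("synced_rgbd_node")
        rgb_sub = message_filters.Subscriber(self, Image, "/camera/color/image_raw")
        depth_sub = message_filters.Subscriber(self, Image, "/camera/aligned_depth_to_color/image_raw")
        info_sub = message_filters.Subscriber(self, CameraInfo, "/camera/color/camera_info")

        # Approximate-time policy tolerates small stamp offsets between sensors;
        # 'slop' is the maximum allowed stamp difference in seconds.
        self.sync = message_filters.ApproximateTimeSynchronizer(
            [rgb_sub, depth_sub, info_sub], queue_size=10, slop=0.05)
        self.sync.registerCallback(self.on_frames)

    def on_frames(self, rgb_msg, depth_msg, info_msg):
        # A real node would convert the images (cv_bridge), run perception,
        # and publish results in a consistent TF frame.
        self.get_logger().info(f"synced frames at stamp {rgb_msg.header.stamp.sec}")


def main():
    rclpy.init()
    rclpy.spin(SyncedRGBDNode())
    rclpy.shutdown()


if __name__ == "__main__":
    main()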
Visual SLAM
Design, implement, and optimize Visual / Visual-Inertial SLAM pipelines for real-time robot localization and mapping (see the odometry sketch after this list).
Integrate loop closure and place recognition to ensure long-term localization consistency.
Fuse multi-sensor data (RGB, stereo, RGB-D, IMU) for improved accuracy and robustness.
Optimize SLAM systems for low latency, high reliability, and real-time deployment on robotic platforms.
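As a toy illustration of an indirect visual-odometry frontend step (not a full SLAM system: there is no mapping, loop closure, or bundle adjustment here), the sketch below matches ORB features between two consecutive frames and recovers the relative camera motion from an essential matrix. The frame paths and intrinsics are placeholders, and monocular translation is only recovered up to scale, which is exactly what stereo/RGB-D or IMU fusion resolves in practice.

import numpy as np
import cv2

# Two consecutive grayscale frames and assumed pinhole intrinsics (placeholders).
prev_img = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr_img = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[615.0,   0.0, 320.0],
              [  0.0, 615.0, 240.0],
              [  0.0,   0.0,   1.0]])

# ORB features + brute-force Hamming matching, as in many indirect SLAM frontends.
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(prev_img, None)
kp2, des2 = orb.detectAndCompute(curr_img, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Essential matrix with RANSAC, then the relative rotation and (unit-scale)
# translation between the two views.
E, inlier_mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inlier_mask)
print("relative rotation:\n", R)
print("unit-scale translation:", t.ravel())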
Technical Requirements
Core Skills (Required)
Strong background in Computer Vision, Robotics, or Deep Learning
Solid understanding of multi-view geometry, epipolar geometry, and camera models
Hands-on experience with deep learning frameworks:
PyTorch / TensorFlow
ONNX / TensorRT
CUDA (deployment & optimization)
Experience with 3D data processing:
Open3D, PCL
NumPy, PyTorch3D
Proficiency in Python and/or C++ on Linux
Robotics & System Experience
ROS 2
Experience with camera drivers & sensors:
Intel RealSense
ZED (stereo & RGB-D)
Familiarity with MoveIt / Nav2 / robotic execution pipelines
Preferred Qualifications
Bachelor’s or Master’s degree in Computer Vision, Robotics, AI, or related fields
Experience in one or more of the following:
Multi-view perception
Visual SLAM / Visual-Inertial systems
Robot grasp learning
Semantic mapping or navigation perception
GPU optimization
Distributed training
Synthetic-to-real domain adaptation
What We Offer
Work on cutting-edge humanoid and autonomous robotics systems
Real-world deployment on state-of-the-art robotic hardware
Collaborative environment with AI researchers, roboticists, and system engineers
Access to GPU clusters, simulation environments, and large-scale datasets
Competitive compensation, benefits, and career growth opportunities