Junior/Senior Computer Vision & Visual SLAM Engineer
- Hanoi / Ho Chi Minh City
- Full-time
Role Overview
VinRobotics is seeking Computer Vision Engineers to join our Humanoid Robotics Perception Team. You will design and deploy real-time perception systems enabling humanoid robots to see, localize, understand, and navigate complex real-world environments. This role focuses on 4 core perception pillars:
6-DoF Object Pose Estimation for manipulation
Stereo Depth Estimation for geometry understanding
Semantic Segmentation & Mapping for navigation and scene understanding
Visual SLAM
Your work will directly power grasping, manipulation, obstacle avoidance, semantic mapping, visual SLAM, and autonomous navigation on next-generation robotic platforms.
Key Responsibilities
6-DoF Object Pose Estimation (Manipulation Perception)
Design and implement 6-DoF object pose estimation pipelines using RGB, RGB-D, or stereo inputs (see the sketch after this list)
Handle occlusion, symmetry, cluttered scenes, and domain shift
Integrate pose outputs with grasp planning and manipulation stacks
Optimize inference pipelines for real-time robotic execution
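As a rough illustration of the classical geometric core behind such a pipeline, the sketch below recovers a 6-DoF object pose from 2D-3D correspondences with OpenCV's RANSAC PnP solver. The model points, image detections, and intrinsics are made-up placeholder values (in a real pipeline the correspondences would come from a learned keypoint or dense-matching model), so treat it as a sketch, not a description of the team's actual stack.

import numpy as np
import cv2

# Hypothetical inputs: 3D model points of the object (object frame, metres)
# and their matched 2D detections in the image (pixels).
object_points = np.array([[0.00, 0.00, 0.00],
                          [0.05, 0.00, 0.00],
                          [0.05, 0.05, 0.00],
                          [0.00, 0.05, 0.00],
                          [0.05, 0.00, 0.03],
                          [0.05, 0.05, 0.03]], dtype=np.float64)
image_points = np.array([[320.0, 240.0],
                         [381.5, 240.0],
                         [381.5, 301.5],
                         [320.0, 301.5],
                         [378.0, 240.0],
                         [378.0, 298.0]], dtype=np.float64)

# Assumed pinhole intrinsics (fx, fy, cx, cy); zero distortion for brevity.
K = np.array([[615.0,   0.0, 320.0],
              [  0.0, 615.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)

# RANSAC-based PnP: 6-DoF pose of the object in the camera frame, with some
# robustness to outlier correspondences caused by clutter or occlusion.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, dist, reprojectionError=3.0)
R, _ = cv2.Rodrigues(rvec)               # 3x3 rotation matrix
T = np.eye(4)
T[:3, :3], T[:3, 3] = R, tvec.ravel()    # homogeneous camera_T_object
print("pose found:", ok)
print(T)

The resulting camera_T_object transform, once converted into the robot's base frame, is the kind of output that would be handed to the grasp planning and manipulation stack.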
Stereo Depth Estimation
Develop and optimize stereo depth estimation pipelines (see the sketch after this list)
Handle challenging conditions:
Low texture
Reflective / transparent surfaces
Outdoor / indoor lighting variation
Evaluate depth accuracy, completeness, and latency under real robotic constraints
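As a point of reference for the classical end of this work, the sketch below computes disparity with OpenCV's semi-global matcher and converts it to metric depth. The image paths, focal length, and baseline are placeholder values; a learned stereo network would typically replace the matching step, while the depth-from-disparity conversion stays the same.

import numpy as np
import cv2

# Assumed inputs: a rectified grayscale stereo pair plus calibrated focal
# length (pixels) and baseline (metres). All values are placeholders.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
fx, baseline = 615.0, 0.12

# Semi-global block matching: a classical baseline against which learned
# stereo models are usually compared.
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,        # must be a multiple of 16
    blockSize=5,
    P1=8 * 1 * 5 * 5,
    P2=32 * 1 * 5 * 5,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)

# compute() returns fixed-point disparity scaled by 16.
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0

# Depth from disparity: Z = fx * B / d. Zero or negative disparities are
# masked out; low-texture and reflective regions typically end up here.
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = fx * baseline / disparity[valid]
print("valid depth ratio:", valid.mean())

Completeness (the valid-pixel ratio above), accuracy against ground truth, and per-frame latency are the kinds of metrics the evaluation work refers to.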
Semantic Segmentation & Navigation Perception
Build semantic segmentation models (see the inference sketch after this list) for:
Traversability
Obstacle classification
Scene understanding (floor, walls, objects, humans, dynamic agents)
Contribute to semantic 3D mapping and semantic SLAM pipelines
Support downstream modules such as:
Local planning
Obstacle avoidance
Global navigation and relocalization
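For orientation, the sketch below runs an off-the-shelf torchvision DeepLabV3 network on a single frame and turns the logits into a per-pixel class map plus a toy dynamic-agent mask. It assumes a recent torchvision (weights API) and a placeholder frame path; the pretrained VOC label set has no floor/wall/traversability classes, so a deployed model would be trained on task-specific labels.

import torch
from torchvision.io import read_image, ImageReadMode
from torchvision.models.segmentation import (
    deeplabv3_resnet50, DeepLabV3_ResNet50_Weights,
)

# Off-the-shelf model, used purely for illustration.
weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

img = read_image("frame.png", ImageReadMode.RGB)   # placeholder RGB frame, uint8 CxHxW
batch = preprocess(img).unsqueeze(0)               # resize + normalise as the weights expect

with torch.no_grad():
    logits = model(batch)["out"]                   # [1, num_classes, H, W]
labels = logits.argmax(dim=1)[0]                   # per-pixel class ids

# Example downstream use: a coarse dynamic-agent mask (VOC class 15 = person)
# that a local planner could treat as a hard obstacle.
person_mask = labels == 15
print(labels.shape, float(person_mask.float().mean()))

Projecting per-pixel labels like these onto depth or a point cloud is the usual entry point into semantic 3D mapping and semantic SLAM.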
Sensor Fusion & System Integration
Develop multi-camera perception systems (RGB, stereo, RGB-D)
Integrate perception modules with ROS 2 Humble and the real robot stack
Collaborate with SLAM, control, and motion planning teams (MoveIt, Nav2)
Ensure robust synchronization, calibration, and frame alignment (see the synchronization sketch below)
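A minimal rclpy sketch of the synchronization aspect, assuming ROS 2 Humble with the message_filters package: the node below aligns RGB, depth, and camera-info topics (placeholder names in the style of a RealSense driver) with an approximate-time policy before any perception module consumes them.

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import CameraInfo, Image
import message_filters


class SyncedRGBDNode(Node):
    """Minimal sketch: time-align RGB and depth streams before perception runs.
    Topic names are placeholders."""

    def __init__(self):
        super().__init__("synced_rgbd_node")
        rgb_sub = message_filters.Subscriber(self, Image, "/camera/color/image_raw")
        depth_sub = message_filters.Subscriber(self, Image, "/camera/aligned_depth_to_color/image_raw")
        info_sub = message_filters.Subscriber(self, CameraInfo, "/camera/color/camera_info")

        # Approximate-time policy tolerates small stamp offsets between sensors;
        # 'slop' is the maximum allowed stamp difference in seconds.
        self.sync = message_filters.ApproximateTimeSynchronizer(
            [rgb_sub, depth_sub, info_sub], queue_size=10, slop=0.05)
        self.sync.registerCallback(self.on_frames)

    def on_frames(self, rgb_msg, depth_msg, info_msg):
        # A real node would convert the images (cv_bridge), run perception,
        # and publish results in a consistent TF frame.
        self.get_logger().info(f"synced frames at stamp {rgb_msg.header.stamp.sec}")


def main():
    rclpy.init()
    rclpy.spin(SyncedRGBDNode())
    rclpy.shutdown()


if __name__ == "__main__":
    main()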
Visual SLAM
Design, implement, and optimize Visual / Visual-Inertial SLAM pipelines for real-time robot localization and mapping (see the odometry sketch after this list).
Integrate loop closure and place recognition to ensure long-term localization consistency.
Fuse multi-sensor data (RGB, stereo, RGB-D, IMU) for improved accuracy and robustness.
Optimize SLAM systems for low latency, high reliability, and real-time deployment on robotic platforms.
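As a toy illustration of an indirect visual-odometry frontend step (not a full SLAM system: there is no mapping, loop closure, or bundle adjustment here), the sketch below matches ORB features between two consecutive frames and recovers the relative camera motion from an essential matrix. The frame paths and intrinsics are placeholders, and monocular translation is only recovered up to scale, which is exactly what stereo/RGB-D or IMU fusion resolves in practice.

import numpy as np
import cv2

# Two consecutive grayscale frames and assumed pinhole intrinsics (placeholders).
prev_img = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr_img = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[615.0,   0.0, 320.0],
              [  0.0, 615.0, 240.0],
              [  0.0,   0.0,   1.0]])

# ORB features + brute-force Hamming matching, as in many indirect SLAM frontends.
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(prev_img, None)
kp2, des2 = orb.detectAndCompute(curr_img, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Essential matrix with RANSAC, then the relative rotation and (unit-scale)
# translation between the two views.
E, inlier_mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inlier_mask)
print("relative rotation:\n", R)
print("unit-scale translation:", t.ravel())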
Technical Requirements
Core Skills (Required)
Strong background in Computer Vision, Robotics, or Deep Learning
Solid understanding of multi-view geometry, epipolar geometry, and camera models
Hands-on experience with deep learning frameworks:
PyTorch / TensorFlow
ONNX / TensorRT
CUDA (deployment & optimization)
Experience with 3D data processing:
Open3D, PCL
NumPy, PyTorch3D
Proficiency in Python and/or C++ on Linux
Robotics & System Experience
ROS 2
Experience with camera drivers & sensors:
Intel RealSense
ZED (stereo & RGB-D)
Familiarity with MoveIt / Nav2 / robotic execution pipelines
Preferred Qualifications
Bachelor’s or Master’s degree in Computer Vision, Robotics, AI, or related fields
Experience in one or more of the following:
Multi-view perception
Visual SLAM / Visual-Inertial systems
Robot grasp learning
Semantic mapping or navigation perception
GPU optimization
Distributed training
Synthetic-to-real domain adaptation
What We Offer
Work on cutting-edge humanoid and autonomous robotics systems
Real-world deployment on state-of-the-art robotic hardware
Collaborative environment with AI researchers, roboticists, and system engineers
Access to GPU clusters, simulation environments, and large-scale datasets
Competitive compensation, benefits, and career growth opportunities