Missions at sea call for automation because of limited human accessibility and labour constraints. Accordingly, unmanned surface vehicles (USVs) are in growing demand for surveillance, environmental investigation, and similar tasks. A fully automated USV requires a reliable detection system as a prerequisite for safe collision avoidance. To this end, USVs carry a range of equipment, but these units are expensive and demand extra loading capacity. It is therefore necessary to simplify the equipment while still acquiring, without loss, the data essential for safe collision avoidance.

This simplification can potentially be achieved with a vision sensor. The vision sensors used on conventional USVs only track marine objects. Safe collision avoidance also requires the type of the detected object and the distance to it, and this additional information currently depends on direct human observation or on support from other equipment. If the vision sensor itself can estimate the distance and the object type, the equipment on a USV can be simplified.

The purpose of this research is to develop a vision-based object detection algorithm for USVs that recognises a marine object and estimates its position and distance. Faster R-CNN, a state-of-the-art image processing technique that imitates human visual perception, is used to recognise and localise objects in frames captured by a vision sensor. To obtain the distance to the recognised object, a stereo-vision-based depth estimation technique is used, and a stereo camera was therefore employed in this research. By combining these two techniques, a real-time marine object detection algorithm was implemented, and its performance was verified by a model ship detection test in a towing tank. The test results showed that the algorithm is potentially applicable to a real USV.
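
As an illustration of how a detector output and a stereo depth estimate can be combined, the following minimal sketch converts the disparity inside a detected bounding box into a distance using the standard rectified-stereo relation Z = f·B/d. It is not the implementation used in this research: the function name `box_distance`, the use of the median disparity, and the focal length, baseline, and disparity values are all assumptions chosen for illustration.

```python
import numpy as np


def box_distance(disparity, box, focal_px, baseline_m):
    """Estimate the distance (metres) to an object from a disparity map.

    Hypothetical helper, not the authors' code. Assumes a rectified stereo
    pair, disparity in pixels, and a box given as (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = box
    patch = disparity[y1:y2, x1:x2]
    # Non-positive disparities are treated as invalid matches and ignored.
    valid = patch[patch > 0]
    if valid.size == 0:
        return None
    # The median is used here as a simple guard against background pixels.
    d = float(np.median(valid))
    # Rectified stereo geometry: Z = f * B / d.
    return focal_px * baseline_m / d


# Synthetic example: a 40 x 40 pixel object with a disparity of 20 px.
disparity = np.zeros((120, 160), dtype=np.float32)
disparity[40:80, 60:100] = 20.0

# Assumed camera parameters: 800 px focal length, 0.1 m baseline.
distance_m = box_distance(disparity, (60, 40, 100, 80),
                          focal_px=800.0, baseline_m=0.1)
print(distance_m)
```

With these assumed values the estimate is approximately 4 m. In practice the box would come from the detector and the disparity map from a stereo matching step, both of which are outside the scope of this sketch.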