This master’s thesis explores weakly supervised point cloud semantic segmentation by integrating information from 2D images and 3D point clouds. The approach leverages bidirectional feature interaction between the two modalities and overcomes the challenge of sparse annotations by propagating supervisory signals through oversegmentation. To mitigate the resulting label noise, a noise-robust framework is introduced that combines robust loss functions with novel loss adjustment strategies. Validated on benchmark datasets such as ScanNetV2 and 2D-3D-S, the proposed approach achieves superior quantitative and qualitative performance, demonstrating its efficacy for weakly supervised point cloud semantic segmentation.