Object detection is an important computer vision task, which is developed from image classification task. The difference is that it is no longer only to classify a single type of object in an image, but to complete the classification and positioning of multiple objects that may exist in an image at the same time. Classification refers to assigning category labels to the object, and positioning refers to determining the vertex coordinates of the peripheral rectangular box of the object. Therefore, object detection is more challenging and has broader application prospects, such as automatic driving, face recognition, pedestrian detection, medical detection, etc. Object detection can also be used as the research basis for more complex computer vision task such as image segmentation, image description, object tracking and action recognition. In traditional object detection, the feature utilization rate is low and it is easy to be affected by other environmental factors. Hence, this paper proposes a multimodal deep learning-based feature fusion for object detection in remote sensing images. In the new model, cascade RCNN is the backbone network. Parallel cascade RCNN network is utilized for feature fusion to enhance feature expression ability. In order to solve the problem of different segmentation shapes and sizes, the central part of the network adopts multi-coefficient cascaded hollow convolution to obtain multi-receptive field features without using pooling mode and preserving image information. Meanwhile, an improved selfattention combined receptive field strategy is used to obtain both low-level features with marginal details and high-level features with global semantics. Finally, we conduct experiments on DOTA set including ablation experiments and comparison experiments. The experimental results show that the mean Average Precision (mAP) and other indexes have been greatly improved, and its performance is better than the state-of-the-art detection algorithms. It has a good application prospect in the remote sensing image object detection task.