Most research efforts on image classification have so far focused on medium-scale datasets, and further challenges remain, such as the difficulty of feature extraction and small sample sizes. To address these difficulties, this paper proposes a multi-perspective convolutional neural network model containing a channel attention module and a spatial attention module. The proposed modules derive attention maps along the channel and spatial dimensions respectively, and the input features are then learned selectively according to their importance. We also explain how the savings in storage can be traded against a loss in accuracy and/or an increase in CPU cost. In addition, we analyze the interpretability of the model at multiple scales. Quantitative and qualitative experimental results demonstrate that our proposed model improves accuracy by up to 3.8% and outperforms state-of-the-art methods.
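The abstract does not give the exact form of the two attention modules; a minimal NumPy sketch, assuming a CBAM-style design (pooling followed by a sigmoid gate, with the learned MLP and convolution layers omitted for brevity), illustrates how channel and spatial attention maps re-weight an input feature tensor:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # x: feature tensor of shape (C, H, W).
    # Pool over the spatial dims, then gate each channel.
    # (A learned MLP between pooling and the sigmoid is omitted here.)
    avg_pool = x.mean(axis=(1, 2))          # (C,)
    max_pool = x.max(axis=(1, 2))           # (C,)
    weights = sigmoid(avg_pool + max_pool)  # per-channel importance in (0, 1)
    return x * weights[:, None, None]

def spatial_attention(x):
    # Pool over the channel dim, then gate each spatial location.
    # (A learned convolution before the sigmoid is omitted here.)
    avg_pool = x.mean(axis=0)               # (H, W)
    max_pool = x.max(axis=0)                # (H, W)
    weights = sigmoid(avg_pool + max_pool)  # per-location importance in (0, 1)
    return x * weights[None, :, :]

# Apply the two modules sequentially, as in a CBAM-style block.
features = np.random.rand(8, 4, 4)          # 8 channels, 4x4 spatial grid
refined = spatial_attention(channel_attention(features))
print(refined.shape)  # same shape as the input: (8, 4, 4)
```

Because both gates lie in (0, 1), the refined features are a selectively attenuated copy of the input: salient channels and locations are preserved while less important ones are suppressed.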