Effective emotion recognition based on electroencephalography (EEG) is crucial for the development of brain-computer interfaces (BCIs). Neuroscientific studies highlight the importance of analyzing localized brain activity for understanding emotional states. However, existing deep learning methods often fail to adequately extract the spatio-temporal features of EEG signals. Accordingly, we propose a novel spatio-temporal graph neural network, MSL-TGNN, that integrates local and global brain information. A multi-scale temporal learner is designed to extract temporal dependencies in EEG signals, while a brain region learning block and an extended global graph attention network are introduced to explore spatial features. Specifically, the brain region learning block aggregates local channel information, whereas the extended global graph attention network captures nonlinear dependencies among regions to extract global brain information. We conducted subject-dependent and subject-independent experiments on the DEAP dataset, and the results indicate that the proposed model outperforms the compared state-of-the-art methods.