Video preserves valuable raw information. Understanding these data, recognizing the objects they contain, and tagging those objects are crucial to intelligent planning and decision making. Deep learning offers an effective way to understand big data at a human level. Because traffic video is characterized by crowded scenes and low resolution, processing the whole image at once is ineffective. An alternative is to segment the image and determine a small window for each moving object. This paper proposes a Q-learning based moving object recognition approach that first locates the moving object region and then applies a Q-learning based optimization method to determine the most compact region containing the moving object. The algorithm detects the most compact rectangle around each moving object at near real-time speed. A deep neural network is then used to semantically tag the recognized objects. Experimental results show that the algorithms work effectively.
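The idea of using Q-learning to tighten a window around a moving object can be illustrated with a toy tabular sketch. Everything in it is an assumption for illustration, not the method of this paper. The state is a box `(top, bottom, left, right)`. The four hypothetical actions each move one side inward by one pixel. The assumed reward is +1 for a shrink that keeps every foreground pixel of a binary motion mask and -1, which ends the episode, for a shrink that cuts into the object. The helper names `count_inside`, `shrink`, `step`, `train`, and `greedy_box`, and the parameters `alpha`, `gamma`, and `episodes`, are likewise hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical action set (an assumption, not the paper's): shrink the box by one pixel on one side.
ACTIONS = ("top", "bottom", "left", "right")


def count_inside(mask, box):
    """Count foreground pixels (value 1) inside box = (top, bottom, left, right), bounds inclusive."""
    top, bottom, left, right = box
    return sum(mask[r][c] for r in range(top, bottom + 1) for c in range(left, right + 1))


def shrink(box, action):
    """Return the box with one side moved inward by one pixel."""
    top, bottom, left, right = box
    if action == "top":
        return (top + 1, bottom, left, right)
    if action == "bottom":
        return (top, bottom - 1, left, right)
    if action == "left":
        return (top, bottom, left + 1, right)
    return (top, bottom, left, right - 1)


def step(mask, box, action, total):
    """Assumed reward: +1 if the shrink keeps every foreground pixel, else -1 and the episode ends."""
    new_box = shrink(box, action)
    if count_inside(mask, new_box) < total:
        return new_box, -1.0, True
    return new_box, 1.0, False


def train(mask, episodes=5000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning; every episode starts from the full-frame box."""
    rng = random.Random(seed)
    total = sum(map(sum, mask))
    start = (0, len(mask) - 1, 0, len(mask[0]) - 1)
    q = defaultdict(float)  # (box, action) -> estimated return; unseen pairs read as 0.0
    for _ in range(episodes):
        box, done = start, False
        while not done:
            # Uniform-random behaviour policy; Q-learning is off-policy, so it still learns the greedy policy.
            action = rng.choice(ACTIONS)
            next_box, reward, done = step(mask, box, action, total)
            target = reward if done else reward + gamma * max(q[(next_box, a)] for a in ACTIONS)
            q[(box, action)] += alpha * (target - q[(box, action)])
            box = next_box
    return q, start


def greedy_box(q, start):
    """Follow the learned greedy policy until no action has a positive value."""
    box = start
    while True:
        best = max(ACTIONS, key=lambda a: q.get((box, a), 0.0))
        if q.get((box, best), 0.0) <= 0.0:
            return box
        box = shrink(box, best)


if __name__ == "__main__":
    # Toy 6x6 motion mask: a plus-shaped "moving object" spanning rows 1-3, columns 2-4.
    mask = [[0] * 6 for _ in range(6)]
    for r, c in [(1, 3), (2, 2), (2, 3), (2, 4), (3, 3)]:
        mask[r][c] = 1
    q, start = train(mask)
    print(greedy_box(q, start))  # (top, bottom, left, right)
```

Because Q-learning is off-policy, uniform-random exploration is enough to learn the greedy shrinking policy on a mask this small. A real traffic scene would likely need a much richer state representation than raw box coordinates.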