Crowd congestion in smart cities makes people flow statistics necessary in public areas so that people flow can be controlled reasonably. The You Only Look Once-v3 (YOLOv3) algorithm is employed for pedestrian detection, and the Smooth_L1 loss function is introduced to update the parameters during backpropagation, ensuring the stability of the object detection model. After a pedestrian is detected, the pedestrian must be tracked for a certain time to count the number of pedestrians entering and leaving. Specifically, the Mean Shift algorithm is combined with the Kalman filter to track the target. When the target is lost, the Mean Shift algorithm is used for iterative tracking, and then the Kalman prediction is updated. In the experiment, the 7,000 original images collected from the library contain 88 people, of whom 82 are recognized, giving a detection accuracy of 93.18%. The 12,200 original images collected in the teaching building include 149 people, of whom 139 are recognized, giving a detection accuracy of 93.29%. Therefore, the people flow statistics system based on machine vision and deep learning can detect and track pedestrians effectively, which is of great significance for people flow statistics in public areas in smart cities and for the smooth running of various activities.
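
As a minimal illustration of two quantities mentioned above, the following Python sketch shows the commonly used form of the Smooth_L1 loss (quadratic for small residuals, linear for large ones) and the arithmetic behind the reported detection accuracies. The function names and the threshold parameter `beta` are assumptions made for illustration; the exact formulation used in the system is not stated here.

```python
def smooth_l1(diff: float, beta: float = 1.0) -> float:
    """Commonly used Smooth L1 loss for a single residual.

    Quadratic when |diff| < beta, linear otherwise. `beta` is an assumed
    threshold (1.0 is the usual default), not a value taken from the text.
    """
    a = abs(diff)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta


def detection_accuracy(recognized: int, total: int) -> float:
    """Percentage of people recognized out of those present in the images."""
    return 100.0 * recognized / total


if __name__ == "__main__":
    # Small residuals are penalized quadratically, large ones linearly,
    # which bounds the gradient magnitude for outlier boxes.
    print(smooth_l1(0.5), smooth_l1(2.0))

    # Figures reported for the library and the teaching building.
    print(round(detection_accuracy(82, 88), 2))
    print(round(detection_accuracy(139, 149), 2))
```

Because the loss grows only linearly for large residuals, its gradient is bounded, which is the usual reason given for its stabilizing effect on bounding-box regression.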