Abstract

Deep extreme feature extraction: new MVA method for searching particles in high energy physics

Chao Ma, Jinhui Xu, Tiancheng Hou, Bin Lan, Zhenhua Zhang

In this paper, we propose Deep Extreme Feature Extraction (DEFE), a new ensemble MVA method for searching for the τ⁺τ⁻ channel of Higgs bosons in high energy physics (HEP). DEFE can be viewed as a deep ensemble learning scheme that trains a strongly diverse set of neural feature learners without explicitly encouraging diversity or penalizing correlations. This is achieved by adopting an implicit neural controller (not involved in the feedforward computation) that directly controls and distributes the gradient flows from a higher-level deep prediction network. This model-independent controller ensures that every local feature learned is used in the feature-to-output mapping stage, avoiding blind averaging of features. DEFE makes the ensemble 'deep' in the sense that it allows deep post-processing of these features, which learns to select and abstract the ensemble of neural feature learners. Based on the construction and approximation of the so-called extreme selection region, the DEFE model can be trained efficiently and can extract discriminative features from multiple angles and dimensions, thereby improving the selection region used to search for new particles in HEP. With this model, a selection region rich in signal processes can be obtained by training on a miniature set of collision events. Compared with a classic deep neural network (DNN), DEFE shows state-of-the-art performance: the error rate decreases by about 37%, the accuracy exceeds 90% for the first time, and the discovery significance reaches 6.0σ. Experimental results show that DEFE can train an ensemble of discriminative feature learners that boosts the performance of the final prediction.
Furthermore, among the high-level features there remain important patterns that the DNN fails to identify and that are independent of the low-level features, whereas DEFE identifies these significant patterns more efficiently.