To retrieve voice information quickly and accurately from encrypted speech, this study proposes a retrieval algorithm based on syllable-level perceptual hashing. The algorithm supports retrieval of both speech segments and spoken terms over an encrypted speech database. Before the speech is uploaded to the cloud, digital watermarks (perceptual hashes) must be embedded in it. Retrieval then requires neither searching the encrypted speech data directly nor decrypting it; only the system hash table is searched. Experimental results show that the syllable-level perceptual hashing of the proposed scheme has good discrimination, uniqueness, and perceptual robustness to common speech signal processing operations. In addition, the proposed retrieval algorithm effectively improves retrieval speed by reducing the number of matches performed against the query index. Both the precision ratio and the recall ratio remain high under various signal processing operations.
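The idea of searching a hash table rather than the encrypted speech can be illustrated with a minimal Python sketch. It is not the scheme proposed in this study. It assumes each syllable has already been reduced to a fixed-length binary perceptual hash (stored here as an integer), that two hashes match when their Hamming distance is within a tolerance, and that candidates are located through the first query syllable only. The names `build_hash_table`, `search_term`, and `max_bit_errors` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Posting = Tuple[str, int]  # (utterance id, syllable position)


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two integer-encoded hashes differ."""
    return bin(a ^ b).count("1")


def build_hash_table(utterances: Dict[str, List[int]]) -> Dict[int, List[Posting]]:
    """Map each syllable hash to every place where it occurs.

    Assumption: `utterances` holds one list of syllable hashes per
    encrypted utterance; the encrypted audio itself is never touched.
    """
    table: Dict[int, List[Posting]] = {}
    for utt_id, hashes in utterances.items():
        for pos, h in enumerate(hashes):
            table.setdefault(h, []).append((utt_id, pos))
    return table


def search_term(
    query: List[int],
    utterances: Dict[str, List[int]],
    table: Dict[int, List[Posting]],
    max_bit_errors: int = 1,
) -> List[Posting]:
    """Return the start positions where all query syllables match in order.

    Only positions whose hash is close to the first query syllable are
    verified, which keeps the number of full comparisons small.
    """
    hits: List[Posting] = []
    if not query:
        return hits
    for key, postings in table.items():
        # Skip keys that cannot start a match (hypothetical tolerance).
        if hamming_distance(key, query[0]) > max_bit_errors:
            continue
        for utt_id, pos in postings:
            window = utterances[utt_id][pos:pos + len(query)]
            if len(window) == len(query) and all(
                hamming_distance(q, w) <= max_bit_errors
                for q, w in zip(query, window)
            ):
                hits.append((utt_id, pos))
    return sorted(hits)
```

In this sketch the Hamming tolerance stands in for perceptual robustness: a hash that changes by a bit or two after signal processing can still be matched, while unrelated syllables are rejected before any full comparison is made.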