Measures of String Similarities Based on the Hamming Distance


Bojan Nikolić, Boris Šobot




We consider measures of similarity between two sets of strings, built on the Hamming distance and the tools of persistent homology. We first describe the construction of the Čech filtration associated with a set of strings, the persistence module corresponding to this filtration, and its barcode. Using these tools, we introduce a novel similarity measure for two sets of strings, based on a comparison of bars of the same dimension within their barcodes. The comparison is designed to account not only for the overlap of bars but also for whether the compared bars are qualitatively matched, in the sense that they represent similar homological features. To realize this idea, we develop a method called the separation of simplex radii technique.
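
As a rough illustration of the starting point only, the following Python sketch computes the Hamming distance between equal-length strings and lists the distance thresholds at which connected components of a string set merge. It is not the construction of the paper: the function names `hamming` and `h0_merge_scales` are hypothetical, and the merging is driven directly by pairwise distances (a single-linkage simplification) rather than by the Čech filtration itself, so it hints only at how dimension-0 bars end.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def h0_merge_scales(strings):
    """Distance thresholds at which connected components merge.

    Assumption: two strings are joined as soon as the threshold reaches
    their Hamming distance. This is a simplification used for illustration,
    not the Cech filtration described in the abstract.
    """
    n = len(strings)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs, ordered by increasing Hamming distance.
    edges = sorted(
        (hamming(strings[i], strings[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this pair connects two distinct components
            parent[ri] = rj
            merges.append(d)
    return merges
```

For a set of n distinct strings the function returns n - 1 thresholds, one per merge; in a barcode these would correspond to the finite dimension-0 bars, up to the scaling convention chosen for the filtration parameter.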