We describe a method for learning and recognizing windows as basic structural elements of façades and for organizing them into interpretable models of building façades. The method segments an input image into a hierarchical structure of window candidates. These candidates are used to create a likelihood map of window locations, which is then explained by a structural façade model based on a formal grammar. A look-ahead greedy search in the grammar derivation space selects the (sub)optimal façade model. An empirical evaluation shows that, on average, the generated façade model covers 45% of the windows actually present in the input image, while 56% of the modeled windows correspond to windows actually present in the input image.

Keywords: façade segmentation, window detection, formal grammar, urban environment, image segmentation.
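The two percentages reported above can be read as a recall-like rate (the share of actual windows covered by the model) and a precision-like rate (the share of modeled windows that cover an actual window). The following is a minimal sketch of how such rates might be computed, not the evaluation procedure used in this work. It assumes that windows are axis-aligned boxes and that "covers" means an intersection-over-union of at least 0.5; the box format, the threshold, and the names `iou` and `coverage_rates` are all hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def coverage_rates(
    actual: List[Box], modeled: List[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (share of actual windows covered, share of modeled windows that match).

    The overlap criterion (IoU >= threshold) is an assumption made for
    illustration; the source does not state how coverage is decided.
    """
    covered_actual = sum(
        any(iou(a, m) >= threshold for m in modeled) for a in actual
    )
    matching_modeled = sum(
        any(iou(m, a) >= threshold for a in actual) for m in modeled
    )
    actual_rate = covered_actual / len(actual) if actual else 0.0
    modeled_rate = matching_modeled / len(modeled) if modeled else 0.0
    return actual_rate, modeled_rate


# Toy data: four annotated windows, three modeled windows (one spurious).
actual_windows = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10), (60, 0, 70, 10)]
modeled_windows = [(0, 0, 10, 10), (21, 0, 31, 10), (100, 100, 110, 110)]
actual_rate, modeled_rate = coverage_rates(actual_windows, modeled_windows)
```

In this toy case, two of the four annotated windows are covered and two of the three modeled windows match an annotated window. A stricter or looser overlap threshold would shift both rates, so any comparison between methods would need to fix that criterion first.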