|Title||Thread-cooperative, bit-parallel computation of Levenshtein distance on GPU|
|Publication Type||Conference Paper|
|Year of Publication||2014|
|Authors||Chacón, A, Marco, S, Espinosa, A, Ribeca, P, Moure, JC|
|Conference Name||ICS Congress 2014, Munich|
Approximate string matching is a very important problem in computational biology; it requires the fast computation of string distance as one of its essential components. Myers' bit-parallel algorithm improves on the classical dynamic programming approach to Levenshtein distance computation, and offers competitive performance on CPUs. The main challenge when designing an efficient GPU implementation is to expose enough SIMD parallelism while at the same time keeping a relatively small working set for each thread. In this work we implement and optimise a CUDA version of Myers' algorithm suitable for use as a building block for DNA sequence alignment. We achieve high efficiency by means of a cooperative parallelisation strategy for (1) very-long integer addition and shift operations, and (2) several simultaneous pattern matching tasks. In addition, we explore the performance impact of using features specific to the Kepler architecture. Our results show an overall performance of the order of tera cell updates per second using a single high-end Nvidia GPU, and speedups in excess of 300x with respect to a single-core, non-vectorised CPU implementation.
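For readers unfamiliar with the bit-parallel formulation, the following is a minimal single-threaded sketch in Python of a Myers-style recurrence for global Levenshtein distance. It is not the CUDA implementation described in the paper. It uses the well-known simplified form of the recurrence, and the function name `levenshtein_myers` and all variable names are illustrative assumptions. Python's arbitrary-precision integers stand in for the very-long integers whose addition and shifts the paper distributes across cooperating GPU threads.

```python
def levenshtein_myers(pattern: str, text: str) -> int:
    """Global Levenshtein distance via a bit-parallel (Myers-style) recurrence.

    Illustrative CPU sketch only: one Python integer of len(pattern) bits
    plays the role of the multi-word bit-vector.
    """
    m = len(pattern)
    if m == 0:
        return len(text)

    mask = (1 << m) - 1      # keeps every vector to m bits
    high = 1 << (m - 1)      # bit for the last pattern row

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    vp = mask                # vertical deltas equal to +1
    vn = 0                   # vertical deltas equal to -1
    dist = m                 # distance between pattern and the empty text

    for c in text:
        eq = peq.get(c, 0)
        # The addition propagates carries along runs of matches; on a GPU
        # this is the very-long integer addition that spans several words.
        d0 = ((((eq & vp) + vp) ^ vp) | eq | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask   # horizontal deltas equal to +1
        hn = d0 & vp                    # horizontal deltas equal to -1

        if hp & high:
            dist += 1
        elif hn & high:
            dist -= 1

        # Shifting in a 1 encodes the top boundary row D[0][j] = j,
        # which gives the global distance rather than a search variant.
        hp = ((hp << 1) | 1) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0

    return dist
```

As a usage example, `levenshtein_myers("kitten", "sitting")` returns 3. Each text character costs a constant number of bit-vector operations, so every loop iteration updates a full column of the dynamic programming matrix at once. That column-at-a-time update is what the cell-updates-per-second metric counts.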