Over the last decade, demand from the industrial field of computerized visual inspection has increased steadily. Applications are rapidly becoming more complex, often with more demanding real-time constraints. However, from 2004 onwards the clock frequency of CPUs has not increased significantly. Computer vision applications need ever more processing power but are limited by the performance of sequential processor architectures. The only way to get more performance from commodity hardware, such as multi-core processors and graphics cards, is parallel programming. This article focuses on the practical question: how can the processing time of vision algorithms be reduced through parallelization, in an economical way, while still allowing them to execute on multiple platforms?