In college I had access to amazing programming tools. We got to use any version of Visual Studio, along with integrated Intel optimization tools and CUDA integrations. One of the projects I used these for was developing a massively parallel method to extract nonlinear spiking signals from a set of timeseries. The application this was targeted towards was brain-computer interfaces, where transmitting only the nonlinear spikes is a very good compression method. At the time I wrote it, this compression would have worked better implemented in logic gates on the SoC that acquires the full timeseries, and I was also working on doing this on Intan SoCs (which use FPGAs, so they can be reprogrammed easily). Now that SoCs ship with GPUs, though, and GPUs are starting to take advantage of networking, maybe the time for a polished version of this library is nearing.
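To make the idea concrete, here's a minimal sketch of the kind of kernel involved. The original post doesn't name its exact detector, so this assumes the nonlinear energy operator (NEO), a common nonlinear choice for emphasizing spikes: psi[n] = x[n]^2 - x[n-1]*x[n+1], thresholded per sample. The names, channel/sample counts, and threshold below are all illustrative, not from the original library.

```cuda
#include <cuda_runtime.h>

// One thread block per channel; threads stride across that channel's
// samples. Interior samples only, since NEO needs both neighbors.
__global__ void neoDetect(const float *data,      // [channels * samples]
                          unsigned char *spikes,  // [channels * samples]
                          int samples, float threshold)
{
    int ch = blockIdx.x;
    const float *x = data + (size_t)ch * samples;
    unsigned char *out = spikes + (size_t)ch * samples;

    for (int n = threadIdx.x + 1; n < samples - 1; n += blockDim.x) {
        float psi = x[n] * x[n] - x[n - 1] * x[n + 1];  // nonlinear energy
        out[n] = (psi > threshold) ? 1 : 0;
    }
}

int main(void)
{
    const int channels = 32, samples = 30000;  // e.g. 1 s at 30 kHz (assumed)
    size_t count = (size_t)channels * samples;

    float *d_data;
    unsigned char *d_spikes;
    cudaMalloc((void **)&d_data, count * sizeof(float));
    cudaMalloc((void **)&d_spikes, count);
    cudaMemset(d_data, 0, count * sizeof(float));  // stand-in for a real recording

    neoDetect<<<channels, 256>>>(d_data, d_spikes, samples, 1e3f);  // threshold is arbitrary here
    cudaDeviceSynchronize();

    // In a BCI pipeline, only the flagged samples (plus a small window
    // around each) would be kept or transmitted -- that's the compression.
    cudaFree(d_data);
    cudaFree(d_spikes);
    return 0;
}
```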
Here's an example of how the performance of this spike detection scaled compared to the CPU-only program. Note that "processing time" refers to the time it took to process 1 second of data, so any processing time greater than 1 second means the system couldn't run in real time. By that measure, even though the parallel code only runs about 15x faster than the serial code, it can handle 32x the amount of data before hitting the real-time ceiling, because its processing time grows more slowly than the serial code's as the data volume increases. (The different lines are just different test runs.)
If you're interested in learning more about how to design these parallel algorithms, a great starting place is optimizing a couple of serial algorithms first. My post Cheatsheet for basic serial optimization contains some of the takeaways from my concurrent programming class in college. These tricks can help a lot: I once got a Matlab program to run in 3 hours instead of 80-ish hours just by optimizing the serial code. Once you've practiced serial optimization, there are many great resources for practicing parallelizing code. I recommend trying to parallelize image filters; there's a sketch of that exercise below. Also, I haven't read them myself, but I've been told NVIDIA's GPU Gems books are pretty good.
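As a sketch of that image-filter exercise, here's a simple 3x3 box blur in CUDA, one thread per output pixel on a grayscale float image with clamped borders. Everything here (names, block sizes) is illustrative.

```cuda
#include <cuda_runtime.h>

// Each thread averages the 3x3 neighborhood around its pixel.
// Out-of-bounds neighbors are clamped to the nearest edge pixel.
__global__ void boxBlur3x3(const float *in, float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int sx = min(max(x + dx, 0), width - 1);   // clamp to image edges
            int sy = min(max(y + dy, 0), height - 1);
            sum += in[sy * width + sx];
        }
    }
    out[y * width + x] = sum / 9.0f;  // mean of the 3x3 window
}

// Launch example:
//   dim3 block(16, 16);
//   dim3 grid((width + 15) / 16, (height + 15) / 16);
//   boxBlur3x3<<<grid, block>>>(d_in, d_out, width, height);
```

It's a good first exercise because every output pixel is independent, so the naive parallelization is trivially correct, and the natural next step (shared-memory tiling to reuse neighboring loads) teaches the memory-hierarchy habits that matter for harder problems like the spike detection above.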