What's Next for Signal Processing? Part 1

15 April 2015
Signal Processing

The other day, I was swapping emails with one of my colleagues on a variety of subjects. After a small foray into a discussion of 1980s British sitcoms, he asked me (perhaps prompted by our recent press release): “So—what are future signal processing systems going to look like?” Aside from the apparent strangeness of the juxtaposition of subjects (not quite so odd, really, but not relevant here; I’ll explain if you care to ask), the more I thought about it, the more interesting the question became, particularly if you extend the time horizon a little.

The obvious stuff

In the short- to medium-term, it doesn’t take a crystal ball to predict the trajectory. We are firmly in a heterogeneous, multicore world and will be there for some time to come. Conventional CPUs have ceased to get faster, but continue to go wider. Clock rates have stalled around the 3.something GHz range, and to make up for that, we see more and more cores being instantiated.

Quad core is the new baseline, with 8, 12, 16 and 18 cores per “socket” being commonplace. Higher core counts, previously the domain of server-class devices that were rarely used in rugged embedded designs, are starting to become available in the Ball Grid Array packages favored by designers striving to meet harsh shock and vibration requirements; see, for example, Intel’s recently announced Xeon-D system-on-chip and Freescale’s T4240.

By “going wider,” we generally mean that more things happen per clock cycle. Adding cores and improving support for multiple threads in flight concurrently are part of that. In addition, extra execution units for specific purposes get bolted on to accelerate certain workloads. Engines for video encode and decode, cryptography and pattern matching are pretty common. Vector engines are of most interest for signal processing, and are well served by AVX and AltiVec, with AVX currently at 256 bits wide (or eight single-precision floating-point values) on most CPUs.
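To make “wide” concrete, here is a minimal sketch in C using the AVX intrinsics (compile with something like gcc -mavx). The data is arbitrary; the point is that eight multiplies issue as a single instruction:

```c
#include <immintrin.h>   /* AVX intrinsics */
#include <stdio.h>

int main(void)
{
    /* Eight single-precision floats: exactly one 256-bit AVX register. */
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    __m256 va = _mm256_loadu_ps(a);     /* load 8 floats at once       */
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_mul_ps(va, vb);  /* 8 multiplies, 1 instruction */
    _mm256_storeu_ps(c, vc);

    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);          /* prints: 8 14 18 20 20 18 14 8 */
    printf("\n");
    return 0;
}
```

A vectorizing compiler will often generate code like this from a plain loop, but the intrinsics make the eight-wide data path explicit.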

Getting wider

A 512-bit version of AVX is coming to the mainstream, starting with some Skylake variants in the 2016 timeframe, and is already here on Xeon Phi if you can afford the power budget. By combining this wider pipeline with parallel execution units, you can have 32 single-precision operations in flight at the same time, versus one or two for a scalar core.
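As a sketch of what that buys you, here is a multiply-accumulate loop written against the AVX-512F intrinsics Intel has published. Treat it as illustrative rather than definitive (fma_accumulate is my own name, and the remainder handling is deliberately omitted):

```c
#include <immintrin.h>  /* AVX-512F intrinsics */
#include <stddef.h>

/* Multiply-accumulate over n floats, 16 lanes at a time. Each
   _mm512_fmadd_ps retires 16 multiplies plus 16 adds: the 32
   in-flight operations mentioned above. Assumes n is a multiple
   of 16; a real kernel would handle the tail (AVX-512's mask
   registers make that part much cleaner than it used to be). */
static void fma_accumulate(const float *a, const float *b,
                           float *acc, size_t n)
{
    for (size_t i = 0; i < n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        __m512 vs = _mm512_loadu_ps(acc + i);
        vs = _mm512_fmadd_ps(va, vb, vs);   /* acc += a * b */
        _mm512_storeu_ps(acc + i, vs);
    }
}
```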

AVX-1024 is probably coming at some point too. The API already contains support for this, although no hardware announcements have been made public as yet. Wider indeed.

GPUs continue to evolve. Again, each iteration increases the number of cores and improves the scheduling. Discrete GPUs are emerging with faster interconnects both between GPUs (and in some cases between CPU and GPU) and between GPU and bulk memory. Integrated GPUs improve with each CPU generation, and while the number of cores does not approach that available on discretes, the close coupling has some inherent advantages.

FPGAs are not being left behind. More gates and more real estate dedicated to fixed functions like floating-point arithmetic are pretty much expected these days. Combine that with getting the devices onto the same process node as CPUs and a renewed dedication to programmability, and their use continues to be interesting—especially when performance per watt is a key metric (and in our little world, when isn’t it?).

CPUs, GPUs and FPGAs all have their strengths and weaknesses and are frequently used to complement each other in heterogeneous systems that pick the best of breed for different parts of the processing chain. A typical next generation radar might combine FPGAs for data ingest, channelization and filtering before passing the data to a backend of CPUs, possibly with GPUs being used to accelerate certain modes. OpenCL is getting to the point where it is a serious contender to program all three types of devices, allowing the system designer unprecedented flexibility in partitioning processing tasks across the nodes.
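To give a flavor of that, here is a minimal, illustrative OpenCL C kernel for the filtering stage. The kernel and argument names are my own invention, and a production version would add vectorization, batching and edge handling:

```c
/* fir.cl: naive FIR filter, one work-item per output sample.
   in   : input samples, padded so in[i..i+ntaps-1] is always valid
   taps : filter coefficients
   out  : filtered output                                         */
__kernel void fir(__global const float *in,
                  __constant float     *taps,
                  __global float       *out,
                  const int             ntaps)
{
    int i = get_global_id(0);    /* which output sample am I? */
    float acc = 0.0f;
    for (int t = 0; t < ntaps; t++)
        acc += taps[t] * in[i + t];
    out[i] = acc;
}
```

The same source can be handed to a CPU runtime, a GPU runtime or, via vendor tooling such as Altera’s SDK for OpenCL, an FPGA flow: exactly the partitioning flexibility described above.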

In part 2 of this post, I’ll be looking at what might come next as things start to get weird… I’ll be back in two weeks. Stay tuned. 

Peter Thompson

Peter Thompson is Vice President, Product Management at Abaco Systems. He first started working on High Performance Embedded Computing systems when a 1 MFLOP machine was enough to give him a hernia while carrying it from the parking lot to a customer’s lab. He is now very happy to have 27,000 times more compute power in his phone, which weighs considerably less.