External Graphics Over PCIe 3.0? Netstor's NA255A, Reviewed
Netstor sent over its TurboBox NA255A, an external
enclosure capable of accommodating four dual-slot graphics cards across
one 16-lane PCI Express 3.0 connection. Is this thing fast enough for
general-purpose GPU compute workloads? How about gaming?
Although we're starting to see more mainstream-oriented applications optimized for OpenCL, allowing graphics processors to help speed up performance, general-purpose GPU-accelerated software is still most pointedly aimed at the server and workstation space. Much of that has to do with optimizations for CUDA, which only Nvidia's GPUs support. But OpenCL is gaining traction for video editing, compression, image manipulation, and even bitcoin mining.
When all-out compute power is your priority, connecting multiple GPUs is a great way to push more performance in those apps. Getting four graphics processors working together can really boost tasks able to utilize them all. And we're not talking about gaming, either. We've seen plenty of examples of scaling tapering off with a third card, while a fourth doesn't improve frame rates at all. No, three- and four-way setups are more often the domain of power users in need of massive floating-point math.
If you're in the distinguished group of folks able to use four dual-slot graphics cards cooperatively, then you face a handful of configuration issues to overcome. What motherboard do you use? Which case do you pick? Is there a power supply with enough auxiliary connectors for the cards you're using? And how on earth do you plan to keep all of that hardware cool? Even if you get everything set up just the way you want, don't expect to have room left over for any other PCI Express-based upgrade.
But what if you could externalize all of the hardware involved, and maintain a relatively quiet host workstation? What would it take to house all of that hardware? Well, you'd need a large-enough enclosure, able to accommodate eight slots worth of back-panel I/O. You'd need a motherboard with at least four PCI Express x16 slots for the graphics cards. Power delivery would be of the utmost importance, of course. And cooling would be a lynchpin, not only for keeping four stacked boards running stably, but also for keeping the configuration quiet enough to use in the same room.
Meet Netstor's solution, called the TurboBox NA255A. It looks a lot like a mid-tower PC, but its application is far more specific. Designed to function as a PCI Express 3.0-connected expansion box, the NA255A comes with its own 1,000 W power supply and cooling fans. Its sole purpose in life is to add a quartet of PCIe x16 slots (electrically wired to run at x8 transfer rates) and two x4 slots to your list of expansion options. The whole thing is fed by a single 16-lane expansion card that drops into your host workstation, attaching externally.
One third-gen PCI Express x16 slot is able to push up to 16 GB/s per direction. Although there is overhead involved, you still get a ton of bidirectional throughput. Put differently, you can get close to the same transfer rate from one 16-lane PCIe 3.0 slot as four of the eight-lane slots commonly used to create multi-GPU arrays on Sandy Bridge-based platforms.
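The bandwidth arithmetic above can be sketched out quickly. This is a rough back-of-the-envelope calculation using the per-generation signaling rates and encoding schemes from the PCIe specifications; the helper function name is our own, not part of any API.

```python
def pcie_bandwidth_gbps(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth, in GB/s, of a PCIe link."""
    # Raw signaling rate in GT/s for each PCIe generation.
    rate = {1: 2.5, 2: 5.0, 3: 8.0}[gen]
    # Gen 1/2 use 8b/10b encoding (80% efficient); gen 3 uses 128b/130b.
    efficiency = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}[gen]
    # GT/s x efficiency gives usable Gb/s per lane; divide by 8 for GB/s.
    return rate * efficiency * lanes / 8

print(pcie_bandwidth_gbps(3, 16))  # the gen3 x16 uplink: ~15.75 GB/s
print(pcie_bandwidth_gbps(3, 8))   # one x8 slot behind the switch: ~7.88 GB/s
print(pcie_bandwidth_gbps(2, 16))  # a gen2 x16 slot, for comparison: 8.0 GB/s
```

Note how the numbers line up: a single gen3 x16 uplink carries roughly the same traffic as two gen2 x16 slots, which is why one cable back to the host can plausibly feed an enclosure full of cards running at x8.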
In theory, that's enough to game on, though this enclosure isn't really designed for the gaming market. Rather, it's intended to serve the folks looking to sling several graphics cards together for massive general-purpose GPU computing. We're interested in both possible use cases, so we're running benchmarks that apply to both segments.
The enclosure itself measures 18" x 14" x 7". It's very solid, finished in brushed aluminum with grating in the front to encourage airflow. Netstor is going for that Apple aesthetic, it appears. And for what it's worth, the TurboBox is compatible with both PCs and Macs. Because modern Mac Pros only give you one second-gen 16-lane slot and two four-lane slots, with a combined slot power budget of 300 W, it's hardly a surprise that Netstor is looking to extend compatibility to that platform as well.
One card goes into your host machine and another goes into the TurboBox itself. Cables between the two interface boards facilitate the external connectivity.
But it's what's inside the TurboBox that matters most. You'll find a Surestar TC-1000PL 1,000 W power supply, two 120 mm hot-swappable fans, a 9.5" x 11.7" PCB (NP952A-GPU), and a 6" x 4" PCIe interface card (NP970A).
Setup And Overcoming Issues
In theory, populating the NA255A should be as easy as
dropping in graphics cards, connecting their power leads, and hooking
the external enclosure up to the host PC's PCI Express card. The
TurboBox is designed to extend standardized interfaces, so no software
driver should be necessary. In the real world, though, setup isn't quite
that easy.
We encountered a couple of snags along the way. First, I initially didn't realize that the ports on the PCIe interface cards aren't interchangeable. Look closely and you'll see that one port on each card is etched x16 and the other x8. I accidentally connected the x16 port to the x8 port and vice versa. The mistake was easy to reverse, but it isn't mentioned anywhere in Netstor's documentation.
The second hang-up was a little more worrisome. Namely, I couldn't get the TurboBox working at PCI Express 3.0 signaling rates. First- and second-gen PCI Express worked fine. But when the jumper was set to PCIe 3.0, the enclosure stopped recognizing the graphics cards I was plugging in. Netstor helped us work through the issue by reconfiguring switches on the interface cards.
Our third issue wasn't the TurboBox's fault at all. During our first round of benchmarks, we saw odd performance drops with three Radeon HD 7970s installed. Much troubleshooting revealed that some of our Tahiti-based boards weren't working together the way they should have. It turned out that boards from different vendors shipped with incompatible firmware, hampering multi-card configurations that should otherwise have worked. Mixing and matching products, even those from the same family, is evidently asking for trouble. Fortunately, a different card combination worked around the problem.
Finally, we weren't able to test four Radeon HD 7970s at the same time. Again, this wasn't Netstor's fault. The TurboBox is absolutely able to accommodate a quartet of dual-slot boards. But because some of the 7970s in our lab are a little larger, they don't fit into the strict space limitations of two expansion slots. As a result, we're testing with three Radeon HD 7970s. It all works out, though: the ASRock X79 Extreme9 motherboard I'm using only has room for three 7970s anyway, so that's our hard limit for comparing native on-board connectivity to the performance of Netstor's device.
Test System And Benchmarks
Our test system is built around Intel's X79 Express chipset,
with 8 GT/s transfer rates to each 16-lane PCI Express graphics slot.
We're going to measure the performance of native connectivity and the
TurboBox using one, two, and three GPUs in each solution. This should
tell us whether there's any penalty for externalizing graphics, or for
interfacing with the enclosure over a single third-gen PCI Express x16
slot. Part of our testing also involves comparisons between PCIe 2.0 and
3.0, quantifying the benefits of modern technology versus what came
before.
As mentioned, we're using three Radeon HD 7970 cards for testing, all of which are set to AMD's reference core and memory clock rates.
A number of games should help us flesh out 3D performance, while LuxMark and GUIMiner stand in as OpenCL-accelerated benchmarks. Although we know that the TurboBox isn't a gaming-oriented product, a few tests at 1920x1080 and 5760x1080 should shed some light on its performance potential.