Several CAPI-Enabled Accelerators for OpenPOWER Servers Revealed

Published at 2016-04-12 21:00:00


Over a dozen special-purpose accelerators compatible with next-generation OpenPOWER servers that feature the Coherent Accelerator Processor Interface (CAPI) were revealed at the OpenPOWER Summit last week. These accelerators aim to encourage the use of OpenPOWER-based machines for technical and high-performance computing. Most of the accelerators are based on Xilinx high-performance FPGAs, but some feature custom silicon.
IBM’s CAPI port is a PCIe 3.0-based interconnection specifically designed for programmable processors (e.g., ASICs, GPUs, FPGAs, etc.) that enables them to address the same memory address space as the CPU. CAPI requires custom hardware incorporated into IBM’s POWER8 processors, which is called the coherent accelerator processor proxy (CAPP), as well as a POWER service layer (PSL) integrated into the CAPI-supporting accelerators. The CAPP maintains a directory of cache lines held by the accelerator and snoops the processor bus on the accelerator's behalf. The PSL performs address translations and holds the coherent data for quick access by the accelerating hardware. To work, CAPI has to be supported by the hardware, the operating system and the application in use. At present, IBM’s POWER8 CPUs, a number of accelerators, Red Hat Enterprise Linux 7.2 LE (and higher) and Ubuntu LE, as well as select programs, support CAPI.
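The division of labor between the CAPP and the PSL can be pictured with a toy software model. This is only an illustrative sketch, not IBM's implementation: the real logic lives in hardware, and the class names `Capp` and `Psl`, their methods, the 64 KiB page size, the 128-byte cache line and the one-value-per-line memory are all assumptions made for the example.

```python
# Toy model only: every name and constant below is hypothetical.
PAGE_SIZE = 64 * 1024   # assumed page size for the sketch
CACHE_LINE = 128        # assumed cache line size in bytes


class Capp:
    """Processor-side proxy: tracks which cache lines the accelerator holds."""

    def __init__(self):
        self.directory = set()

    def record_fill(self, line):
        self.directory.add(line)

    def snoop_cpu_write(self, line, psl):
        """On a CPU write, invalidate the accelerator's copy if it holds one."""
        if line in self.directory:
            self.directory.discard(line)
            psl.invalidate(line)
            return True
        return False


class Psl:
    """Accelerator-side layer: translates addresses and caches coherent data."""

    def __init__(self, page_table, memory, capp):
        self.page_table = page_table  # virtual page number -> physical page number
        self.memory = memory          # physical line address -> value
        self.capp = capp
        self.cache = {}

    def translate(self, vaddr):
        page, offset = divmod(vaddr, PAGE_SIZE)
        return self.page_table[page] * PAGE_SIZE + offset

    def read(self, vaddr):
        paddr = self.translate(vaddr)
        line = paddr - paddr % CACHE_LINE  # align down to the cache line
        if line not in self.cache:
            self.cache[line] = self.memory[line]
            self.capp.record_fill(line)    # the proxy now knows we hold this line
        return self.cache[line]

    def invalidate(self, line):
        self.cache.pop(line, None)


def cpu_write(memory, capp, psl, line, value):
    """Model a CPU store: update memory, then let the proxy snoop it."""
    memory[line] = value
    return capp.snoop_cpu_write(line, psl)


def demo():
    memory = {2 * PAGE_SIZE: "old"}
    capp = Capp()
    psl = Psl({16: 2}, memory, capp)   # virtual page 16 maps to physical page 2
    vaddr = 16 * PAGE_SIZE + 8
    first = psl.read(vaddr)            # accelerator caches the line
    cpu_write(memory, capp, psl, 2 * PAGE_SIZE, "new")
    second = psl.read(vaddr)           # stale copy was invalidated, so this refetches
    return first, second
```

In this sketch the accelerator reads through the same virtual address the application would use, and a later CPU write to that line causes the stale accelerator copy to be dropped, which is the behavior the paragraph above attributes to the CAPP and PSL working together.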
IBM and the OpenPOWER Foundation need CAPI in order to enable a relatively simple and inexpensive way to build special-purpose accelerators for various workloads. The aim is to make POWER8-based machines viable for a variety of market segments as well as to create platforms that can process contemporary workloads faster.
While it is possible to enable unified memory for CPUs and co-processors using custom hardware and multiple tweaks in device drivers, this requires enormous investments in silicon development, complex drivers and a number of other things. By contrast, programming an FPGA (field-programmable gate array) is considerably cheaper, and the CAPI technology brings FPGAs key heterogeneous processing capabilities. While CAPI does not necessarily enable higher bandwidth between the CPU and the accelerator (after all, it is layered on top of PCI Express 3.0 and its specified peak bandwidth), according to IBM it removes overheads, improves performance and can potentially simplify the workflow for programmers. In short, CAPI is an important part of IBM’s POWER strategy in general as well as of the OpenPOWER initiative.
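To put the PCI Express 3.0 ceiling in numbers, here is a small back-of-the-envelope calculation. It assumes the standard PCIe 3.0 figures of 8 GT/s per lane with 128b/130b encoding and ignores packet and protocol overhead, so achievable throughput is lower; the function name is made up for this example.

```python
GT_PER_S = 8.0        # PCIe 3.0 raw signaling rate per lane, in gigatransfers/s
ENCODING = 128 / 130  # 128b/130b line encoding efficiency


def pcie3_peak_gbytes_per_s(lanes: int) -> float:
    """Theoretical peak per direction in GB/s, before protocol overhead."""
    return lanes * GT_PER_S * ENCODING / 8  # 8 bits per byte


for lanes in (8, 16):
    print(f"x{lanes}: {pcie3_peak_gbytes_per_s(lanes):.2f} GB/s per direction")
```

This works out to roughly 7.88 GB/s for an x8 link and 15.75 GB/s for an x16 link in each direction, which is the upper bound a CAPI accelerator inherits from the underlying link.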
At this year’s OpenPOWER Summit, IBM and its partners revealed over a dozen special-purpose CAPI-enabled FPGA-based accelerators. This shows that the OpenPOWER platform is gaining interest and investment from different sources. The list of developers includes such companies as BittWare, DRC, IBM, Mellanox, Xilinx and others, but some decided not to publish details about their accelerators, judging by OpenPOWER’s press release. The accelerators revealed at the conference are either available now or are set to become available in the coming quarters. The devices come in the form of PCIe 3.0 x8 or x16 cards and are compatible with IBM POWER8-based servers. Some are also compatible with machines running other processors (in which case CAPI is not supported).
CAPI-Compatible Accelerators

Developer: Alpha Data
Model: ADM-PCIE-8K5
Hardware: Xilinx UltraScale KU115-2 FPGA; 28 GB of DDR4-2400 with ECC (32 GB version can be built); dual Firefly connectors for up to 4×16 Gbps per connector
Application: Reconfigurable accelerator for custom video processing, machine learning, HPC and network acceleration applications.
Availability: Add-in PCIe 3.0 x8 cards.

Developer: BittWare
Model: XUSP3S
Hardware: Xilinx Virtex UltraScale 80/95/125/160/190 or Kintex UltraScale 115; 2×16 GB DDR4 ECC (64 GB version can be built), QDR memory; four QSFP28 cages for 1×400GbE, 4×100GbE, 4×40GbE, 16×25GbE, or 16×10GbE
Application: Massive data flow and packet processing.
Availability: Add-in PCIe 3.0 x16 cards.

Developer: DRC
Model: GraphFind
Hardware: Xilinx Kintex UltraScale KU115 FPGA
Application: Can rapidly discover relationships between people, places, events, and objects, simultaneously identifying focal points with weighted strengths of connections.
Availability: PCIe card, or a pre-configured appliance consisting of multiple cards.

Developer: DRC
Model: Novara
Hardware: Xilinx FPGA
Application: A search engine and an accelerator that identifies key imprecise phrases and bit patterns using a fuzzy logic analyzer, which can instantly analyze millions of messages and data streams without the need to index first. Can process up to 2.5 GB of data per second.
Availability: 1U server containing up to four Novara cards. Servers can be clustered.

Developer: DRC
Model: Ferrara2
Hardware: Xilinx FPGA; four QSFP28 cages
Application: Encrypts and/or authenticates data using the AES-256 algorithm with bit-splitting capability from Security First Corporation (SFC) at line rates up to 40 Gb/s.
Availability: PCIe 3.0 x16 add-in boards for servers, communication or storage systems. Multiple Ferrara2 boards can be placed in one system.

Developer: Edico Genome
Model: DRAGEN Genomics Platform
Hardware: Xilinx Virtex-7 980T FPGA; 4×4 GB DDR3L-1866 memory
Application: Analyzes an entire human genome in 26 minutes (vs. 30 hours on general-purpose hardware). Enables healthcare providers to identify patients at higher risk for cancer before the conditions worsen. Compatible with the IBM S822LC server.
Availability: Pre-configured POWER8 server.

Developer: IBM
Model: Prototype
Hardware: Xilinx Virtex UltraScale 190; 16 GB of Micron HMC memory
Application: Acceleration of in-memory computing applications.
Availability: Add-in PCIe 3.0 x16 cards.

Developer: IBM, Nallatech, Redis Labs, Altera
Model: IBM Data Engine for NoSQL
Hardware: IBM Power S822L server(s); IBM FlashSystem 840 or 900 all-Flash storage system(s); Altera Stratix V FPGA-based interconnection card with 10 GbE SFP+ ports by Nallatech
Application: IBM FlashSystems are attached to the POWER8 processor through the CAPI coherent attach card. Thanks to the novel interconnection method, the Redis Enterprise Cluster application can issue read/write commands that eliminate 97% of the code path length. According to IBM, this enables IBM Data Engine for NoSQL to access Flash within latency levels comparable to traditional RAM-based x86 implementations.
Availability: Various configurations available.

Developer: IBM, Nallatech, Samsung, Xilinx
Model: Prototype
Hardware: Xilinx FPGA; 2×1 TB Samsung M.2 NVMe SSDs
Application: IBM Data Engine for NoSQL, which allows fast application exploitation in a smaller, in-server form factor.
Availability: Add-in PCIe 3.0 x8 cards.

Developer: Mellanox
Model: ConnectX-4 VPI
Hardware: ConnectX-4 VPI
Application: ConnectX-4 adapter cards with virtual protocol interconnect (VPI) support EDR 100 Gb/s InfiniBand and 100 Gb/s Ethernet connectivity.
Availability: Add-in PCIe 3.0 x16 cards.

Developer: Semptian
Model: NSA-120, NSA-120B
Hardware: Xilinx Kintex UltraScale XCKU060/XCKU115; 2×4 GB or 2×8 GB DDR3-1600 memory with ECC; two SATA interfaces
Application: Network and service accelerator. Can be used in big data analysis, image recognition/processing, video encoding/decoding, data compression/decompression, data encryption/decryption, voice recognition, neural networks, machine learning, network security, etc.
Availability: Add-in PCIe 3.0 x8 cards.

One of the notable announcements at the summit was Edico Genome’s DRAGEN genomics platform, which uses an accelerator powered by the Xilinx Virtex-7 980T FPGA and is equipped with 16 GB of quad-channel DDR3L-1866 memory. The platform, which is based on a 2-way IBM S822LC server, can analyze an entire genome in 26 minutes, down from approximately 30 hours on general-purpose processors. An earlier prototype was shown at SuperComputing 2015; however, this seems to be the announcement of the full product.
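As a quick sanity check on the DRAGEN figures quoted above, the implied speedup can be computed directly. The sketch takes the 26-minute and roughly 30-hour numbers at face value; the function name is chosen for this example only.

```python
def speedup(baseline_hours: float, accelerated_minutes: float) -> float:
    """Ratio of baseline runtime to accelerated runtime, both converted to minutes."""
    return baseline_hours * 60 / accelerated_minutes


# Figures as quoted: about 30 hours on general-purpose processors vs. 26 minutes.
print(f"{speedup(30, 26):.1f}x")
```

That is a speedup of roughly 69 times, with the caveat that the 30-hour baseline is itself an approximate figure.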
Other interesting solutions discussed at the summit include an FPGA-based accelerator for discovering relationships hidden in big data; an FPGA-powered fuzzy search engine for imprecise string searching and matching, which can analyze millions of messages and data streams without indexing; as well as various reconfigurable accelerators for HPC, big data, and so on. IBM also mentions that there are companies offering CAPI-enabled building blocks for FPGAs for computer vision, machine learning, and other applications. Some of those companies are startups or are working in stealth mode (we do not know whether they developed their building blocks thanks to the SuperVessel program, though this is a possibility), and they may announce their products over time.
While the number of CAPI-enabled accelerators available today is not high, it is growing, which is good news for the OpenPOWER ecosystem. Another positive sign (according to IBM) is the number of China-based companies developing accelerators featuring CAPI, which shows that local companies in growing server markets are expressing interest in such solutions.

Source: anandtech.com
