High Performance Point to Point Communication
Designed for SWaP-constrained, heterogeneous, and high performance computing. Provided as C API libraries for use in distributed applications.
What is point to point communication?
Getting data from one location to another in a distributed application.
Communication API comparison
Rating scale: Not ideal / Better / Best
FEATURES | AXIS MPI | AXIS Flow | AXIS Takyon |
---|---|---|---|
Open Standard (i.e. not locked into hardware) | YES | NO | Currently being reviewed by khronos.org as a candidate to become an open standard. |
Suitable for heterogeneous systems | Partial | Partial | YES |
Supports dynamic dataflow (each communication path is independent and can be explicitly created and destroyed at any time) | NO | NO | YES |
Determinism | Dynamic memory pinning; extra implicit synchronization; implicit buffering; implicit round-trip transfers | Extra implicit synchronization | Ideal for determinism |
Explicit fault tolerance | NO | Partial | YES |
One-way, zero-copy, two-sided transfers | NO | Partial | YES |
Latency | Not ideal due to double copies and round trips | Better but still may need round trips | Best due to essentially no extra overhead |
Function count | MPI 1.3 plus partial MPI 2.x: 120+; other implementations support 200+ | 50+ | 5 |
User's guide page count | 800+ | 150+ | 20+ |
Supported Interconnects | RDMA, sockets, mmap, KNEM, GPU Direct/IPC | RDMA, sockets, mmap, KNEM, P2P PCIe, memcpy | RDMA TCP/UDP, sockets TCP/UDP, mmap, KNEM, GPU Direct/IPC, P2P PCIe, memcpy, ADC/DAC |
Unreliable Datagrams | NO | NO | YES |
Unreliable Multicast | NO | NO | YES |
IO Device/FPGA Integration | NO | NO | YES |
GPU Support | YES | NO | YES |
Reliable Messages | YES | YES | YES |
Inter-processor transfers | YES | YES | YES |
Inter-process transfers | YES | YES | YES |
Inter-thread transfers | NO | YES | YES |
Multiple independent paths between the same two endpoints | NO | YES | YES |
Polling (good for latency) | YES | NO | YES |
Event Driven (good for SWaP) | YES | YES | YES |
Can mix event driven and polling | NO | NO | YES |
Collective Functions (barrier, scatter, gather, all2all, reduce, etc.) | YES | YES | YES (provided as open source wrappers that can be modified as needed; a barrier sketch follows this table) |
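Since the last row notes that Takyon's collectives ship as open source wrappers over its point-to-point calls, here is an illustrative (not the actual wrapper) two-endpoint barrier built from nothing more than the two transfer calls used in the Takyon code example further below. The `TakyonPath` type name, the `is_endpointA` flag, and the choice of buffer 0 for synchronization are assumptions made for this sketch.
// Hypothetical two-endpoint barrier built only on takyonSend()/takyonRecv()
// (a sketch; the real open source collective wrappers may differ).
void barrier_sketch(TakyonPath *path, int is_endpointA) {
  int sync_buffer = 0;  // assumed to be reserved for zero-byte sync messages
  if (is_endpointA) {
    takyonSend(path, sync_buffer, 0, 0, 0, NULL);     // signal arrival
    takyonRecv(path, sync_buffer, NULL, NULL, NULL);  // wait for the peer
  } else {
    takyonRecv(path, sync_buffer, NULL, NULL, NULL);  // wait for the peer
    takyonSend(path, sync_buffer, 0, 0, 0, NULL);     // signal arrival
  }
}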
AXIS MPI, Flow, and Takyon code examples
The following examples show how to send a text message and then a sync message to allow the cycle to start again.
AXIS MPI
// --------------- Sender ---------------
char message[100];
int num_chars = 1 + sprintf(message, "%s", "Hello World!");
int dest_rank = 1;
int tag = 999;
MPI_Send(message, num_chars, MPI_CHAR, dest_rank, tag, MPI_COMM_WORLD);
// --------------- Receiver ---------------
char message[100];
int max_chars = 100;
int src_rank = 0; // rank of the sending process (assumed to be rank 0)
int tag = 999;
MPI_Recv(message, max_chars, MPI_CHAR, src_rank, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Message: %s\n", message);
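The MPI listing above covers only the text message; to match the cycle described earlier, a zero-byte sync message has to flow back from the receiver to the sender. A minimal sketch of that return leg follows, reusing the variables already declared above; the `sync_tag` value is an assumption made for illustration.
// --------------- Sender: wait for the sync before reusing 'message' ---------------
int sync_tag = 1000; // hypothetical tag reserved for the sync message
MPI_Recv(message, 0, MPI_CHAR, dest_rank, sync_tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
// --------------- Receiver: send the zero-byte sync message back ---------------
int sync_tag = 1000;
MPI_Send(message, 0, MPI_CHAR, src_rank, sync_tag, MPI_COMM_WORLD);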
AXIS Flow
// --------------- Sender ---------------
int channel = 1;
int buffer = 0;
char *data_addr;
rmp_buffer_grab(channel, buffer, &data_addr, RMP_WAIT_FOREVER, NULL);
int num_bytes = 1 + sprintf(data_addr, "%s", "Hello World!");
int dest_task = 1;
rmp_buffer_send(channel, buffer, NULL, dest_task, num_bytes, RMP_WAIT_FOREVER, NULL);
// --------------- Receiver ---------------
int channel = 1;
int buffer = 0;
int src_task;
int nbytes;
char *data_addr;
rmp_buffer_recv(channel, buffer, &src_task, &nbytes, &data_addr, RMP_WAIT_FOREVER, NULL);
printf("Message: %s\n", data_addr);
rmp_buffer_release(channel, buffer, src_task);
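If, as the grab/release naming suggests, rmp_buffer_release() on the receiver makes the buffer available to the sender again, then the sender's next rmp_buffer_grab() doubles as the sync for the next cycle. A minimal sender-side loop using only the calls shown above; the iteration count and message text are illustrative.
// Sender-side loop: each cycle re-grabs the buffer, which waits until the
// receiver has released it (the implicit sync for this path).
int channel = 1;
int buffer = 0;
int dest_task = 1;
char *data_addr;
for (int cycle = 0; cycle < 100; cycle++) {
  rmp_buffer_grab(channel, buffer, &data_addr, RMP_WAIT_FOREVER, NULL);
  int num_bytes = 1 + sprintf(data_addr, "Hello World %d!", cycle);
  rmp_buffer_send(channel, buffer, NULL, dest_task, num_bytes, RMP_WAIT_FOREVER, NULL);
}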
AXIS Takyon
// --------------- Sender ---------------
int buffer = 0;
char *data_addr = path->attrs.sender_addr_list[buffer];
int num_bytes = 1 + sprintf(data_addr, "%s", "Hello World!");
takyonSend(path, buffer, num_bytes, 0, 0, NULL);
takyonRecv(path, buffer, NULL, NULL, NULL);
// --------------- Receiver ---------------
int buffer = 0;
takyonRecv(path, buffer, NULL, NULL, NULL);
char *data_addr = path->attrs.recver_addr_list[buffer];
printf("Message: %s\n", data_addr);
takyonSend(path, buffer, 0, 0, 0, NULL);
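Because the Takyon listing already ends each round with a zero-byte sync message back to the sender, repeating the cycle is just a loop around the same two calls. A minimal sender-side sketch, assuming `path` has already been created; the iteration count and message text are illustrative.
// Sender-side loop: reuse the same pre-registered buffer every cycle.
int buffer = 0;
for (int cycle = 0; cycle < 100; cycle++) {
  char *data_addr = path->attrs.sender_addr_list[buffer];
  int num_bytes = 1 + sprintf(data_addr, "Hello World %d!", cycle);
  takyonSend(path, buffer, num_bytes, 0, 0, NULL); // send the text message
  takyonRecv(path, buffer, NULL, NULL, NULL);      // wait for the zero-byte sync
}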
Visually design thread/process level dataflow
Via AXISView's ApplicationView.
For AXIS Flow, and coming soon for AXIS Takyon.
Steps to design, build and run your application with distributed dataflow:
- Define the threads and their attributes in the distributed application
- Define the collective communication groups and their attributes
- Define any global resources used by the application
- Generate the complete framework source code and Makefiles to create all the processes, threads, and communication paths
- For each thread, fill in your custom source code and the appropriate calls to send/recv (a minimal sketch follows this list)
- Build and run
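To make the fill-in step concrete, here is a hypothetical body for one generated worker thread, using the AXIS Flow calls from the code example above. The thread name, channel and buffer numbers are assumptions for illustration; the actual generated framework code will differ.
// Hypothetical fill-in for one generated worker thread (AXIS Flow calls as shown above).
void worker_thread(void) {
  int channel = 1; // assumed channel assigned by the generated framework
  int buffer = 0;
  int src_task, nbytes;
  char *data_addr;
  // Receive input from the upstream thread on this communication path
  rmp_buffer_recv(channel, buffer, &src_task, &nbytes, &data_addr, RMP_WAIT_FOREVER, NULL);
  printf("Processing %d bytes: %s\n", nbytes, data_addr); // <-- your custom source code goes here
  rmp_buffer_release(channel, buffer, src_task);
}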
Visualize live thread/process dataflow in real-time
Via AXISView's RuntimeView.
For AXIS Flow and AXIS MPI, and coming soon for AXIS Takyon.
While the app is running, visually see the results in real-time:
- How the threads map to the hardware (boards & processors)
- How the communication paths are mapped across the system
- Processor, core, & cache usage
- Communication utilization, per path, and per hardware component (e.g. PCIe switch)
Quickly find any bottlenecks or problem areas so the application can be redistributed into a more balanced mapping.
Function call event recording
Via AXIS EventView.
For AXIS Flow and AXIS MPI, and coming soon for AXIS Takyon.
Run the application for a period of time, then flush the events to a file to get detailed timing statistics:
- Nanosecond precision (limited by the precision of the clock used to read the current wall-clock time)
- Determine if the transfers are getting expected latencies and throughputs
- Validate determinism across millions of events
- Identify exact location in source code for problem areas
Tracking down performance, determinism, and causality bugs has never been easier.