AXIS Communication

High Performance Point to Point Communication

Designed for SWaP-constrained (size, weight, and power), heterogeneous, and high performance computing. Provided as C API libraries for use in distributed applications.

What is point to point communication?

Getting data from one location to another in a distributed application.
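
As a minimal illustration of the idea, the sketch below moves one message between the two endpoints of a local POSIX socket pair. It is not AXIS code: the function name `point_to_point_demo` is hypothetical, and a datagram socket pair is assumed only because it keeps each send matched to exactly one receive.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends `msg` (including its terminating NUL) from one endpoint of a local
 * socket pair to the other. Returns the number of bytes received, or -1. */
ssize_t point_to_point_demo(const char *msg, char *out, size_t out_size) {
  int endpoints[2];
  /* SOCK_DGRAM preserves message boundaries: one send, one receive. */
  if (socketpair(AF_UNIX, SOCK_DGRAM, 0, endpoints) != 0) return -1;

  size_t len = strlen(msg) + 1;
  ssize_t received = -1;
  if (len <= out_size && send(endpoints[0], msg, len, 0) == (ssize_t)len) {
    received = recv(endpoints[1], out, out_size, 0);
  }
  close(endpoints[0]);
  close(endpoints[1]);
  return received;
}
```

In a real distributed application the two endpoints would live in different threads, processes, or processors; the single-process form here only keeps the sketch short.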


Communication API comparison

Not ideal     Better     Best

Feature | AXIS MPI | AXIS Flow | AXIS Takyon
Open standard (i.e. not locked into hardware) | YES | NO | Currently being reviewed as a candidate to become an open standard
Suitable for homogeneous systems | Partial | Partial | YES
Supports dynamic dataflow (each communication path is independent and can be explicitly created and destroyed at any time) | NO | NO | YES
Determinism | Dynamic memory pinning; extra implicit synchronization; implicit buffering | Implicit round trip transfers; extra implicit synchronization | Ideal for determinism
Explicit fault tolerance | NO | Partial | YES
One way, zero copy, two-sided transfers | NO | Partial | YES
Latency | Not ideal due to double copies and round trips | Better but still may need round trips | Best due to essentially no extra overhead
Function count | MPI 1.3 plus partial MPI 2.x: 120+ (other implementations support 200+) | 50+ | 5
User's guide page count | 800+ | 150+ | 20+
Supported interconnects | RDMA, sockets, mmap, KNEM, GPU Direct/IPC | RDMA, sockets, mmap, KNEM, P2P PCIe, memcpy | RDMA TCP/UDP, sockets TCP/UDP, mmap, KNEM, GPU Direct/IPC, P2P PCIe, memcpy, ADC/DAC
Unreliable datagrams | NO | NO | YES
Unreliable multicast | NO | NO | YES
IO device/FPGA integration | NO | NO | YES
Reliable messages | YES | YES | YES
Inter-processor transfers | YES | YES | YES
Inter-process transfers | YES | YES | YES
Inter-thread transfers | NO | YES | YES
Multiple independent paths between the same two endpoints | NO | YES | YES
Polling (good for latency) | YES | NO | YES
Event driven (good for SWaP) | YES | YES | YES
Can mix event driven and polling | NO | NO | YES
Collective functions (barrier, scatter, gather, all2all, reduce, etc.) | YES | YES | YES (provided as open source wrappers and can be modified as needed)
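
To make the polling and event driven rows concrete, here is a rough sketch using the POSIX `poll()` call on an ordinary file descriptor. It is not taken from any AXIS library; the function names are hypothetical, and a real low-latency implementation would typically poll memory directly rather than make a system call on each check.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Polling style: check once and return immediately (timeout of 0 ms).
 * Lowest latency when called in a tight loop, but keeps a core busy. */
bool data_ready_polling(int fd) {
  struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
  return poll(&p, 1, 0) == 1 && (p.revents & POLLIN);
}

/* Event driven style: sleep in the kernel until data arrives or timeout_ms
 * elapses (a negative timeout waits indefinitely). Frees the core for other
 * work, which helps power consumption. */
bool data_ready_event_driven(int fd, int timeout_ms) {
  struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
  return poll(&p, 1, timeout_ms) == 1 && (p.revents & POLLIN);
}

/* Mixed style: spin a bounded number of times for low latency, then fall
 * back to sleeping if nothing has arrived. */
bool data_ready_mixed(int fd, int spin_count, int timeout_ms) {
  for (int i = 0; i < spin_count; i++) {
    if (data_ready_polling(fd)) return true;
  }
  return data_ready_event_driven(fd, timeout_ms);
}
```

The mixed form shows why being able to combine both modes is useful: short waits cost little latency, and long waits do not burn a core.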

AXIS MPI, Flow, and Takyon code examples

The following examples show how to send a text message and then a sync message that allows the cycle to start again.


AXIS MPI

// --------------- Sender (assumed to be rank 0) ---------------
char message[100];
// The count includes the terminating NUL so the receiver can print the string directly
int num_chars = 1 + snprintf(message, sizeof(message), "%s", "Hello World!");
int dest_rank = 1;
int tag = 999;
MPI_Send(message, num_chars, MPI_CHAR, dest_rank, tag, MPI_COMM_WORLD);

// --------------- Receiver (rank 1) ---------------
char message[100];
int max_chars = 100;
int src_rank = 0;  // Must be the sender's rank, not the receiver's own rank
int tag = 999;
MPI_Recv(message, max_chars, MPI_CHAR, src_rank, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Message: %s\n", message);


AXIS Flow

// --------------- Sender ---------------
int channel = 1;
int buffer = 0;
char *data_addr;
// Get the address of the send buffer, waiting until it is available
rmp_buffer_grab(channel, buffer, &data_addr, RMP_WAIT_FOREVER, NULL);
// The byte count includes the terminating NUL
int num_bytes = 1 + sprintf(data_addr, "%s", "Hello World!");
int dest_task = 1;
rmp_buffer_send(channel, buffer, NULL, dest_task, num_bytes, RMP_WAIT_FOREVER, NULL);

// --------------- Receiver ---------------
int channel = 1;
int buffer = 0;
int src_task;
int nbytes;
char *data_addr;
// Wait for the message; the source task, size, and data address are returned
rmp_buffer_recv(channel, buffer, &src_task, &nbytes, &data_addr, RMP_WAIT_FOREVER, NULL);
printf("Message: %s\n", data_addr);
// Release the buffer so the cycle can start again
rmp_buffer_release(channel, buffer, src_task);

AXIS Takyon

// --------------- Sender ---------------
int buffer = 0;
char *data_addr = (char *)path->attrs.sender_addr_list[buffer];
// The byte count includes the terminating NUL
int num_bytes = 1 + sprintf(data_addr, "%s", "Hello World!");
takyonSend(path, buffer, num_bytes, 0, 0, NULL);  // Send the text message
takyonRecv(path, buffer, NULL, NULL, NULL);       // Wait for the sync message before reusing the buffer

// --------------- Receiver ---------------
int buffer = 0;
takyonRecv(path, buffer, NULL, NULL, NULL);       // Wait for the text message
char *data_addr = (char *)path->attrs.recver_addr_list[buffer];
printf("Message: %s\n", data_addr);
takyonSend(path, buffer, 0, 0, 0, NULL);          // Zero-byte sync message: the buffer is free again
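
The same message-then-sync cycle can be tried without any AXIS library. The sketch below mimics it with a POSIX datagram socket pair between a parent (sender) and a forked child (receiver). All function names and the cycle count are assumptions made for illustration, and unlike the zero copy transfers described above, the sockets here copy the data.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { NUM_CYCLES = 3, MAX_BYTES = 100 };

/* Receiver: wait for data, consume it, then send a zero-byte sync so the
 * sender knows it may start the next cycle. Returns 0 on success. */
static int run_receiver(int fd) {
  char buffer[MAX_BYTES];
  for (int i = 0; i < NUM_CYCLES; i++) {
    if (recv(fd, buffer, sizeof buffer, 0) <= 0) return 1;
    if (send(fd, "", 0, 0) != 0) return 1; /* zero-byte sync message */
  }
  return 0;
}

/* Sender: fill the buffer, send it, then block until the sync arrives. */
static int run_sender(int fd) {
  char buffer[MAX_BYTES];
  for (int i = 0; i < NUM_CYCLES; i++) {
    int num_bytes = 1 + snprintf(buffer, sizeof buffer, "Hello World! (%d)", i);
    if (send(fd, buffer, (size_t)num_bytes, 0) != num_bytes) return 1;
    if (recv(fd, buffer, sizeof buffer, 0) != 0) return 1; /* expect the empty sync */
  }
  return 0;
}

/* Runs the sender in the parent and the receiver in a child process.
 * Returns 0 if every cycle completed on both sides. */
int run_handshake_demo(void) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0) return 1;
  pid_t pid = fork();
  if (pid < 0) return 1;
  if (pid == 0) {
    close(fds[0]);
    _exit(run_receiver(fds[1]));
  }
  close(fds[1]);
  int sender_status = run_sender(fds[0]);
  close(fds[0]);
  int child_status = 0;
  waitpid(pid, &child_status, 0);
  int receiver_ok = WIFEXITED(child_status) && WEXITSTATUS(child_status) == 0;
  return (sender_status == 0 && receiver_ok) ? 0 : 1;
}
```

Because the sender waits for the sync before refilling its buffer, at most one message is in flight per cycle, which is what makes the buffer safe to reuse.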

Visually design thread/process level dataflow

Via AXISView's ApplicationView.
For AXIS Flow, and coming soon for AXIS Takyon.

Steps to design, build and run your application with distributed dataflow:

  1. Define the threads and their attributes in the distributed application
  2. Define the collective communication groups and their attributes
  3. Define any global resources used by the application
  4. Generate the complete framework source code and Makefiles to create all the processes, threads, and communication paths
  5. For each thread, fill in your custom source code and the appropriate calls to send/recv
  6. Build and run
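
Steps 1 and 2 amount to describing threads and the groups that connect them. As a loose sketch of what such a description could look like in C, the example below defines hypothetical `ThreadDef` and `GroupDef` records and a check that every group refers only to defined threads. These types and the `processor` attribute are assumptions for illustration, not the format that ApplicationView generates.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUP_MEMBERS 8

typedef struct {
  const char *name; /* thread name */
  int processor;    /* hypothetical attribute: processor the thread runs on */
} ThreadDef;

typedef struct {
  const char *name;
  int members[MAX_GROUP_MEMBERS]; /* indices into the thread table */
  int num_members;
} GroupDef;

/* Returns true if every thread is named and every group contains at least
 * two distinct, defined threads. */
bool design_is_valid(const ThreadDef *threads, int num_threads,
                     const GroupDef *groups, int num_groups) {
  if (num_threads <= 0) return false;
  for (int t = 0; t < num_threads; t++) {
    if (threads[t].name == NULL) return false;
  }
  for (int g = 0; g < num_groups; g++) {
    const GroupDef *group = &groups[g];
    if (group->num_members < 2 || group->num_members > MAX_GROUP_MEMBERS) return false;
    for (int i = 0; i < group->num_members; i++) {
      int member = group->members[i];
      if (member < 0 || member >= num_threads) return false;
      for (int j = 0; j < i; j++) {
        if (group->members[j] == member) return false; /* duplicate member */
      }
    }
  }
  return true;
}
```

A check like this is the kind of validation a design tool can run before generating framework code, so that mistakes surface at design time rather than at run time.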

Visualize live thread/process dataflow in real-time

Via AXISView's RuntimeView.
For AXIS Flow and AXIS MPI, and coming soon for AXIS Takyon.

While the app is running, visually see the results in real-time:

  • How the threads map to the hardware (boards & processors)
  • How the communication paths are mapped across the system
  • Processor, core, & cache usage
  • Communication utilization, per path, and per hardware component (e.g. PCIe switch)

Quickly find any bottlenecks or problem areas so the application can be redistributed with a more balanced mapping.
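
For readers who want a feel for the utilization figures, the arithmetic can be sketched as bytes moved divided by what the link could have moved in the same interval. The function names and the clamping to 1.0 below are assumptions for illustration and do not describe how RuntimeView computes its values.

```c
#include <stdint.h>

/* Fraction (0.0 to 1.0) of a link's capacity used during one sampling interval. */
double path_utilization(uint64_t bytes_transferred, double interval_seconds,
                        double link_bytes_per_second) {
  if (interval_seconds <= 0.0 || link_bytes_per_second <= 0.0) return 0.0;
  double used = (double)bytes_transferred / (interval_seconds * link_bytes_per_second);
  return used > 1.0 ? 1.0 : used;
}

/* A shared hardware component (for example a PCIe switch) carries the sum of
 * the traffic of every path routed through it. */
double component_utilization(const uint64_t *path_bytes, int num_paths,
                             double interval_seconds,
                             double component_bytes_per_second) {
  uint64_t total = 0;
  for (int i = 0; i < num_paths; i++) {
    total += path_bytes[i];
  }
  return path_utilization(total, interval_seconds, component_bytes_per_second);
}
```

Two paths that each look lightly loaded can still saturate a component they share, which is why per-component figures matter when rebalancing a mapping.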

Function call event recording

Via AXIS EventView.
For AXIS Flow and AXIS MPI, and coming soon for AXIS Takyon.

Run the application for a period of time, then flush the events to a file to get some amazing statistics:

  • Nanosecond precision (based on precision of the clock used to get the current wall clock time)
  • Determine if the transfers are getting expected latencies and throughputs
  • Validate determinism across millions of events
  • Identify exact location in source code for problem areas

Debugging performance, determinism, and causality bugs has never been so easy.
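
The general technique behind this kind of event recording can be sketched in a few lines of C: timestamp each call, keep the events in memory, and write them out afterwards. The types, the `RECORD_CALL` macro, and the jitter metric below are hypothetical and are not the EventView implementation. The clock is read with the C11 `timespec_get` function, so the real precision depends on the platform clock.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define MAX_EVENTS 1024

typedef struct {
  const char *file;  /* source location of the recorded call */
  int line;
  uint64_t start_ns; /* wall clock time before the call */
  uint64_t end_ns;   /* wall clock time after the call */
} Event;

typedef struct {
  Event events[MAX_EVENTS];
  int count;
} EventLog;

/* Current wall clock time in nanoseconds. */
uint64_t now_ns(void) {
  struct timespec ts;
  timespec_get(&ts, TIME_UTC);
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Stores one event in memory; events beyond MAX_EVENTS are dropped. */
void record_event(EventLog *log, const char *file, int line,
                  uint64_t start_ns, uint64_t end_ns) {
  if (log->count < MAX_EVENTS) {
    log->events[log->count++] = (Event){ file, line, start_ns, end_ns };
  }
}

/* Times `call` and records it with the caller's source location. */
#define RECORD_CALL(log, call) \
  do { \
    uint64_t start_ns_ = now_ns(); \
    call; \
    record_event((log), __FILE__, __LINE__, start_ns_, now_ns()); \
  } while (0)

/* Longest minus shortest duration: a simple indicator of determinism. */
uint64_t duration_jitter_ns(const EventLog *log) {
  if (log->count == 0) return 0;
  uint64_t min = UINT64_MAX, max = 0;
  for (int i = 0; i < log->count; i++) {
    const Event *e = &log->events[i];
    uint64_t d = e->end_ns >= e->start_ns ? e->end_ns - e->start_ns : 0;
    if (d < min) min = d;
    if (d > max) max = d;
  }
  return max - min;
}

/* Writes one line per event; returns the number of events written. */
int flush_events(const EventLog *log, FILE *out) {
  int written = 0;
  for (int i = 0; i < log->count; i++) {
    const Event *e = &log->events[i];
    if (fprintf(out, "%s:%d %" PRIu64 " %" PRIu64 "\n",
                e->file, e->line, e->start_ns, e->end_ns) > 0) {
      written++;
    }
  }
  return written;
}
```

Recording into memory and flushing later keeps file I/O out of the timed path, so the measurement disturbs the application as little as possible.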
