Blob Blame History Raw
#
## Copyright (C) Mellanox Technologies Ltd. 2001-2020.  ALL RIGHTS RESERVED.
## Copyright (C) UT-Battelle, LLC. 2014-2019. ALL RIGHTS RESERVED.
## Copyright (C) ARM Ltd. 2017-2020.  ALL RIGHTS RESERVED.
##
## See file LICENSE for terms.
##
#

## 1.8.0 (April 3, 2020)
### Features:
#### UCX Core
- Improved detection for DEVX support
- Improved TCP scalability 
- Added support for ROCM to perftest
- Added support for different source and target memory types to perftest
- Added optimized memcpy for ROCM devices
- Added hardware tag-matching for CUDA buffers
- Added support for CUDA and ROCM managed memories
- Added support for client/server disconnect protocol over rdma connection manager
- Added support for striding receive queue for hardware tag-matching
- Added XPMEM-based rendezvous protocol for shared memory
- Added support shared memory communication between containers on same machine
- Added support for multi-threaded RDMA memory registration for large regions
- Added new test cases to Azure CI

#### UCX Java (API Preview)
- Added APIs for stream send/recv, tag probe, and connect request handle
- Added Java package (automatically published) to Maven central

### Bugfixes:
- Multiple fixes in JUCX
- Fixes in UCP thread safety
- Fixes for most recent versions GCC, PGI, and ICC
- Fixes for CPU affinity on Azure instances
- Fixes in XPMEM support on PPC64
- Performance fixes in CUDA IPC
- Fixes in RDMA CM flows 
- Multiple fixes in TCP transport
- Multiple fixes in documentation
- Fixes in transport lane selection logic
- Fixes in Java jar build
- Fixes in socket connection manager for Nvidia DGX-2 platform

## 1.7.0 (January 19, 2020)
### Features:
- Added support for multiple listening transports
- Added UCT socket-based connection manager transport
- Updated API for UCT component management
- Added API to retrieve the listening port
- Added UCP active message API
- Removed deprecated API for querying UCT memory domains
- Refactored server/client examples
- Added support for dlopen interception in UCM
- Added support for PCIe atomics
- Updated Java API: added support for most of UCP layer operations
- Updated support for Mellanox DevX API
- Added multiple UCT/TCP transport performance optimizations
- Optimized memcpy() for Intel platforms
- Added protection from non-UCX socket based app connections
- Improved search time for PKEY object
- Enable gtest over IPv6 interfaces
- Updated Mellanox and Bull device IDs
- Added support for CUDA_VISIBLE_DEVICES
- Increased limits for CUDA IPC registration

### Bugfixes:
- Multiple fixes in UCP, UCT, UCM libraries
- Multiple fixes for BSD and Mac OS systems
- Fixes for Clang compiler
- Fixes for CUDA IPC
- Fix CPU optimization configuration options
- Fix JUCX build on GPU nodes
- Fix in Azure release pipeline flow
- Fix in CUDA memory hooks management
- Fix in GPU memory peer direct gtest
- Fix in TCP connection establishment flow
- Fix in GPU IPC check
- Fix in CUDA Jenkins test flow
- Multiple fixes in CUDA IPC flow
- Fix adding missing header files
- Fix to prevent failures in presence of VPN enabled Ethernet interfaces

## 1.6.1 (September 23, 2019)
### Features:
- Added Bull Atos HCA device IDs
- Added Azure Pipelines testing

### Bugfixes:
- Multiple static checker fixes
- Remove pkg.m4 dependency
- Multiple clang static checker fixes
- Fix mem type support with generic datatype

## 1.6.0 (July 17, 2019)
### Features:
- Modular architecture for UCT transports
- ROCm transport re-design: support for managed memory, direct copy, ROCm GDR
- Random scheduling policy for DC transport
- Optimized out-of-box settings for multi-rail
- Added support for OmniPath (using Verbs)
- Support for PCI atomics with IB transports
- Reduced UCP address size for homogeneous environments

### Bugfixes:
- Multiple stability and performance improvements in TCP transport
- Multiple stability fixes in Verbs and MLX5 transports
- Multiple stability fixes in UCM memory hooks
- Multiple stability fixes in UGNI transport
- RPM Spec file cleanup
- Fixing compilation issues with most recent clang and gcc compilers
- Fixing the wrong name of aliases
- Fix data race in UCP wireup
- Fix segfault when libuct.so is reloaded - issue #3558
- Include Java sources in distribution
- Handle EADDRNOTAVAIL in rdma_cm connection manager
- Disable ibcm on RHEL7+ by default
- Fix data race in UCP proxy endpoint
- Static checker fixes
- Fallback to ibv_create_cq() if ibv_create_cq_ex() returns ENOSYS
- Fix malloc hooks test
- Fix checking return status in ucp_client_server example
- Fix gdrcopy libdir config value
- Fix printing atomic capabilities in ucx_info
- Fix perftest warmup iterations to be non-zero
- Fixing default values for configure logic
- Fix race condition updating fired_events from multiple threads
- Fix madvise() hook

### Tested configurations:
- RDMA: MLNX_OFED 4.5, distribution inbox drivers, rdma-core 22.1
- CUDA: gdrcopy 1.3.2, cuda 9.2, ROCm 2.2
- XPMEM: 2.6.2
- KNEM: 1.1.3

## 1.5.1 (April 1, 2019)
### Bugfixes:
- Fix dc_mlx5 transport support check for inbox libmlx5 drivers - issue #3301
- Fix compilation warnings with gcc9 and clang
- ROCm - reduce log level of device-not-found message

## 1.5.0 (February 14, 2019)
### Features:
- New emulation mode enabling full UCX functionality (Atomic, Put, Get)
  over TCP and RDMA-CORE interconnects that don't implement full RDMA semantics
- Non-blocking API for all one-sided operations. All blocking communication APIs marked
  as deprecated
- New client/server connection establishment API, which allows connected handover between workers
- Support for rdma-core direct-verbs (DEVX) and DC with mlx5 transports
- GPU - Support for stream API and receive side pipelining
- Malloc hooks using binary instrumentation instead of symbol override
- Statistics for UCT tag API
- GPU-to-Infiniband HCA affinity support based on locality/distance (PCIe)

### Bugfixes:
- Fix overflow in RC/DC flush operations
- Update description in SPEC file and README
- Fix RoCE source port for dc_mlx5 flow control
- Improve ucx_info help message
- Fix segfault in UCP, due to int truncation in count_one_bits()
- Multiple other bugfixes (full list on github)

### Tested configurations:
- InfiniBand: MLNX_OFED 4.4-4.5, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2

## 1.4.0-rc2 (October 23, 2018)
### Features:
- Improved support for installation with latest ROCm
- Improved support for latest rdma-core
- Added support for CUDA IPC for intra-node GPU
- Added support for CUDA memory allocation cache for mem-type detection
- Added support for latest Mellanox devices
- Added support for Nvidia GPU managed memory
- Added support for multiple connections between the same pair of workers
- Added support large worker address for client/server connection establishment
  and INADDR_ANY
- Added support for bitwise atomics operations

### Bugfixes:
- Performance fixes for rendezvous protocol
- Memory hook fixes
- Clang support fixes
- Self tl multi-rail fix
- Thread safety fixes in IB/RDMA transport
- Compilation fixes with upstream rdma-core
- Multiple minor bugfixes (full list on github)
- Segfault fix for a code generated by armclang compiler
- UCP memory-domain index fix for zero-copy active messages

### Tested configurations:
- InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2
- Multiple bugfixes (full list on github)

### Known issues:
#2919 - Segfault in CUDA support when KNEM not present and CMA is active
intra-node RMA transport. As a workaround user can disable CMA support at
compile time: --disable-cma. Alternatively user can remove CMA from UCX_TLS
list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.

## 1.3.1 (August 20, 2018)
### Bugfixes:
- Prevent potential out-of-order sending in shared memory active messages
- CUDA: Include cudamem.h in source tarball, pass cudaFree memory size
- Registration cache: fix large range lookup, handle shmat(REMAP)/mmap(FIXED)
- Limit IB CQE size for specific ARM boards
- RPM: explicitly set gcc-c++ as requirement
- Multiple bugfixes (full list on github)

### Tested configurations:
- InfiniBand: MLNX_OFED 4.2, inbox OFED drivers.
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2

## 1.3.0 (February 15, 2018)
### Features:
- Added stream-based communication API to UCP
- Added support for GPU platforms: Nvidia CUDA and AMD ROCm software stacks
- Added API for client/server based connection establishment
- Added support for TCP transport
- Support for InfiniBand tag-matching offload for DC and accelerated transports
- Multi-rail support for eager and rendezvous protocols
- Added support for tag-matching communications with CUDA buffers
- Added ucp_rkey_ptr() to obtain pointer for shared memory region
- Avoid progress overhead on unused transports
- Improved scalability of software tag-matching by using a hash table
- Added transparent huge-pages allocator
- Added non-blocking flush and disconnect for UCP
- Support fixed-address memory allocation via ucp_mem_map()
- Added ucp_tag_send_nbr() API to avoid send request allocation
- Support global addressing in all IB transports
- Add support for external epoll fd and edge-triggered events
- Added registration cache for knem
- Initial support for Java bindings

### Bugfixes:
- Multiple bugfixes (full list on github)

### Tested configurations:
- InfiniBand: MLNX_OFED 4.2, inbox OFED drivers.
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2

### Known issues:
#2047 - UCP: ucp_do_am_bcopy_multi drops data on UCS_ERROR_NO_RESOURCE
#2047 - failure in ud/uct_flush_test.am_zcopy_flush_ep_nb/1
#1977 - failure in shm/test_ucp_rma.blocking_small/0
#1926 - Timeout in mpi_test_suite with HW TM
#1920 - transport retry count exceeded in many-to-one tests
#1689 - Segmentation fault on memory hooks test in jenkins

## 1.2.2 (January 4, 2018)
### Main:
- Support including UCX API headers from C++ code
- UD transport to handle unicast flood on RoCE fabric
- Compilation fixes for gcc 7.1.1, clang 3.6, clang 5

### Details:
- When UD transport is used with RoCE, packets intended for other peers may
  arrive on different adapters (as a result of unicast flooding).
- This change adds packet filtering based on destination GIDs. Now the packet
  is silently dropped, if its destination GID does not match the local GID.
- Added a new device ID for InfiniBand HCA
- [packaging] Move `examples/` and `perftest/` into doc
- [packaging] Update spec to work on old distros while complaint with Fedora
  guidelines
- [cleanup] Removed unused ptmalloc version (2.83)
- [cleanup] Fixup license headers

## 1.2.1 (August 28, 2017)
### Bugfixes:
- Compilation fixes for gcc 7.1
- Spec file cleanups
- Versioning cleanups

## 1.2.0 (June 15, 2017)
### Supported platforms
- Shared memory: KNEM, CMA, XPMEM, SYSV, Posix
- VERBs over InfiniBand and RoCE.
  VERBS over other RDMA interconnects (iWarp, OmniPath, etc.) is available
  for community evaluation and has not been tested in context of this release
- Cray Gemini and Aries
- Architectures: x86_64, ARMv8 (64bit), Power64

### Features:
- Added support for InfiniBand DC and UD transports, including accelerated verbs for Mellanox devices
- Full support for PGAS/SHMEM interfaces, blocking and non-blocking APIs
- Support for MPI tag matching, both in software and offload mode
- Zero copy protocols and rendezvous, registration cache
- Handling transport errors
- Flow control for DC/RC
- Dataypes support: contiguous, IOV, generic
- Multi-threading support
- Support for ARMv8 64bit architecture
- A new API for efficient memory polling
- Support for malloc-hooks and memory registration caching

### Bugfixes:
 - Multiple bugfixes improving overall stability of the library

### Known issues:
#1604 - Failure in ud/test_ud_slow_timer.retransmit1/1 with valgrind bug
#1588 - Fix reading cpuinfo timebase for ppc bug portability training
#1579 - Ud/test_ud.ca_md test takes too long too complete bug
#1576 - Failure in ud/test_ud_slow_timer.retransmit1/0 with valgrind bug
#1569 - Send completion with error with dc_verbs bug
#1566 - Segfault in malloc_hook.fork on arm bug
#1565 - Hang in udrc/test_ucp_rma.nonblocking_stream_get_nbi_flush_worker bug
#1534 - Wireup.c:473 Fatal: endpoint reconfiguration not supported yet bug
#1533 - Stack overflow under Valgrind 'rc_mlx5/uct_p2p_err_test.local_access_error/0' bug
#1513 - Hang in MPI_Finalize with UCX_TLS=rc[_x],sm on the bsend2 test bug
#1504 - Failure in cm/uct_p2p_am_test.am_bcopy/1 bug
#1492 - Hang when using polling fd bug
#1489 - Hang on the osu_fop_latency test with RoCE bug
#1005 - ROcE problem with OMPI direct modex - UD assertion

## 1.1.0 (September 1, 2015)
### Workarounds:
### Features:
- Added support for AM based on FIFO in `mm` shared memory transport
- Added support for UCT `knem` shared memory transport (http://knem.gforge.inria.fr)
- Added support for UCT `mm/xpmem` shared memory transport (https://github.com/hjelmn/xpmem)

## 1.0.0 (July 22, 2015)
### Features:
- Added support for UCT `cma` shared memory transport (Cross-Memory Attatch)
- Added support for UCT `mm` shared memory transport with mmap/sysv APIs
- Added support for UCT `rc` transport based on Infiniband/RC with verbs
- Added support for UCT `mlx5_rc` transport based on Infiniband/RC with accelerated verbs
- Added support for UCT `cm` transport based on Infiniband/SIDR (Service ID Resolution)
- Added support for UCT `ugni` transport based on Cray/UGNI
- Added support for Doxygen based documentation generation
- Added support for UCP basic protocol layer to fit PGAS paradigm (RMA, AMO)
- Added ucx_perftest utility to exercise major UCX flows and provide performance metrics
- Added test script for jenkins (contrib/test_jenkins.sh)
- Added packaging for RPM/DEB based linux distributions (see contrib/buildrpm.sh)
- Added Unit-tests infractucture for UCX functionality based on Google Test framework (see test/gtest/)
- Added initial integration for OpenMPI with UCX for PGAS/SHMEM API
  (see: https://github.com/openucx/ompi-mirror/pull/1)
- Added end-to-end testing infrastructure based on MTT (see contrib/mtt/README_MTT)