GASNet ChangeLog 
----------------
$Revision: 1.85.2.11 $

----------------------------------------------------------------------
10-21-2013 : Release 1.22.0

* Cray XE, XK and XC series (gemini- and aries-conduits)
  - With this release support for the Aries interconnect of the Cray
    XC-30 system has graduated out of BETA status.
  - This release includes a new implementation of Active Messages
    + Memory use scales better (18% less per peer by default)
    + Larger default MaxMedium yields higher peak AM Medium bandwidth
    + MaxMedium may now be changed at configure time
  - This release features a new default barrier: GNIDISSEM
  - Contention among pthreads in a PAR build has been greatly reduced.
  - Optional (experimental) "multi-domain" support to almost entirely
    eliminate contention among pthreads in PAR builds (see the conduit
    README for details and instructions to enable and use this feature).
  - Fix bug 3078, in which using an address in a PSHM-imported GASNet
    segment as the local address in an Extended API call would crash.

* IBM BG/Q (pami-conduit)
  - Support for intra-node shared memory communication (PSHM) is now
    available on BlueGene/Q.  Please see docs/pshm-design.txt for
    details of the configuration and environment setup required.
  - Updates to support changes made in the V1R2M1 driver.

* InfiniBand (ibv-conduit for OpenFabrics Verbs)
  - Now supports direct PMI-based launch (e.g. srun or hydra)
  - Significantly simpler code in the critical paths for Puts and Gets
  - This release features a new default barrier: IBDISSEM
  - Properly support systems with pagesize larger than 4KB.

* Mellanox ConnectX series HCAs (mxm-conduit)
  - Now supports direct PMI-based launch (e.g. srun or hydra)
  - This release adds support for v2.x of the MXM API.

* Mellanox Fabric Collective Accelerator (FCA)
  - While FCA acceleration of collectives is still only available in
    a SEQ (non-pthreaded) build of GASNet, it can now be enabled at
    configure time without disabling compilation of PAR and PARSYNC
    libraries.

* Portals 4.x API (portals4-conduit)
  - The conduit now includes a native implementation of the
    Extended API (Put and Get) in terms of the Portals4 API.

* Portable UDP support (udp-conduit)
  - Added support for MASTERIP and WORKERIP settings to handle a much
    wider range of network configurations.  See the conduit README.

* Platform support/portability
  - Initial support for building GASNet with NVIDIA's nvcc compiler
  - Initial support for running GASNet on Intel MIC (a.k.a. Xeon Phi);
    This release only supports running GASNet on MIC in native mode.
  - Fix bug 3167: incorrect memory fences with the Cray C compiler.

* PSHM (intra-node shared memory)
  - Fix bug 3181 in which unequal segment requests could lead to either
    failure at attach time, or incomplete sharing of memory.

* General
  - New GASNET_NO_CATCH_SIGNAL environment variable to suppress the
    default signal handling behavior when debugging.
  - Initial prototype implementations of several features slated for
    standardization in a future GASNet specification:
    + Variable argument AM Request and Reply functions
        gasnet_AMRequest{Short,Medium,Long,LongAsync}()
        gasnet_AMReply{Short,Medium,Long}()
      Work just like the fixed-argument versions, but take a "numargs"
      integer argument before the first handler argument (or as the last
      argument when numargs == 0).
    + Unnamed split-phase barrier which can leverage hardware support
      not possible when implementing full (UPC-centric) name matching.
        GASNET_BARRIERFLAG_UNNAMED as flags argument
    + Single-phase barrier (both with and w/o name matching) also able
      to leverage additional hardware support.
        gasnet_barrier(id, flags)
    + Barrier matching results query (for building hierarchical
      implementations which require UPC-style name matching)
        int is_anonymous = gasnet_barrier_result(&value_if_not_anon);
    + The number of implicit-handle (nbi-suffixed) operations outstanding
      is now unbounded, including when using an nbi access region.
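The argument-order convention of the variable-argument AM functions can be illustrated with standard C <stdarg.h>. The sketch below is hypothetical: demo_AMRequestShort(), demo_request and DEMO_MAX_ARGS are invented for illustration, use plain int in place of GASNet's node, handler and argument types, and do not reproduce the real GASNet prototypes. It only shows where "numargs" sits relative to the handler arguments.

```c
#include <stdarg.h>

#define DEMO_MAX_ARGS 16 /* hypothetical cap on handler arguments */

/* Hypothetical record of the most recent request, standing in for a send. */
typedef struct {
    int dest;
    int handler;
    int numargs;
    int args[DEMO_MAX_ARGS];
} demo_request;

static demo_request last_req;

/* Hypothetical analogue of a variable-argument AM Request: "numargs"
 * precedes the first handler argument, and is the final parameter when
 * numargs == 0.  Returns 0 on success, -1 if numargs is out of range. */
static int demo_AMRequestShort(int dest, int handler, int numargs, ...)
{
    va_list ap;
    if (numargs < 0 || numargs > DEMO_MAX_ARGS)
        return -1;
    last_req.dest = dest;
    last_req.handler = handler;
    last_req.numargs = numargs;
    va_start(ap, numargs);
    for (int i = 0; i < numargs; i++)
        last_req.args[i] = va_arg(ap, int); /* each handler arg read as int */
    va_end(ap);
    return 0;
}
```

For example, demo_AMRequestShort(1, 200, 3, 10, 20, 30) passes three handler arguments, while demo_AMRequestShort(1, 201, 0) ends with numargs itself.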
	
* Removal of unmaintained network conduits
  - This release no longer supports the following networks:
    + elan (Quadrics elan3/elan4)
    + gm (Myrinet GM)
    + vapi (legacy Mellanox-specific InfiniBand)
    + lapi (IBM LAPI)
    + portals (Cray Portals for XT series)
    + sci (Dolphin SCI)
    If you require GASNet on one of these networks, and can provide
    access to resources for maintenance of the code, please contact us.
	
----------------------------------------------------------------------
04-30-2013 : Release 1.20.2

* Cray XC30 (aries-conduit)
  - This release includes Beta support for the Aries interconnect of
    the Cray XC30 (aka Cascade) system.  This initial implementation
    is believed to be fully correct, but has yet to be fully tuned.

* Cray XE & XK series (gemini-conduit)
  - With this release support for the Gemini interconnect of the Cray
    XE and XK series systems has graduated out of BETA status.
  - Significantly improved performance via uGNI's "RELAXED_PI_ORDERING",
    increased overlap in non-blocking operations, and a lower latency
    mechanism for small Puts.

* IBM PAMI (pami-conduit)
  - Updated to support BlueGene/Q driver V1R2M0
  - BlueGene/Q users are advised to use V1R2M0 + efix 23 (or newer)
    if using GASNET_PAR mode.  Prior to efix 23, memset() in the BG/Q
    C runtime was not thread safe, and this was responsible for the
    "unexplained failures" noted in GASNet's 1.18.2 release notes.

* NEW Beta support for Portals 4.x API (portals4-conduit)
  - This release includes BETA support for the Portals 4.x API as
    implemented at https://code.google.com/p/portals4/
  - This initial implementation does not yet include "native" Put
    or Get support, using the AM-based reference implementation instead.

* InfiniBand (ibv- and vapi-conduits)
  - Can control the IB MTU size via environment variable GASNET_MAX_MTU.
  - Automatically ignore unsupported iWARP adapters.

* PSHM (intra-node shared memory)
  - Add the GASNET_SUPERNODE_MAXSIZE environment variable to control the
    grouping of cores on a compute node into multiple GASNet "supernodes".

* Platform support/portability
  - Added support for THUMB2 mode of recent ARM processors
  - Work around a bug in Clang++ building udp-conduit (GASNet bug 3129)

----------------------------------------------------------------------
10-30-2012 : Release 1.20.0

* IBM PAMI (pami-conduit)
  - With this release PAMI conduit has graduated out of BETA status.
  - This release now implements GASNet's collectives via the PAMI
    collectives, yielding much-improved performance in many cases.
  - A faster default barrier implementation (PAMIDISSEM).

* Cray XE & XK series (gemini-conduit)
  - Improved performance for 129 to 4096 byte transfers.
  - This release includes *experimental* support (OFF by default) for
    improved performance via uGNI's "RELAXED_PI_ORDERING", which can be
    enabled using an environment variable.
    See the conduit README for more information.

* Mellanox ConnectX series HCAs (mxm-conduit)
  - This is the first official release of GASNet support for the "MXM" API
    for Mellanox's recent InfiniBand HCAs.  It is based on the code which
    Mellanox has been distributing for about one year.

* Mellanox Fabric Collective Accelerator (FCA)
  - Optional collectives acceleration using Mellanox's FCA which works with
    both ibv-conduit and mxm-conduit on recent Mellanox HCAs.
  - See other/fca/README-fca.txt for details

* PSHM (intra-node shared memory)
  - Active Messages over shared-memory have been reimplemented using
    Nemesis lock-free queues, yielding improved performance and lowering
    the memory required from quadratic to linear in cores-per-node.
  - Intra-node barrier has been reimplemented for higher performance.
  - Support for up to about 45K processes/node (vs. default of 255) is
    now available as a configure option: --enable-large-pshm.  This has
    been tested to 4096 proc/node.  However, memory and other resources,
    such as file descriptors, will typically impose a much lower limit.

* General
  - A new barrier implementation (an RDMA-based dissemination algorithm)
    replaces the previous default (AM-based dissemination) for most
    network conduits.
  - Barrier matching rules in "corner-cases" have been revised to match
    the semantics expected to appear in the UPC 1.3 specification.  The
    changes legalize some calling cases which were previously erroneous.
    No cases which were legal before have become illegal.
  - SLURM integration in ssh-spawner.
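The dissemination pattern behind the new default barriers can be sketched generically. The C below models only the textbook peer schedule, assuming N participants where, in round k, participant i signals (i + 2^k) mod N and waits for (i - 2^k) mod N; it says nothing about how GASNet's RDMA-based versions actually deliver the signals. All function names are hypothetical, and dissem_covers_all() simulates the rounds to confirm that ceil(log2 N) rounds let every participant learn of every other participant's arrival.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of rounds: the smallest r with 2^r >= n (0 when n == 1). */
static int dissem_rounds(int n)
{
    int r = 0;
    while ((1 << r) < n)
        r++;
    return r;
}

/* Peer that participant "me" signals in the given round. */
static int dissem_send_peer(int me, int round, int n)
{
    return (me + (1 << round)) % n;
}

/* Peer whose signal "me" waits for in the given round. */
static int dissem_recv_peer(int me, int round, int n)
{
    return ((me - (1 << round)) % n + n) % n; /* keep result non-negative */
}

/* Simulate the schedule for 1 <= n <= 64: each participant tracks, as a
 * bitmask, which arrivals it has (transitively) heard about. */
static bool dissem_covers_all(int n)
{
    uint64_t known[64], next[64];
    if (n < 1 || n > 64)
        return false;
    uint64_t all = (n == 64) ? ~0ULL : ((1ULL << n) - 1);
    for (int i = 0; i < n; i++)
        known[i] = 1ULL << i;
    for (int k = 0; k < dissem_rounds(n); k++) {
        for (int i = 0; i < n; i++)
            next[i] = known[i];
        for (int i = 0; i < n; i++)
            next[dissem_send_peer(i, k, n)] |= known[i];
        for (int i = 0; i < n; i++)
            known[i] = next[i];
    }
    for (int i = 0; i < n; i++)
        if (known[i] != all)
            return false;
    return true;
}
```

Because each round doubles the set of arrivals a participant knows about, the round count grows only logarithmically with the number of participants.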

----------------------------------------------------------------------
05-14-2012 : Release 1.18.2

* IBM PAMI (pami-conduit)
  - This release includes a BETA of native support for the IBM PAMI
    (Parallel Active Message Interface) network API, found on the IBM BG/Q
    and on systems running IBM's Parallel Environment (IBM PE) software
    for Linux (both x86-64 and PowerPC architectures).
  - Testing on an IBM Power 775 system (a.k.a. PERCS or Blue Waters) with
    its HFI network passed all GASNet and Berkeley UPC tests.
  - Testing on an IBM BlueGene/Q shows there are still some unexplained
    failures at the time of this release.
  - This code has not yet been tested on IBM PE clusters running PAMI over
    InfiniBand or Ethernet.  Reports of such testing would be appreciated.
    Access to such systems for testing would be GREATLY appreciated.
  - The performance of pami-conduit has not been fully tuned; however, we
    believe that the implementation is correct (with only minor bugs
    remaining on BlueGene/Q).
  - This is still only a BETA-quality implementation, and performance
    improvements are anticipated in future releases.

* Cray XE & XK series (gemini-conduit)
  - Added Cray XK series as a supported/tested platform.
  - Reduced by more than 60% the memory used for receiving AMs (bug 3067).
  - Made several code cleanups on the road to production-quality.
  - This conduit is still "BETA" due to known room for improvements.

* InfiniBand (ibv- and vapi-conduits)
  - Improved scalability (time and memory) of startup/shutdown code
    + Switch to native InfiniBand communication earlier in startup.
    + Use more scalable communication pattern for exit coordination.
    + Improved the communication performance of the sockets code in the
      ssh-spawner used for startup when spawning via MPI is unavailable
      or has been disabled.
  - Made improvements to dynamic connection support
    + The dynamic connection code has been made more robust in the face of
      lost packets through use of TCP-inspired adaptive timeouts.
    + The dynamic connection code has been made insensitive to the problem
      of inattentive peers (ones not making frequent enough calls to GASNet)
      through the use of an internal thread which wakes up only on arrival
      of network traffic related to dynamic connection setup.
  - Made improvements in the Active Message progress thread
    + The AM progress thread is finally available in ibv-conduit.
    + The maximum wake-up rate of the progress thread can now be limited.
    + See the conduit README file for documentation on GASNET_RCV_THREAD
      and GASNET_RCV_THREAD_RATE environment variables for more info.

* PSHM (intra-node shared memory)
  - Extended PSHM-via-XPMEM support to SGI Altix series, where it is
    available at configure time, but not currently used by default.
  - Support for PSHM over SystemV shared memory no longer requires a
    working implementation of mmap() (bug 3066).
    + This is the first PSHM support for Cygwin.
  - Expanded PSHM docs, including notes for configuration on Cygwin.
    See docs/pshm-design.txt in the GASNet sources.

* Platform support/portability
  - Ported network-independent code to the IBM BlueGene/Q.
    This provides fully-functioning GASNet-Tools and smp- and mpi-conduits.
  - Implemented native 64-bit atomics for ILP32 builds with Apple XCode-4.x,
    resolving bug 3071 which required disabling them in the 1.18.0 release.
  - Force PGI compilers on MacOS to honor the documented ABI (bug 2150).
  - Made several changes to ssh-spawner sockets code for better portability.
  - Disabled ASLR (address-space layout randomization) in MacOS Lion (and
    newer) via the GASNET_LDFLAGS.
  - No longer trust Cygwin's gethostid(), which has a known problem.
  - Added initial support for Clang (bug 3075).
    + Not yet listed in README as officially supported for any target
    + x86 and x86-64 targets have been well tested on Linux and *BSD
    + x86 and x86-64 targets have been lightly tested on MacOS Lion
    + ppc64 target has been lightly tested on BG/Q platforms
  - Made fixes for __attribute__ configure probes
    + The __always_inline__ probe failed incorrectly with gcc-4.7.x.
    + The __format__ probe failed incorrectly with PGI compilers.
  - Improved support for "old" C compilers by removing or reducing
    unnecessary C99 dependencies.

* General
  - Fixed an error in README's instructions for use of the Makefile
    fragments, which was missing "$(GASNET_CPPFLAGS)" in the ".c.o" rule.
  - Continued improvement to signal handling for smp-conduit for fewer
    "orphans" when an application is terminated by a signal or abort().
  - Implemented/documented GASNET_THREAD_STACK_{MIN,PAD} environment vars.
  - Annotated many (but not yet all) known-benign memory leaks in the
    report generated when GASNET_MALLOCFILE is set (for bug 3088).
  - Improved documentation (in code comments) for conduit implementers.

* Build and configure
  - Most instances of "debug vs. optimize compilation conflict" in the
    MPI compiler are now resolved without user intervention.
  - Support for GASNet's error-checking implementation of malloc and its
    associates can now be controlled independent of the --enable-debug
    configure option (but is still enabled by default when configured
    with --enable-debug).  This resolves bug 3089.
    + "--enable-debug-malloc" enables error-checking malloc while not
      enabling other runtime assertions associated with "--enable-debug".
    + "--enable-debug --disable-debug-malloc" yields a debug build without
      GASNet's error-checking wrappers for malloc and related calls.  This
      is useful when an external debugging malloc library is to be used.

----------------------------------------------------------------------
10-30-2011 : Release 1.18.0

* Cray-XE series (gemini-conduit):
  - This release includes a BETA of native support for the Cray XE network.
  - The performance of gemini-conduit has not been tuned; however, we believe
    that the implementation is correct (with only minor bugs remaining).
  - This is still only a BETA-quality implementation, and significant
    performance improvements are anticipated in future releases.

* General
  - Implemented faster atomics for x86, x86-64 and PPC64.
  - Improved signal handling for smp-conduit for fewer "orphans" when
    an application is terminated by a signal.
  - Fix output corruption sometimes seen when redirecting stdout/stderr.
  - GASNET_TMPDIR env var to control placement of most temporary files.
  - Better support for systems where gethostid() returns 127.0.0.1.
  - Fixed PSHM-over-SYSV bug with non-contiguous process distributions.
  - Field remote_addr in gasnet_seginfo_t has been removed and the signature
    of gasnet_getNodeInfo() has changed.  If your GASNet client was using
    these undocumented interfaces, then it will need to be updated.

* Platform support/portability
  - Enabled partial backtrace support for Cray-XT and XE series systems.
  - Make allowances for odd sbrk() implementation on Darwin.
  - Probe for /dev/mmtimer on x86-64-based Altix platforms (bug 2880).
  - Improved support for systems lacking an atomic C-A-S (bug 3043).
  - Added work-arounds for Open64 and PathScale compiler bugs.
  - Fixed various warnings seen with recent gcc and icc versions.
  - Made corrections to MIPS support for "o32" ABI.
  - Extended ARM support to a wider range of ISA revisions.

* InfiniBand (ibv- and vapi-conduits):
  - Cleanup/simplify AM code for InfiniBand.
  - Ignore Mellanox HCA ports configured for Ethernet.

* Firehose dynamic-memory registration library (several conduits):
  - Fixed bug 2768: errors with firehose at node counts over 4096.
  - Reduced memory usage in firehose library

* Build and configure:
  - Provide Makefile fragments for GASNet-Tools clients (bugs 2565 and 2940).
  - Fixed problems with autoconf 2.64 and newer (bugs 2648 and 2748).
  - Now ship with updated config.guess and friends.

----------------------------------------------------------------------
10-17-2011 : Release 1.17.6
* A "stable snapshot" - first release candidate for 1.18.0

----------------------------------------------------------------------
09-23-2011 : Release 1.17.4 (gemini-conduit only beta release)

  - This is a Beta release featuring an initial native implementation
    for the Gemini interconnect of the Cray XE.
  - The performance of gemini-conduit has not been tuned, but we believe
    that the implementation is correct.
  - Relative to the previous stable release, 1.16.2, this Beta includes
    several miscellaneous changes not described here.  Most are fixes for
    bugs or improvements to performance, and none are suspected to make
    this release any less stable than 1.16.2.
  - This Beta has been mostly tested on Cray XE6 systems, but is not
    known or suspected to be less stable on any other specific platforms.

----------------------------------------------------------------------
05-18-2011 : Release 1.16.2 (feature and bug fix release)

* General:
  - Fixed bug 2951: exitcode=1 from smp-conduit under unusual conditions
  - Fixed an infrequent race in PSHM debugging code that caused rare crashes
  - Fixed minor bugs in the non-default AMCENTRAL barrier
  - Many additional minor bug fixes and performance improvements

* InfiniBand (ibv- and vapi-conduits):
  - Fixed bug 2950: ibv-conduit page alignment problem on ia64
  - Improved InfiniBand scalability
    + This release adds support for the XRC extension to the InfiniBand
      specification which can greatly reduce the memory and HCA resource
      requirements for large node counts, when used together with SRQ.
      For more details on SRQ and XRC see vapi-conduit/README (source)
      or share/doc/gasnet/README-ibv (installed).
    + This release adds support for operating ibv- and vapi-conduits
      without connecting all pairs of nodes at startup (avoiding the
      associated costs in time, memory and HCA resources).
      For more information see vapi-conduit/README (source) or
      share/doc/gasnet/README-{ibv,vapi} (installed) for documentation
      on the GASNET_CONNECT_* family of environment variables.
    + Several additional reductions in memory use

* IBM SP (lapi-conduit):
  - Enable partial PSHM support when not using lapi-rdma
  - Link w/ "big TOC" by default

* Build and configure:
  - Improved configure support for AIX 6.x

----------------------------------------------------------------------
12-08-2010 : Release 1.16.1 (minor bug fix release)
 
* General:
  - Eliminated an infrequent race in an assertion that caused rare crashes.
  - Fixed a configure problem that would reject OSS12.2's sunCC.
  - Fixed bug 2927: PSHM breaks with greater than 255 processes.
  - Eliminated infinite recursion on some error exits in smp-conduit.
  - Additional small fixes in the collectives and PSHM

* Cray-XT series (portals-conduit):
  - Improved speed of job startup on large-memory nodes.

----------------------------------------------------------------------
11-01-2010 : Release 1.16.0
 
* General:
  - Environment vars to limit which nodes generate various outputs:
    + GASNET_BACKTRACE_NODES - limits GASNET_BACKTRACE output
    + GASNET_TRACENODES      - limits GASNET_TRACEFILE output
    + GASNET_STATSNODES      - limits GASNET_STATSFILE output
    + GASNET_MALLOCNODES     - limits GASNET_MALLOCFILE output

* InfiniBand (ibv-conduit):
  - This release features a (re)implementation of Active Messages for
    ibv-conduit via SRQ (Shared Receive Queue) which greatly reduces
    the memory requirements for large node counts.
  - Implementation now supports (in theory) as many as 65535 GASNet
    nodes (processes), up from 16384.

* Cray-XT and Cray-XE series:
  - Added support for PSHM (requires optional PSHM-over-SystemV)
  - Fixed bug 2435: portals-conduit assertion failures if signalled
  - gasnett_set_affinity() now implemented under CNL/CLE
  - Initial testing on XE series (w/ mpi-conduit, no native support)

* Process-Shared Memory (PSHM) Support
  - Now enabled by default on Linux
  - Enabling PSHM no longer disables conduits lacking PSHM support
  - Optional implementation via SystemV shared memory
  - Optional implementation via mmap()ed files
  - AMPoll operation now O(1), rather than O(procs_per_node)
  - Fix bug 2826: testhsl failures with PSHM + mpi-conduit

* Misc Platform support:
  - Fix bug 2530: bad addressing for 128-bit atomics on x86-64
  - Added gasnett_set_affinity() implementation for Solaris
  - Improved support for SGI Altix models w/ x86-64 CPUs including
    the ICE and UV family platforms.
  - Improved debugger support for MacOSX

* Build and configure:
  - Fix bug 2688: installing extraneous internal headers

----------------------------------------------------------------------
10-24-2010 : Release 1.15.8
* A "stable snapshot" - second release candidate for 1.16.0

----------------------------------------------------------------------
10-17-2010 : Release 1.15.6
* A "stable snapshot" - first release candidate for 1.16.0

----------------------------------------------------------------------
06-28-2010 : Release 1.15.4 (ibv-conduit only beta release)
  - This is a Beta release featuring an initial (re)implementation of
    Active Messages for ibv-conduit via SRQ (Shared Receive Queue).
  - SRQ is an InfiniBand API mechanism for more scalable memory usage as
    the number of connected peers increases.
  - In previous releases of ibv-conduit each additional peer required
    allocating GASNET_AM_CREDITS_PP additional buffers (32 by default)
    for receiving AM traffic.  At 4KB per buffer plus additional metadata
    for management, this would amount to about 133KB per peer.
  - The introduction of SRQ allows ibv-conduit to operate with no more
    than 1024 AM receive buffers (4MB + management overheads) independent
    of the number of peers, with little or no performance impact on well-
    behaved applications.
  - This initial implementation is known to deadlock under very rare AM-
    intensive workloads, or when certain settings are reduced to values
    much lower than their defaults.  This will be resolved in the next
    Beta, prior to the 1.16.0 release.
  - There is no SRQ implementation for vapi-conduit.
  - Relative to the previous stable release, 1.14.2, this Beta includes
    several miscellaneous changes not described here.  Most are fixes for
    bugs or improvements to performance, and none are suspected to make
    this release any less stable than 1.14.2.
  - This Beta has been mostly tested on ibv-conduit systems, but is not
    known or suspected to be less stable on any other specific platforms.
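The buffer counts quoted in the 1.15.4 notes can be checked with a few lines of arithmetic. This sketch counts only the 4KB receive buffers and ignores the management metadata (which accounts for the difference between 128KB and the quoted ~133KB per peer); the function names are hypothetical. Under these assumptions the fixed 1024-buffer SRQ pool (4MB) equals the per-peer scheme's buffer memory at 32 peers and is smaller beyond that.

```c
#define AM_BUF_BYTES 4096UL /* 4KB per AM receive buffer, as stated above */

/* Buffer memory (bytes) when every peer gets its own credits_pp buffers. */
static unsigned long per_peer_bytes(unsigned long peers, unsigned long credits_pp)
{
    return peers * credits_pp * AM_BUF_BYTES;
}

/* Buffer memory (bytes) for a shared pool of nbufs SRQ buffers. */
static unsigned long srq_bytes(unsigned long nbufs)
{
    return nbufs * AM_BUF_BYTES;
}

/* Number of peers at which the per-peer scheme reaches the SRQ pool size. */
static unsigned long breakeven_peers(unsigned long nbufs, unsigned long credits_pp)
{
    return nbufs / credits_pp; /* exact for the defaults: 1024 / 32 == 32 */
}
```

At 1000 peers, for instance, the per-peer buffers alone would total 131,072,000 bytes (125MB), compared with a constant 4MB for the SRQ pool.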

----------------------------------------------------------------------
05-20-2010 : Release 1.14.2

* General:
  - Much improved support for heterogeneous compilers (CC, CXX and MPI_CC)
  - Work-around for broken MALLOC_CHECK_ support on some glibc versions
  - Use MALLOC_OPTIONS variable on *BSD as we use MALLOC_CHECK_ on glibc
  - Fix parsing of GASNET_{FREEZE,BACKTRACE}_SIGNAL env vars

* InfiniBand (vapi- and ibv-conduits):
  - Fix bug 2079: stack overflow errors when vapi/ibv compiled with pgcc

* Cray-XT series (portals-conduit):
  - Improved reliability and scalability of job startup and termination code.
  - Fixed a corner-case bug in AM Medium code
  - Preliminary work to support PrgEnv-cray (requires CCE 7.2 or newer)

* IBM BlueGene/P (dcmf-conduit):
  - Fix bug 2756: PAR mode crashes with V1R4M0 drivers
  - Fix bug 2766: performance problem with loopback AM LongAsync
  - Fix bug 2781 and 2791: deadlocks with some uses of DCMF collectives
  - Conduit-level support for PSHM (some limitations due to BG/P platform)

* Experimental Process-Shared Memory (PSHM) Support
  - Shared-memory awareness added to default barrier implementations
  - Shared-memory awareness added to Extended API and Collectives

* Misc Platform support:
  - Fix bug 2685: timers broken on variable-frequency x86_64 CPUs
  - Resolve pthread link problems between Apple's and FSF's compilers
  - Preliminary work to support build with Open64 compilers from AMD
  - Preliminary work to support build with GCCFSS compilers from Sun

* Build and configure:
  - Allow client to control behavior on compiler-mismatch (e.g. for UPCR+GCCUPC)

----------------------------------------------------------------------
11-02-2009 : Release 1.14

* IBM BlueGene/P (dcmf-conduit):
  - Extend support to V1R4M0 driver release
  - Use native DCMF level collectives for several GASNet collectives
  - Implement more useful gasnett_gethostname() (previously gave I/O node name)
  - Minor fix for SEGMENT_EVERYTHING support

* Cray-XT series (portals-conduit):
  - Extended support to PE 2.1.42 and newer
  - Extended support to include PrgEnv-Intel
  - Implement more useful gasnett_gethostname() under Catamount
  - Spawner defaults to node count given in batch submission when no -N passed
  - Spawner improvements to deal intelligently with thread/process pinning
  - Misc. performance and scalability improvements
  - Several bugs fixed
    
* IBM SP (lapi-conduit):
  - Cleanup tentative definitions to eliminate excessive AIX linker warnings
  - Implement AIX-specific code for gasnett_set_affinity()
  - Several bugs fixed

* InfiniBand (vapi- and ibv-conduits):
  - Correct non-compliant use of offsetof() that broke compilation w/ XLC
  - Fixes for anomalous performance on ConnectX HCAs (Mellanox MT25418)
  - Improved performance (and correctness) with segments 2GB and larger
  - Documented settings to work around failures seen w/ InfiniPath HCAs;
    see vapi-conduit/README (source) or share/doc/gasnet/README-ibv (installed)
  - Multiple bugs fixed

* Misc Platform support:
  - Fix mis-aligned use of x86-64 cmpxchg16b instruction
  - Atomics work-around for SiCortex ICE9A processor errata
  - Fixes for aggressive alias analysis in gcc-4.4.x
  - Improved support for XLC on all platforms
  - Improved debug info and warning messages with PathScale compilers
  - Improved gcc TLS support on IA64

* General:
  - Experimental shared memory support (see README and pshm-design.txt)
  - Experimental collective autotuner (see README and autotune.txt)
  - Additional collective algorithms implemented
  - Fixes to some tests for large message sizes or large iteration counts
  - Work around sometimes broken UTF-8 support in perl
  - Improved support for clients with dynamic thread creation
  - Several minor bug fixes in conduit-independent code

* Build and configure:
  - Clean up public headers to enable use of -Wstrict-prototypes by clients
  - More accurate conduit auto-detection (eliminating false-positives)
  - Allow disabling of conduit auto-detection
  - Updates to configure for more recent GNU autotools
  - Better default mpi-conduit configuration on SGI Altix and IRIX
  - Correction to mechanism for detecting an SMP host under FreeBSD

----------------------------------------------------------------------
11-03-2008 : Release 1.12

* New conduits added:
  - dcmf-conduit: High-performance conduit for the IBM BlueGene/P
    using the DCMF communication interface.

* IBM SP/LAPI:
  - Fix a bug that prevented the use of unequal segment sizes across 
    nodes in LAPI-RDMA mode
  - Fix several exit-time crashes
  - Remove deprecated support for Federation LAPI version < 2.3.2.0
  - Lots of misc cleanups and tuning

* Myrinet/GM:
  - Fix some AM performance and correctness problems, esp with AMLong

* CrayXT/Portals:
  - Upgrade to cache local memory registration using firehose library
  - Add GASNET_PORTAL_PUTGET_BOUNCE_LIMIT setting

* InfiniBand/{VAPI,IBV}:
  - Extend "ibv" (InfiniBand) support to Qlogic's InfiniPath adapters

* Platform support:
  - Add support for the BlueGene/P architecture (mpi and dcmf)
  - Add experimental support for ARM processors
  - Add support for PGI compiler on Mac OSX
  - Misc improvements and/or fixes for MIPS, Alpha, PPC and SPARC processors
  - Add Pathscale compilers to supported list for Cray XT machines
  - Improved support for XLC compilers on Linux
  - Add/improve support for MIPSEL/Linux platforms, including SiCortex
  - Add support for the default libpthread on Cray XT CNL 2.1
  - Add support for Playstation 3 PowerPC

* Configure features:
  - Add --disable-mmap support to force the use of malloc for the GASNet segment
  - Add configure option --with-max-pthreads-per-node=N to override the 
    GASNet default limit of 256 pthreads per node
  - Add support for autoconf 2.62 and newer
  - Work around stability problems in Cygwin pthread mutexes (bug 1847)

* GASNet tools:
  - Upgrades to error reporting in the GASNet debug mallocator
  - Add GASNET_MALLOCFILE option and corresponding gasnet_trace support
    to assist in leak detection for libgasnet and apps using debug mallocator
  - Add "strong" atomics to the GASNet-tools interface
  - New gasnett_performance_warning_str() returns a string reporting
    performance-relevant attributes of the current GASNet build

* Misc changes:
  - The workaround for a gcc 4.x (x<3) optimizer bug has changed.
      We now encourage updating to gcc >= 4.3.0, though our previously
      documented workarounds remain valid
  - Minor improvements to the collectives environmental interface
  - Fix cross-configure detection of stack growth direction
  - Avoid "capturing" __attribute__ when compiler mismatch is detected

----------------------------------------------------------------------
10-30-2007 : Release 1.10

* IBM SP/LAPI:
  - Upgraded lapi-conduit to use RDMA support on LAPI/Federation systems,
    when available. This provides improved communication performance.

* Myrinet/GM:
  - Fix a race that could result in lost payload data for heavy AM Long
    communication in the presence of multiple client threads.

* CrayXT/Portals:
  - work around a thread-safety bug in CNL Portals that could result in
    crashes for AM-heavy workloads

* InfiniBand/{VAPI,IBV}:
  - Expose env vars to manipulate hardware-level retransmission parameters.

* Collectives:
  - Added an initial high-performance implementation of the GASNet 
    collectives. This provides scalable implementations of all the 
    data movement collectives, implemented over Active Messages.

* Misc changes:
  - Improved checking for randomized Linux VM spaces, which inhibit
    the ability to provide GASNET_ALIGNED_SEGMENTS
  - Numerous bug fixes, see http://upc-bugs.lbl.gov for details

----------------------------------------------------------------------
09-13-2007 : Release 1.9.6 (Cray XT only beta release)

* CrayXT/Portals:
  - portals-conduit is now a fully-native implementation, no longer relies
    on any MPI calls
  - support has been added for pthreads on compute-node Linux 
  - fixes to automatically work around known problems in various PE versions
  - removed the 100 MB limit for SEGMENT_FAST on CNL

* Ethernet/UDP:
  - now supports up to 16K nodes (although buffer utilization remains non-scalable)
  - fix an exit race that could cause some trailing output to be lost

* InfiniBand/{VAPI,IBV}:
  - AM-over-RDMA optimization for small AMs now enabled by default

* Misc changes:  
  - Add node placement support for various job spawners
  - Fix a crash in gasnett_threadkey for C++ clients

----------------------------------------------------------------------
02-01-2007 : Release 1.9.2 (Cray XT3 only beta release)

* New conduits added:
  - ibv-conduit: High-performance conduit using the OpenIB communication
    interface on InfiniBand hardware.

* New platform support:
  - New ports: CrayXT/Linux, K42/PPC64, OpenBSD/x86, SunC/Linux

* Misc changes:  
  - Add backtrace extensibility to GASNet tools
  - Add new features GASNET_FREEZE_SIGNAL and GASNET_BACKTRACE_SIGNAL
    which allow a user to asynchronously freeze a process or print a backtrace
  - Many, many bug fixes, for both specific conduits and general platform
    portability. See http://upc-bugs.lbl.gov for complete details.    

* InfiniBand/VAPI:
  - New AM-over-RDMA optimization significantly improves performance of small AMs

* CrayXT/Portals:
  - portals-conduit now works with PrgEnv-PGI, starting with Cray PE 1.5
  - support has been added for compute-node Linux 

----------------------------------------------------------------------
11-02-2006 : Release 1.8

* New conduits added:
  - portals-conduit: High-performance conduit using the Portals communication
    interface on the Cray XT-3. The initial implementation uses MPI-based active
    messages and a Portals-based extended API.

* New platform support:
  - New ports: MacOSX/x86, MacOSX/PPC64, Cray XD1 and ucLinux/MicroBlaze

* Misc changes:  
  - Add --help option to all GASNet tests
  - Add internal diagnostic tests
  - Add progress functions
  - Add --disable-aligned-segments configure flag for clusters with non-aligned VM spaces
  - Fix ANSI aliasing violations on small local put/get copies
  - Default to allocate-first-touch for segment mmap on Linux and Solaris
  - Many performance and functionality improvements to the GASNet collectives
  - Move most config-related defines off compile line into gasnet_config.h
  - Reorganize source files for faster and more robust builds
  - Barrier algorithm can now be selected at runtime using GASNET_BARRIER 
  - Standardize and simplify our preprocessor platform detection logic system-wide
  - Many, many bug fixes, for both specific conduits and general platform
    portability. See http://upc-bugs.lbl.gov for complete details.    

* GASNet tools support:
  - Add a conduit-independent library implementing the GASNet portability tools,
    which include portable high-performance timers, atomic operations, memory barriers,
    C compiler annotations, uniform platform identification macros, reliable
    fixed-width integer types, thread-specific data, and other misc tools.
  - Add Portable Linux Processor Affinity (PLPA) library for gasnett_set_affinity
  - Implement automatic backtrace generation on crash for several popular debuggers
  - Change default timer granularity to nanoseconds, adding _ticks_to_ns()
  - Add __thread (TLS) implementations of gasnett_threadkey

* Expanded local atomic operations support:
  - Add native support for additional compilers, notably including many C++ compilers
  - Add fetch-and-add and fetch-and-subtract operations
  - Add 32-bit and 64-bit fixed-width atomic types
  - Add explicit control of memory fence behavior
  - Add constants defining the range of the atomic type
  - Add uniform support for use of the atomic type for signed values

* General performance improvements:
  - split-phase barriers on most conduits now make progress during any GASNet call
  - initial packing implementations of the GASNet non-contiguous (vector, indexed,
    and strided) put/get functions (currently off by default)

* InfiniBand/VAPI:
  - Implement multi-port and multi-rail striping support
  - Improvements to firehose region management heuristics
  - VAPI recv thread is now disabled by default (but still available via env setting)

* MPI:
  - Significant performance and stability improvements on mpi-conduit,
    especially on systems where MPI-level flow control is lacking or
    unreliable (e.g. XT-3, BGL).
  - Split request/reply traffic onto separate MPI communicators to ensure
    bounded AMMPI-level buffer space utilization, even for degenerate cases
  - Added an AMMPI-level token-based flow control solution to prevent the
    crashes observed under heavy MPI unexpected message loads on various
    systems (XT3, Altix)
  - Add workaround for an IBM MPI ordering bug that could cause deadlock
    under heavy communication patterns.
  - Other misc tuning along the primary control paths and new tuning knobs

* Ethernet/UDP:
  - Add cross-platform spawn support for cross-compiled targets

* GASNet spec 1.8:
  - expose the GASNet release version as public macros:
    GASNET_RELEASE_VERSION_MAJOR/GASNET_RELEASE_VERSION_MINOR/GASNET_RELEASE_VERSION_PATCH
  - deprecate GASNET_VERSION in favor of GASNET_SPEC_VERSION_MAJOR/GASNET_SPEC_VERSION_MINOR
  - minor wording clarifications

----------------------------------------------------------------------
08-20-2005 : Release 1.6

* New conduits added:
  - shmem-conduit: High-performance conduit using the shmem communication
    interface on Cray X1 and SGI Altix. Support for other shmem
    implementations may be added in the future.

* New platform support:
    - Add cross-compilation support, specifically including the Cray X-1
    - Experimental support for the Cray XT3 and IBM Blue Gene/L (contact us
      for details)
    - Other new ports: Linux/PowerPC, Cray MTA, NetBSD/x86, Linux/Alpha, 
        FreeBSD/Alpha, HPUX/Itanium, PathScale & Portland Group compilers
    - Linux 2.6 kernel support for gm, vapi, shmem
    
* General performance improvements:
    - Replace default barrier implementation on gm, vapi, sci, mpi, udp with a 
      more scalable barrier implementation.
    - System-wide performance improvements to AMs
    - Improve the performance and functionality of gasnet_trace
                          
* Misc changes:  
    - Output improvements to gasnet tests
    - Added MPI performance tests to the GASNet tests for ease of comparison
    - Many robustness improvements to job spawning on various conduits and systems
    - New environment variable GASNET_VERBOSEENV turns on global reporting of 
      all environment variables in use      
    - Improve the robustness and quality of GASNet's automatic heap corruption detection 
    - Many, many bug fixes, for both specific conduits and general platform
      portability. See http://upc-bugs.lbl.gov for complete details.    
      
* Myrinet/GM:
    - gm-conduit now provides interoperability with MPI.
    - add support for spawning with mpiexec
    - several robustness and stability improvements
    
* InfiniBand/VAPI:
    - Use firehose to manage local pinning in SEG_FAST, for performance
    - Add a stand-alone ssh-based spawner; MPI is no longer
      required to build vapi-conduit.
    - Numerous performance improvements, especially for AMs, non-bulk puts
      and large put/gets (>128KB)
    - Improve firehose region efficiency, improving performance for
      SEGMENT_LARGE/SEGMENT_EVERYTHING
    - Add support for striping and multiplexing communication over multiple 
      queue pairs
    - Add options for controlling the vapi progress thread
    
* IBM SP/LAPI:
    - Change the default GASNET_LAPI_MODE to POLLING, which vastly 
      outperforms INTERRUPT on Power4/Federation
    - Significant performance improvements to barrier
    
* Quadrics/ELAN:
    - Elan4 functionality and tuning work
    - add support for the SLURM spawner
    - Improve queue depth, allowing more non-blocking put/gets to be posted without stalling
    
* CrayX1 & SGI Altix/SHMEM:
    - Significant performance improvements to AMs
    - Many correctness fixes to put/gets and AMs
    
* Ethernet/UDP:
    - Improve the performance of loopback AMs
    
----------------------------------------------------------------------
08-27-2004 : Release 1.4

* New conduits added:
  - udp-conduit: a portable conduit that implements GASNet over any standard
    TCP/IP stack. This is now the recommended conduit for clusters with
    only Ethernet networking hardware (faster than mpi-conduit over TCP-based MPI).
    See udp-conduit/README for important info on job spawning. Note that
    udp-conduit requires a working C++ compiler (if none is available, udp-conduit
    can be disabled with --disable-udp).
  - sci-conduit: an experimental conduit over Dolphin-SCI. The current
    implementation is core-only; performance improvements are planned for the
    next version.

* GASNet2 extended API interface extensions:
  - Implement reference version of GASNet collective operations
  - Implement reference version of GASNet vector/indexed/strided put/get operations
  - an updated GASNet 2.0 spec will be released soon
 
* GASNet Spec v1.6: 
  - Add gasnet_hsl_trylock()
  - Specify that calls to gasnet_hold_interrupts() and gasnet_resume_interrupts()
    are ignored while holding an HSL.
  - Clarify that the upper limit on in-flight non-blocking operations is 2^16-1
  - Clarify that gasnet_handle_t is a scalar type
  - Small clarifications and minor editorial corrections

* gm-conduit:
  - fix thread-safety problems in firehose library that caused stability
    problems in GASNET_PAR mode
  - detect versions of the GM driver with broken RDMA get support and avoid
    using RDMA get on those versions
  - remove dependency on gethostbyname to improve reliability of static linking
    on Linux
  - improvements to gasnetrun-gm

* vapi-conduit:
  - add SEGMENT_LARGE and SEGMENT_EVERYTHING support
  - many performance improvements

* lapi-conduit:
  - add workaround for a recent LAPI performance bug on Federation hardware
  - gasnet_exit stability improvements

* elan-conduit:
  - upgrades for recent libelan versions

* Configure changes:
  - add autodetection of all conduits whenever possible. On some systems one
    may still need to set some environment variables before running configure
    to indicate the install location of network drivers.
  - detect and reject the buggy gcc 3.2.0-2 compilers
  - handle systems lacking pthreads
  - improved sanity checks for MPI_CFLAGS

* Makefile changes
  - Add a set of manual overrides for compilation of the GASNet libraries and
    tests, i.e. "make MANUAL_LIBCFLAGS=..." - see README
  - Fix "gmake prefix=/new/path install" to work correctly, even when the
    install prefix differs from the configure-time prefix
  - Add limited support for parallel make (not recommended for general use)

* GASNet infrastructure ported to Cray X1, AMD Athlon/Opteron, Sun Pro C, HP C

* Add gasnet_trace contributed tool, which automatically parses and summarizes
  GASNet trace files

* Add an experimental spin-poll throttling feature to reduce lock contention
  in GASNET_PAR mode (enabled with configure --enable-throttle-poll)

* Restructure use of local memory barriers to accommodate architectures
  requiring read memory barriers

* Fix GASNet headers to be C++ friendly

* Many miscellaneous performance, stability and functionality improvements

----------------------------------------------------------------------
11-10-2003 : Release 1.3

* Added InfiniBand support in vapi-conduit - currently only SEGMENT_FAST is supported

* elan-conduit: 
  - updated for the most recent version of libelan
  - fix a few race conditions

* gm-conduit:
  - updated for GM 2.0, including RDMA get support
  - Added 64-bit support
  - Reworked the spawner to work with mpiexec, gexec, MPICH mpirun and a custom spawner

* lapi-conduit:
  - Fix bugs related to varying LAPI uhdr payload size across systems - this is
    now queried automatically at runtime

* GASNet spec: 
  - gasnet_hold_interrupts() and gasnet_resume_interrupts() calls are now
    required to be ignored by the implementation within an AM handler context.
  - Added gasnet_set_waitmode() function

* Add a logGP test program for GASNet conduits

* Add a threaded tester for gasnet threaded clients

* Added a GASNet/MPI test that tests the compatibility of a GASNet conduit with
  MPI calls made by the GASNet client. 

* All GASNet conduits other than gm are now fully compatible with limited MPI
  calls from the GASNet client code. In order to prevent deadlock and ensure
  safety, GASNet and MPI communication should be separated by barriers.

* Factor the firehose page registration system into a new, separate firehose
  library with a public interface, for use by gm-conduit and vapi-conduit
 
* Use "adaptive" pthreads mutexes on Linux (when available), for better SMP performance

* Added support for new platforms: Solaris-SPARC 64-bit
  and new compilers: Portland Group C, SunPro C and Intel C

* Add SIGCONT as an additional option for unfreezing a GASNet application.
  This is useful for debugging GASNet apps that lack debugging symbols
  (but may still have enough info to give you a stack trace, etc.). A GASNet app
  frozen by GASNET_FREEZE can now be unfrozen by sending "kill -CONT pid"
  to each process, or on some systems by typing control-Z on the console to
  suspend the process and then fg to resume it (which sends a SIGCONT for you).

* HSL calls now compile away to nothing when HSLs are unnecessary

* Merged AMMPI v0.8, which includes fixes for rare buffer overflows and small memory leaks

* fixed pthread barrier errors caused by a race condition

* Minor semantic change to no-interrupt sections -
  gasnet_{hold,resume}_interrupts() are now officially ignored within a GASNet
  handler context (where interrupts are already suspended anyhow). 

* add new function gasnet_set_waitmode() to control waiting behavior

* Use an atexit handler to make sure we finalize the trace/stats file, even 
  if the client exits without calling gasnet_exit

* Fixes to gasneti_local_membar(), especially for SMP/UNI Linux kernels and PowerPC

* Significant new GASNet conduit programming practices:
  gasneti_{malloc,calloc,free}, gasneti_assert, GASNETI_CLIENT_THREADS,
  GASNETI_CONDUIT_THREADS, (N)DEBUG -> GASNET_(N)DEBUG, STATS,TRACE ->
  GASNET_{STATS,TRACE}

* Many minor fixes

----------------------------------------------------------------------
06-28-2003 : Release 1.2

* Greatly increased the number of platforms supported - notably, this release
  adds support for FreeBSD, IRIX, HPUX, Solaris, MSWindows-Cygwin and Mac OSX,
  as well as the SunPro, MIPSPro, Portland Group and Intel C compilers. See the
  top-level README for the complete list of supported platforms.

* Added the smp-conduit, which implements pure loopback to support GASNet
  clients on platforms lacking a network.

* Remove the 256-node scalability limit - mpi, elan and lapi conduits now
  theoretically scale to 2^31 nodes, and gm-conduit scales to 2^16 nodes.

* Merge v0.7 of AMMPI - improved latency performance, better scalability, and
  fixes for LAM/MPI

* Fix bug 120 - gasnet_exit now reliably kills the entire job on all conduits
  in various collective and non-collective invocation situations.

* New switches GASNETE_PUTGET_ALWAYSLOCAL and GASNETE_PUTGET_ALWAYSREMOTE which
  optimize away the locality check for put/gets implemented by
  gasnete_islocal() 

* Updates to the tracing system - statistics are now separate from tracing,
  allowing finer user control via the new environment variables
  GASNET_STATSMASK and GASNET_STATSFILE

* Major cleanup to the gm-conduit bootstrap code

* Internal structural changes to gasnet_extended.h to provide more flexibility
  for conduit overrides

* Minor wording clarifications to the GASNet spec

* Many minor bug fixes

----------------------------------------------------------------------
04-17-2003 : Release 1.1

* Added lots of conduit user and design documentation
* Fix bugs in gasnet_register_value_t functionality where, in some cases,
  gasnet_get_val() returned garbage in the upper bytes
* Fix bug 51 - endianness bugs on gasnet_*_val()
* Tweak the gcc optimizer settings to ensure that we get full inlining
* Ensure gasnet_exit() or fatal signals always correctly shut down the global 
  job (mpi and elan conduits - gm and lapi still have known problems)
* Add strong configure warnings about using gcc 2.96 - users are strongly
  advised to avoid this broken compiler
* Ensure configure caching is always on
* Basic infrastructure cleanups to the conduit Makefile fragments
* Fix a shutdown-time crash when tracing
* Add GASNET_CONFIG_STRING to spec & implementation and embed it in library
* Add a number of minor clarifications to the GASNet spec
* Clean up licensing issues

* elan-conduit: 
 - fixups for better handling of elan memory exhaustion
 - preallocate AMLong bounce buffers

* gm-conduit:
 - various stability fixes
 - add spawning scripts for gexec and pbs

* mpi-conduit:
 - add global environment variable exchange to ensure consistent
   gasnet_getenv() results across nodes
 - merge AMMPI release 0.6

----------------------------------------------------------------------
01-29-2003 : Initial Release (1.0)

