This file contains release notes for major and minor releases of xpar.
For a complete list of source-level changes, consult the ChangeLog file.

===============================================================================
v1.1 (20-04-2026)
- File format is unchanged from v1.0 in all modes; v1.0 archives
  decode under v1.1 and v1.1 archives decode under v1.0.
- New '--progress' flag periodically reports throughput and ETA to
  stderr during encode and decode in joint, systematic, Leopard-
  sharded, and Vandermonde-sharded modes.
- Optional io_uring backend on Linux for batched parallel I/O in
  joint, Leopard-sharded, and Vandermonde-sharded pipelines.
  Enabled at configure time with '--enable-liburing' (auto-detected
  on Linux when liburing is installed); other platforms and builds
  without liburing continue to use the portable POSIX/Win32 host
  backend unchanged.
- Joint-mode encoder refuses to overwrite an existing output file
  unless '-f / --force' is given, matching the other modes.
- Keyed MAC and tag comparisons are constant-time.
- Sharded decoders share a common consensus-voting path with bounded
  vote tables and stricter layout checks; end-of-stream handling in
  Leopard- and Vandermonde-sharded decode is hardened against
  truncated or mismatched final shards.
- Leopard-sharded: correct handling of the r->data == 1 single-data-
  shard edge case; decoder now fsyncs and close-checks output before
  reporting success. Vandermonde-sharded: fix a shard buffer leak on
  the error path. Sharded modes report a more meaningful error when
  the shard count is out of range.
- yarg command-line parser hardened against allocation-failure and
  formatting edge cases. Windows legacy (CP_ACP) command-line parser
  rewritten to grow its buffers safely and free partial state on OOM.
- POSIX host: FD_CLOEXEC on opened descriptors; read/write loops
  retry on short transfers and EINTR. DJGPP host: higher-precision
  PIT timer and a switch to <io.h> for low-level I/O; DOS builds are
  now LTO'd and benefit from -ffunction-sections / --gc-sections
  size trimming in release builds.
- Release tarballs have reproducible mtimes; CI adds Win95 and DOS
  release workflows alongside the existing targets.
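
The constant-time comparison mentioned above can be illustrated with a
minimal sketch. It is not xpar's actual code: the helper name
tag_equal_ct and its signature are assumptions made for illustration.
The idea is to touch every byte and fold the differences together, so
the running time does not reveal where the first mismatch occurs.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from the xpar sources): returns 1 if the
   two n-byte tags are equal and 0 otherwise, with no early exit. */
static int tag_equal_ct(const unsigned char *a, const unsigned char *b,
                        size_t n)
{
    unsigned char diff = 0;
    size_t i;

    /* OR together the XOR of every byte pair; diff stays 0 only if all
       bytes match. The loop always runs n iterations. */
    for (i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);

    /* diff == 0 wraps to all-ones after subtracting 1, so bit 8 is set;
       any value in 1..255 leaves bit 8 clear. No data-dependent branch. */
    return (int)((((unsigned)diff - 1u) >> 8) & 1u);
}
```

A plain memcmp() may return at the first differing byte, which is what
this pattern avoids. An optimising compiler can still transform such
code, so real implementations often add further safeguards.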

===============================================================================
v1.0 (16-04-2026)
- File-format bump to v1.0. v0.x archives are rejected; re-encode any data
  you still need. Future v1.x minors stay decodable by v1.0+ tools.
- Joint header now records total input size and each lace carries a
  sequence number, so a .xpa truncated at a lace boundary or with laces
  reordered is detected and refused rather than silently decoding to
  garbage.
- Sharded header CRC32C now covers the version, shard count, shard
  number, and total size fields in addition to the body, so a
  shard_number bit-flip or a header-field swap between shards is
  detected.
- New '-H / --integrity' flag selects the per-lace / per-shard tag
  algorithm: 'crc32c' (default, 32-bit hardware-accelerated) or
  'blake2b' (128-bit BLAKE2b, 2^64 birthday-bound collision resistance
  instead of 2^16). Supported in joint, systematic, and sharded
  (Vandermonde + Leopard) modes.
- New '--auth=<keyfile>' flag reads a 1-64 byte key and switches the
  tag to a keyed BLAKE2b-128 MAC, implying '-H blake2b'. Decoders
  require the same key; wrong / missing / spurious keys are all
  rejected.
- New '-s / --systematic' joint mode: encoder writes a parity-only .xpa
  (about 18% of input) and the original file stays on disk unchanged.
  Decoder reads the (possibly corrupted) original plus the parity file
  and emits the corrected data to stdout. Incompatible with
  --interlacing.
- New '-t / --test' integrity-check mode for '-J', '-Js', '-W', and
  '-L'. Runs the full correction pipeline but writes nothing; exits
  non-zero on any unrecoverable block, shard shortfall, or tag
  mismatch, and also flags archives that were recoverable but needed
  correction.
- BLAKE2b has hand-vectorised SSE4.1 (also used on AArch64 via
  sse2neon) and AVX2 kernels with runtime CPUID dispatch; a portable
  reference C implementation is the fallback. '--disable-blake2b' at
  configure time strips BLAKE2b and MAC support entirely.
- Leopard-sharded decode is faster on aarch64 (Linux and Apple Silicon)
  via sse2neon. Fix a missing return in the SSSE3 xor_mem4 path that
  caused a latent heap overflow (surfaced as a segfault on Apple
  Silicon, caught by ASan in CI) when reconstructing from missing
  shards.
- Self-check now covers the new features, including truncation-,
  reorder-, and shard-number-swap rejection; the error-injection
  helper is rewritten in C so Python is no longer required to run the
  suite.
- CI builds additionally run under Address and Undefined-Behaviour
  sanitizers on Linux and macOS, catching regressions in the hot
  decoder paths. Release artifacts now include a Win95-target i686
  binary (xpar-i686-w95.exe, built with --with-windows-target=win95,
  imports only KERNEL32.dll) and a DJGPP/MS-DOS binary (xpar-dos.exe,
  CWSDPMI bundled via CWSDSTUB) alongside the existing i686/x86_64
  Windows binaries.
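
How a recorded total size plus per-lace sequence numbers can catch
truncation and reordering may be easier to see in a small sketch. The
struct, its field names, and check_laces are hypothetical and do not
reflect xpar's on-disk layout or decoder; they only illustrate the
kind of check described in the joint-header note above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-lace record; names are illustrative only. */
struct lace_info {
    uint64_t seq;      /* sequence number stored with the lace */
    uint64_t payload;  /* bytes of input carried by the lace */
};

/* Returns 0 if the laces appear in order 0..n-1 and their payload sizes
   add up to the total recorded in the header; returns -1 otherwise. */
static int check_laces(const struct lace_info *laces, size_t n,
                       uint64_t total_size)
{
    uint64_t seen = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (laces[i].seq != (uint64_t)i)
            return -1;  /* reordered, duplicated, or missing lace */
        if (laces[i].payload > total_size - seen)
            return -1;  /* more data than the header promised */
        seen += laces[i].payload;
    }
    /* A sum short of the recorded total means laces were cut off. */
    return seen == total_size ? 0 : -1;
}
```

Without the recorded total, a file cut exactly at a lace boundary
would look complete; without sequence numbers, swapped laces would
each verify individually yet decode in the wrong order.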

===============================================================================
v0.7 (19-09-2025)
- Rename the sharded mode to Vandermonde-sharded mode (-S => -W).
- Update build instructions for specific architecture and operating system
  combinations.
- Include benchmarks in the repository.
- Introduce minimum shard size.
- Improve stability without --no-mmap.
- Add FFT-based Reed-Solomon encoders and decoders that operate in
  linearithmic time (@catid).
- Fix a memory leak in gf256mat_inv used by the Vandermonde-sharded mode.
- Update and reflow the man page.

===============================================================================
v0.6 (10-09-2025)
- Move the project page to iczelia/xpar.
- Minor style changes and fixes to command parsing.

===============================================================================
v0.5 (17-10-2024)
- OpenMP support for sharded mode (which unfortunately seems to be
  bottlenecked by I/O).
- Switch to yarg for command-line parsing, remove dependency on Rich Felker's
  `getopt_long`.
- Hopefully the last v0.x release. Feedback on it should guide future
  improvements ahead of v1.0. The file format will not change from now on
  unless a bug or another major misfeature needs to be fixed.

===============================================================================
v0.4 (16-10-2024)
- x86_64 static Linux binaries are no longer provided.
- OpenMP support has been added to improve encoding and decoding performance
  in joint mode with high interlacing factors on multi-core machines.
- 3-way saturating CRC32C implementation has been added to improve performance
  on x86_64 machines that support SSE4.2.
- Slightly improve the performance of the sharded mode.
- Fix undefined behaviour in sharded mode regarding int shifts.

===============================================================================
v0.3 (16-10-2024)
- Improve joint encoding performance on x86_64 machines.
- Support aarch64 Linux.
- Improve cross-platform compatibility of the sharded mode.

===============================================================================
v0.2 (15-10-2024)
- Provides a manual page for the xpar command.
- Provides platform-specific code for aarch64, which can be enabled via
  the --enable-aarch64 configure option.

===============================================================================
v0.1 (14-10-2024)
- Initial release.
- Supports joint mode and sharded mode for error and erasure correction.
- Provides platform-specific code for x86_64, which can be enabled via
  the --enable-x86_64 configure option.
- Tested on x86_64 Linux (Ubuntu), x86_64 and aarch64 macOS, and x86_64 and
  i686 Windows.
