Total Size Of Requested Files Is Too Large For Zip-on-the-fly <LEGIT ✓>
for (const file of largeFileList) archive.append(createReadStream(file.path), name: file.name );
Pre-scan each file to compute CRC32 and size without storing the compressed data. Then write ZIP entries in a single sequential pass using HTTP chunked encoding. for (const file of largeFileList) archive
Use ZIP’s "store" method (deflation level 0). The CRC and size are known per file before writing. The CRC and size are known per file before writing
@shared_task(bind=True) def generate_large_zip(self, file_paths, job_id): temp_zip = f"/tmp/job_id.zip" with zipfile.ZipFile(temp_zip, 'w', zipfile.ZIP_DEFLATED, allowZip64=True) as zf: for path in file_paths: zf.write(path, os.path.basename(path)) # Upload to S3 s3.upload_file(temp_zip, "my-bucket", f"zips/job_id.zip") return f"https://my-bucket.s3.amazonaws.com/zips/job_id.zip" | Approach | Max ZIP size (practical) | Memory usage | HTTP timeout risk | Client experience | | :--- | :--- | :--- | :--- | :--- | | Naive (buffer) | < 200 MB | O(Size) | High | Immediate fail | | Streamed store | Unlimited* | < 20 MB | Medium (long download) | Progress bar works | | Chunked deflate | Unlimited* | < 100 MB | Medium | Same as above | | Async job | Unlimited (TB) | < 500 MB (worker) | None | Polling required | once for data).
plus per-file chunk buffers. Time: 2x I/O per file (once for CRC, once for data). 4.3 Level 3: Asynchronous Job-Based Packaging Best for: Extremely large requests (>50GB), slow storage, or unreliable networks.
The central directory is the key: a ZIP file’s table of contents is at the end of the file. Most libraries cannot stream it without first knowing all file sizes and CRCs. 4.1 Level 1: Streamed Passthrough (No Compression – "Store" Method) Best for: Already compressed files (JPEG, MP4, PDFs).
(only per-file read buffer). Limitation: Output size ≈ sum of input sizes. Still fails if Content-Length cannot be precomputed. 4.2 Level 2: Chunked Deflate with CRC Precomputation Best for: Text files, logs, or data that needs compression but cannot fit in memory.
