# Chunking System

The chunking system lets long-running operations execute across multiple HTTP requests, so that each request stays within PHP's execution time and memory limits. Processing is resumable, and an interrupted run can be detected and recovered.

## Overview

Operations like package scanning, archive creation, and storage uploads can process thousands of files over several minutes. The chunking system:

- **Splits work into iterations** that fit within PHP execution limits
- **Persists progress** between HTTP requests for seamless resume
- **Detects incomplete processing** for crash recovery
- **Retries failed saves** to handle transient storage errors

## Core Components

### ChunkingManager

Abstract base class that orchestrates chunked processing.

**Namespace:** `Duplicator\Libs\Chunking`

**Key Features:**
- Controls iteration flow with configurable limits (`maxIteration`, `timeOut`, `throttling`)
- Manages start/stop lifecycle with automatic state persistence
- Provides retry logic for persistence saves (3 attempts with 50ms delay)
- Tracks processing state for crash detection

**Return States:**

| Constant | Value | Meaning |
|----------|-------|---------|
| `CHUNK_COMPLETE` | 0 | All items processed successfully |
| `CHUNK_STOP` | 1 | Stopped due to iteration limit or timeout (will resume) |
| `CHUNK_ERROR` | -1 | Fatal error occurred |

### PersistanceAdapterInterface

Contract for persistence implementations.

**Namespace:** `Duplicator\Libs\Chunking\Persistance`

**Methods:**
- `getPersistanceData()` - Load saved position for resume
- `savePersistanceData($position, $iterator)` - Save current progress
- `deletePersistanceData()` - Clear saved state on completion
- `setProcessing(bool)` - Mark chunk as in-progress
- `isProcessing()` - Check if previous run was incomplete
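
To make the contract concrete, here is a minimal sketch of an in-memory implementation. `SketchPersistanceAdapter` is a hypothetical local stand-in that mirrors the five methods listed above; its parameter and return types are assumptions, and the real interface in `Duplicator\Libs\Chunking\Persistance` may declare them differently.

```php
// Hypothetical stand-in mirroring the five methods listed above.
// Signatures and return values are assumptions, not the library's own.
interface SketchPersistanceAdapter
{
    public function getPersistanceData();
    public function savePersistanceData($position, $iterator);
    public function deletePersistanceData();
    public function setProcessing($processing);
    public function isProcessing();
}

// Keeps state in properties only, so nothing survives past the current request.
class InMemorySketchAdapter implements SketchPersistanceAdapter
{
    private $position   = null;
    private $processing = false;

    public function getPersistanceData()
    {
        return $this->position;
    }

    public function savePersistanceData($position, $iterator)
    {
        $this->position = $position;
        return true;
    }

    public function deletePersistanceData()
    {
        // Assumption: clearing state also clears the in-progress flag.
        $this->position   = null;
        $this->processing = false;
        return true;
    }

    public function setProcessing($processing)
    {
        $this->processing = (bool) $processing;
        return true;
    }

    public function isProcessing()
    {
        return $this->processing;
    }
}

$adapter = new InMemorySketchAdapter();
$adapter->setProcessing(true);
$adapter->savePersistanceData(42, null);
```

An adapter like this is only useful for tests or single-request runs, since the saved position is lost when the request ends.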

### AbstractPersistanceAdapter

Base implementation with built-in caching layer.

**Namespace:** `Duplicator\Libs\Chunking\Persistance`

**Key Features:**
- **Caching:** Loads data once and caches in memory, preventing redundant I/O
- **Processing flag:** Tracks incomplete executions for crash recovery
- **Extra data storage:** Allows adapters to persist custom state alongside position
- **Hook methods:** `afterLoadPersistanceData()` and `beforeWritePersistanceData()` for customization
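
The caching behavior can be pictured as a load-once wrapper. The sketch below is not the adapter's code; `SketchCachedLoader` and its members are hypothetical names used only to show the idea of reading storage a single time and serving later reads from memory.

```php
// Hypothetical load-once cache; illustrates the idea, not the real adapter.
class SketchCachedLoader
{
    private $loader;
    private $cache  = null;
    private $loaded = false;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function get()
    {
        if (!$this->loaded) {
            // First access: hit storage and remember the result.
            $this->cache  = call_user_func($this->loader);
            $this->loaded = true;
        }
        return $this->cache;
    }
}

// Count how often the (simulated) storage is actually read.
$reads  = 0;
$loader = new SketchCachedLoader(function () use (&$reads) {
    $reads++;
    return ['isProcessing' => false, 'extraData' => null, 'position' => 7];
});

$first  = $loader->get();
$second = $loader->get();
```

A separate `$loaded` flag is used instead of checking the cache for `null`, so that a legitimately empty result is not reloaded on every call.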

**Data Structure:**
```php
[
    'isProcessing' => bool,   // true while chunk is running
    'extraData'    => mixed,  // adapter-specific state
    'position'     => mixed,  // iterator position for resume
]
```
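
As a rough illustration, a state array of this shape survives a JSON round trip unchanged. The values below are made up, and whether any given adapter serializes exactly this way is an assumption.

```php
// Example state midway through a run (values are invented for illustration).
$state = [
    'isProcessing' => true,
    'extraData'    => ['scannedDirs' => 12],
    'position'     => 340,
];

$json = json_encode($state);
if ($json === false) {
    throw new RuntimeException('Encoding failed: ' . json_last_error_msg());
}

// Decode as an associative array to get the same shape back.
$restored = json_decode($json, true);
```

Note that `position` is `mixed`: positions that are objects would need their own serialization step, which this sketch does not cover.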

## Persistence Adapters

### FileJsonPersistanceAdapter

Stores progress in a JSON file on the filesystem.

**Use case:** Package scanning and other local operations where file storage is reliable.

### NoPersistanceAdapter

Null implementation that doesn't persist state.

**Use case:** Operations that complete in a single request or do not need to resume.

### UploadPackageFilePersistanceAdapter

Stores progress in the package database record via `UploadInfo`.

**Use case:** Storage uploads where progress must survive across requests and be tied to package state.

### ScanPersistanceAdapter

Extends `FileJsonPersistanceAdapter` with scan-specific extra data.

**Use case:** The package scanning phase, where scan results are persisted alongside the iterator position.

## Implementing a ChunkingManager

To create a chunked operation:

```php
class MyChunkingManager extends ChunkingManager
{
    protected function getIterator($extraData = null)
    {
        // Return a GenericSeekableIteratorInterface
        return new MySeekableIterator($extraData);
    }

    protected function getPersistance($extraData = null)
    {
        // Return a PersistanceAdapterInterface
        return new FileJsonPersistanceAdapter('/path/to/state.json');
    }

    protected function action($key, $current)
    {
        // Process single item, return true on success
        return $this->processItem($current);
    }
}
```

**Usage:**
```php
$manager = new MyChunkingManager($data, $maxIteration, $timeout);

// Call start() once per request:
//   true  = first request, start fresh
//   false = later requests, resume from the saved position
// $isFirstRequest is a placeholder for however the caller tracks this.
$result = $manager->start($isFirstRequest);

if ($result === ChunkingManager::CHUNK_COMPLETE) {
    // All done
} elseif ($result === ChunkingManager::CHUNK_STOP) {
    // Schedule next chunk
} else {
    // Handle error
    $error = $manager->getLastErrorMessage();
}
```

## Implementing a Persistence Adapter

Extend `AbstractPersistanceAdapter` and implement three protected methods:

```php
class MyPersistanceAdapter extends AbstractPersistanceAdapter
{
    protected function loadPersistanceData()
    {
        // Load and return data array, or null if none exists
        return $this->loadFromStorage();
    }

    protected function writePersistanceData($data): bool
    {
        // Save data array, return true on success
        return $this->saveToStorage($data);
    }

    protected function doDeletePersistanceData(): bool
    {
        // Delete stored data, return true on success
        return $this->deleteFromStorage();
    }
}
```

**Optional hooks:**
```php
protected function afterLoadPersistanceData()
{
    // Called after successful load - import extra data
    $extra = $this->getExtraData();
    $this->myState = $extra['myState'] ?? null;
}

protected function beforeWritePersistanceData($position, $iterator)
{
    // Called before write - export extra data
    $this->setExtraData(['myState' => $this->myState]);
}
```

## Processing Flow

### Normal Execution

1. `start(true)` called - rewinds iterator, deletes old state
2. `setProcessing(true)` saved to persistence
3. Iterator processes items until limit/timeout
4. `stop()` called - `setProcessing(false)` saved with final position
5. Returns `CHUNK_STOP` or `CHUNK_COMPLETE`

### Resume Execution

1. `start(false)` called - loads saved position
2. `gSeek(position)` restores iterator state
3. `next()` advances past last processed item
4. Processing continues from where it left off
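
The normal and resume flows above can be simulated without the library. In this sketch, PHP's built-in `ArrayIterator` stands in for a `GenericSeekableIteratorInterface` (its `seek()` plays the role of `gSeek()`), and `sketchRunChunk()` and the `SKETCH_*` constants are hypothetical names, not part of the chunking system.

```php
const SKETCH_COMPLETE = 0;
const SKETCH_STOP     = 1;

/**
 * Processes up to $maxIteration items, starting after $savedPosition.
 * Returns [status, index of the last processed item].
 */
function sketchRunChunk(array $items, $savedPosition, $maxIteration, array &$processed)
{
    $iterator = new ArrayIterator($items);

    if ($savedPosition === null) {
        $iterator->rewind();              // fresh start
    } else {
        $iterator->seek($savedPosition);  // restore the saved position
        $iterator->next();                // step past the last processed item
    }

    $position = $savedPosition;
    $count    = 0;
    while ($iterator->valid()) {
        if ($count >= $maxIteration) {
            return [SKETCH_STOP, $position];  // limit hit, more work remains
        }
        $processed[] = $iterator->current();
        $position    = $iterator->key();
        $iterator->next();
        $count++;
    }

    return [SKETCH_COMPLETE, $position];
}

$items     = ['a', 'b', 'c', 'd', 'e'];
$processed = [];
$position  = null;
$requests  = 0;

// Each pass of this loop stands in for one HTTP request.
do {
    list($status, $position) = sketchRunChunk($items, $position, 2, $processed);
    $requests++;
} while ($status === SKETCH_STOP);
```

With five items and a limit of two per chunk, the loop runs three times: two chunks end in a stop state and the third completes. A timeout check would sit next to the iteration-count check in the same loop.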

### Crash Recovery

1. Previous run crashed (process killed, timeout, etc.)
2. `isProcessing` flag remains `true` in storage
3. New request calls `wasProcessingIncomplete()` - returns `true`
4. The application then decides whether to retry from the last saved position or restart from the beginning
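
One way to express that decision is to look at the stored state before starting. The function below is a hypothetical helper working on a plain state array of the shape shown under `AbstractPersistanceAdapter`; it does not call the real `wasProcessingIncomplete()` and the three mode names are invented for the example.

```php
/**
 * Decides how to start a run based on previously stored state.
 * Returns 'fresh', 'resume', or 'recover' (hypothetical mode names).
 */
function sketchStartMode($storedState)
{
    if (!is_array($storedState)) {
        return 'fresh';    // nothing saved: this is the first run
    }
    if (!empty($storedState['isProcessing'])) {
        return 'recover';  // flag never cleared: the previous run did not stop cleanly
    }
    return 'resume';       // clean stop: continue from the saved position
}

$modeNone    = sketchStartMode(null);
$modeCrashed = sketchStartMode(['isProcessing' => true, 'extraData' => null, 'position' => 10]);
$modeStopped = sketchStartMode(['isProcessing' => false, 'extraData' => null, 'position' => 10]);
```

What `'recover'` should do is application-specific: the item at the saved position may have been partially processed, so retrying it is only safe if the action is idempotent.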

## Retry Logic

The `saveData()` method includes retry logic for transient failures:

- **Attempts:** 3 (configurable via `PERSISTANCE_SAVE_RETRIES`)
- **Delay:** 50ms between retries (configurable via `PERSISTANCE_SAVE_RETRY_DELAY`)
- **Handles:** Both `false` returns and thrown exceptions

This reduces the risk of losing progress to temporary file locks, database contention, or network issues. If every attempt fails, the save is still reported as failed.
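
A retry loop with these properties might look like the sketch below. The function and constant names are hypothetical, and expressing the delay in microseconds for `usleep()` is an assumption about units; the real `saveData()` may be structured differently.

```php
const SKETCH_SAVE_RETRIES     = 3;
const SKETCH_SAVE_RETRY_DELAY = 50000; // microseconds (50 ms), assumed unit

/**
 * Calls $save until it returns true or the attempts run out.
 * A false return and a thrown exception both count as a failed attempt.
 */
function sketchSaveWithRetry(callable $save)
{
    for ($attempt = 1; $attempt <= SKETCH_SAVE_RETRIES; $attempt++) {
        try {
            if ($save() === true) {
                return true;
            }
        } catch (Exception $e) {
            // Swallow and fall through to the next attempt.
        }
        if ($attempt < SKETCH_SAVE_RETRIES) {
            usleep(SKETCH_SAVE_RETRY_DELAY); // wait before retrying
        }
    }
    return false;
}

// Simulated save: fails once by returning false, once by throwing, then succeeds.
$calls     = 0;
$flakySave = function () use (&$calls) {
    $calls++;
    if ($calls === 1) {
        return false;
    }
    if ($calls === 2) {
        throw new RuntimeException('temporary lock');
    }
    return true;
};

$saved = sketchSaveWithRetry($flakySave);
```

The delay is skipped after the final attempt so a permanently failing save does not add an extra wait to the request.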

## Related Documentation

- [Package Build Process](02_1_PACKAGE_build-process.md) - Uses chunking for scanning and archive creation
- [Storage Providers](04_1_STORAGE_providers.md) - Uses chunking for large file uploads
