Capsule - Privacy-Enhanced Data Units
Overview
A capsule is a privacy-enhanced data unit that transforms arbitrary buffers into fixed-size chunks. By standardizing data into specific size buckets (256 KB, 1 MB, 10 MB, 100 MB, 1000 MB), capsules obfuscate content characteristics and enable DIG Nodes to operate as pure network infrastructure without knowledge of the data they store.
Technical Architecture
Fixed Size Buckets
Capsules enforce five standardized sizes to maximize privacy:
256 KB = 262,144 bytes
1 MB = 1,048,576 bytes
10 MB = 10,485,760 bytes
100 MB = 104,857,600 bytes
1000 MB = 1,048,576,000 bytes
Transformation Algorithm
The capsule transform selects a chunk size from the buffer length, then pads or splits the buffer accordingly:
    const KB = 1024;
    const MB = 1024 * KB;

    // padToSize and splitIntoChunks are not shown here.
    function transformToCapsule(buffer: Buffer): CapsuleSet {
      const size = buffer.length;
      if (size <= 256 * KB) {
        // Fits in the smallest bucket: pad up to 256 KB.
        return padToSize(buffer, 256 * KB);
      } else if (size <= 1 * MB) {
        return splitIntoChunks(buffer, 256 * KB);
      } else if (size <= 10 * MB) {
        return splitIntoChunks(buffer, 1 * MB);
      } else if (size <= 100 * MB) {
        return splitIntoChunks(buffer, 10 * MB);
      } else if (size <= 1000 * MB) {
        return splitIntoChunks(buffer, 100 * MB);
      } else {
        // Anything larger uses the biggest bucket as its primary chunk size.
        return splitIntoChunks(buffer, 1000 * MB);
      }
    }
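For readers following the Python examples later on this page, the tier selection above can be sketched in Python. `primary_chunk_size` and `TIERS` are hypothetical names introduced for illustration; the sketch mirrors only the branch thresholds of `transformToCapsule` and does not split or pad anything.

```python
KB = 1024
MB = 1024 * KB

# (upper bound on buffer size, chunk size used for that tier),
# copied from the branches of transformToCapsule above.
TIERS = [
    (256 * KB, 256 * KB),
    (1 * MB, 256 * KB),
    (10 * MB, 1 * MB),
    (100 * MB, 10 * MB),
    (1000 * MB, 100 * MB),
]


def primary_chunk_size(size: int) -> int:
    """Return the chunk size the transform selects for a buffer of `size` bytes."""
    if size < 0:
        raise ValueError("size must be non-negative")
    for upper_bound, chunk_size in TIERS:
        if size <= upper_bound:
            return chunk_size
    # Buffers above 1000 MB use the largest bucket.
    return 1000 * MB
```

Note that a buffer of exactly 1 MB falls in the second tier and is therefore split into 256 KB chunks, as in the original branches.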
Padding Protocol
For buffers requiring padding, a removable signal is embedded:
    [Original Data][Padding Marker][Random Padding][Size Footer]
                          ↑                              ↑
                 0xFF 0xFF 0xFF 0xFF          4 bytes (original size)
Padding Structure:
- Original Data: Encrypted using the data store ID as the encryption key
- Padding Marker: 4-byte sequence 0xFFFFFFFF
- Random Padding: Cryptographically random bytes (minimum 5% of capsule size)
  - Generated using current block height as entropy seed
  - Ensures unique padding for each block
  - Adds temporal entropy to padding generation
- Size Footer: 4-byte little-endian original size
Padding Requirements:
- Random padding must be at least 5% of the capsule size
- Additional padding may be added as needed
- Original data is always encrypted using the data store ID as the encryption key
- Padding ensures uniform capsule sizes while maintaining privacy
- Block height is used as entropy seed for padding generation
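The layout and the 5% minimum can be checked with a little arithmetic. The sketch below assumes that the 5% minimum applies to the random padding alone, excluding the 4-byte marker and the 4-byte footer; the text does not state this explicitly. `min_padding`, `max_payload`, and `padding_layout` are hypothetical helper names.

```python
KB = 1024
MARKER_LEN = 4  # 0xFF 0xFF 0xFF 0xFF
FOOTER_LEN = 4  # little-endian original size


def min_padding(capsule_size: int) -> int:
    """Minimum random padding: 5% of the capsule size, rounded up."""
    return -(-capsule_size * 5 // 100)  # ceiling division


def max_payload(capsule_size: int) -> int:
    """Largest data length that still leaves room for marker, footer and minimum padding."""
    return capsule_size - MARKER_LEN - FOOTER_LEN - min_padding(capsule_size)


def padding_layout(original_size: int, capsule_size: int) -> dict:
    """Return (start, end) byte offsets of each region in a padded capsule."""
    if original_size < 0 or original_size > max_payload(capsule_size):
        raise ValueError("not enough room for marker, footer and 5% padding")
    marker_start = original_size
    padding_start = marker_start + MARKER_LEN
    footer_start = capsule_size - FOOTER_LEN
    return {
        "data": (0, marker_start),
        "marker": (marker_start, padding_start),
        "padding": (padding_start, footer_start),
        "footer": (footer_start, capsule_size),
    }
```

Under this assumption a 256 KB capsule can carry at most 249,028 bytes of original data, so a buffer close to a bucket boundary may need to move up to the next size or spill into an extra capsule.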
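Chunking Logic
When splitting buffers into multiple capsules:
- Primary Chunks: Use the largest applicable size for as much of the buffer as possible
- Remainder Handling: Whatever does not fill a primary chunk falls back to the next smaller sizes
- Last Chunk Padding: The final partial chunk is always padded up to the nearest valid size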
Example Transformations:
150 KB → 256 KB capsule (padded)
500 KB → 2 × 256 KB capsules
750 KB → 3 × 256 KB capsules
5 MB → 5 × 1 MB capsules
25 MB → 2 × 10 MB + 5 × 1 MB capsules
350 MB → 3 × 100 MB + 5 × 10 MB capsules
2.5 GB (2,500 MB) → 2 × 1000 MB + 5 × 100 MB capsules
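The example transformations above are consistent with a greedy split from the largest bucket downward. The following is a minimal sketch of that reading, not the project's actual chunker: `decompose` and `BUCKETS` are hypothetical names, and the sketch only counts capsules. It ignores the marker, footer, and 5% minimum padding, and at exact bucket boundaries it can differ from the tier branches of `transformToCapsule` (for example, exactly 1 MB yields one 1 MB capsule here).

```python
KB = 1024
MB = 1024 * KB
BUCKETS = [1000 * MB, 100 * MB, 10 * MB, 1 * MB, 256 * KB]  # largest first


def decompose(size: int) -> dict:
    """Greedily split `size` bytes into a {bucket_size: capsule_count} mapping."""
    if size <= 0:
        raise ValueError("size must be positive")
    counts = {}
    remaining = size
    for bucket in BUCKETS:
        full, remaining = divmod(remaining, bucket)
        if full:
            counts[bucket] = full
    if remaining:
        # Leftover smaller than 256 KB: pad it into one more smallest capsule.
        counts[256 * KB] = counts.get(256 * KB, 0) + 1
    return counts


# Usage: 25 MB becomes two 10 MB capsules and five 1 MB capsules.
print(decompose(25 * MB) == {10 * MB: 2, 1 * MB: 5})
```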
CapsuleSet Structure
The output of capsule transformation is a CapsuleSet:
    interface CapsuleSet {
      id: string;               // SHA-256 of original buffer
      capsules: Capsule[];      // Array of fixed-size capsules
      metadata: {
        originalSize: number;   // Original buffer size
        capsuleCount: number;   // Number of capsules
        capsuleSizes: number[]; // Size of each capsule
        checksum: string;       // SHA-256 of concatenated capsules
      };
    }

    interface Capsule {
      index: number; // Position in set (0-based)
      size: number;  // Fixed size (256 KB, 1 MB, 10 MB, 100 MB, or 1000 MB)
      hash: string;  // SHA-256 of capsule content
      data: Buffer;  // Actual capsule data
    }
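To show how the fields relate to one another, here is a minimal Python sketch that assembles the same structure with `hashlib`. It assumes hex-encoded digests and that the checksum is taken over the capsule payloads concatenated in index order; neither detail is specified above. `build_capsule_set` and `sha256_hex` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Capsule:
    index: int   # position in set (0-based)
    size: int    # capsule size in bytes
    hash: str    # SHA-256 of capsule content (hex, assumed encoding)
    data: bytes  # actual capsule data


@dataclass
class CapsuleSet:
    id: str
    capsules: List[Capsule]
    metadata: dict = field(default_factory=dict)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_capsule_set(original: bytes, capsule_data: List[bytes]) -> CapsuleSet:
    """Assemble a CapsuleSet from the original buffer and already-padded capsule payloads."""
    capsules = [
        Capsule(index=i, size=len(d), hash=sha256_hex(d), data=d)
        for i, d in enumerate(capsule_data)
    ]
    return CapsuleSet(
        id=sha256_hex(original),
        capsules=capsules,
        metadata={
            "originalSize": len(original),
            "capsuleCount": len(capsules),
            "capsuleSizes": [c.size for c in capsules],
            "checksum": sha256_hex(b"".join(capsule_data)),
        },
    )
```

A receiver can recompute each capsule's hash and the set checksum to detect corruption before attempting extraction.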
Privacy Properties
Size Obfuscation
By quantizing to fixed sizes, capsules prevent:
- Content Type Detection: File sizes don't reveal format
- Usage Pattern Analysis: Similar-sized content appears identical
- Traffic Analysis: Network flows show standardized sizes
Statistical Privacy
Fixed-size capsules create plausible deniability:
- A 256 KB capsule could contain anywhere from 1 KB to 256 KB of actual data
- Multiple capsules don't reveal relationships
- Padding uses cryptographic randomness
Image File Privacy
The 256 KB tier specifically enhances privacy for common image formats:
- Small images (50-200 KB): Padded to 256 KB, hiding exact size
- Medium images (300-800 KB): Split into 2-4 × 256 KB chunks
- Large photos (1-5 MB): Fragmented across multiple capsules
- High-res images: Broken into non-identifying fragments
This makes it harder to identify images or detect content type from file size alone.
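Implementation
Capsule Creation
    import os
    import struct

    # Constants
    KB = 1024
    MB = 1024 * KB
    PADDING_MARKER = b'\xFF\xFF\xFF\xFF'
    OVERHEAD = 8  # 4-byte marker + 4-byte size footer

    def create_capsule(buffer: bytes, target_size: int) -> bytes:
        if len(buffer) == target_size:
            return buffer  # Full chunk: stored without padding
        if len(buffer) > target_size:
            # Should not reach here - chunking handles larger buffers
            raise ValueError("Buffer exceeds target size")
        # Add padding
        padding_size = target_size - len(buffer) - OVERHEAD
        if padding_size < 0:
            raise ValueError("No room for padding marker and size footer")
        # os.urandom reads from the OS CSPRNG. The block-height seeding and the
        # 5% minimum described under Padding Protocol are not enforced here.
        random_padding = os.urandom(padding_size)
        size_footer = struct.pack('<I', len(buffer))  # little-endian uint32
        return buffer + PADDING_MARKER + random_padding + size_footer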
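Capsule Extraction
    import struct

    def extract_from_capsule(capsule: bytes) -> bytes:
        marker = b'\xFF\xFF\xFF\xFF'
        if len(capsule) < 8:
            return capsule  # Too short to hold marker + footer
        # Read original size from footer
        original_size = struct.unpack('<I', capsule[-4:])[0]
        # Check for padding marker. It must sit immediately after the original
        # data: searching the whole capsule could match 0xFFFFFFFF inside the
        # data or the random padding of an unrelated capsule.
        marker_end = original_size + 4
        if marker_end > len(capsule) - 4 or capsule[original_size:marker_end] != marker:
            return capsule  # No padding
        # Extract original data
        return capsule[:original_size]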
Performance Characteristics
Storage Overhead
Original Size | Capsules | Minimum Overhead |
---|---|---|
128 KB | 1 × 256 KB | 100% |
200 KB | 1 × 256 KB | 28% |
500 KB | 2 × 256 KB | 5% |
900 KB | 4 × 256 KB | 13.8% |
9 MB | 9 × 1 MB | 5% |
95 MB | 9 × 10 MB + 5 × 1 MB | 5% |
Note: Overhead is the extra stored bytes relative to the original size. Rows listed as 5% are cases where size quantization alone adds less than 5%, so the minimum 5% padding requirement sets the floor. Additional padding may be added as needed.
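The quantization part of these figures can be recomputed as (stored bytes − original bytes) / original bytes. The helper below, `overhead_percent`, is a hypothetical name for that arithmetic. It deliberately ignores the 5% minimum padding, so for inputs that nearly fill their capsules it returns a raw value below the 5% floor listed in the table.

```python
from typing import List

KB = 1024


def overhead_percent(original_size: int, capsule_sizes: List[int]) -> float:
    """Extra stored bytes as a percentage of the original size."""
    total = sum(capsule_sizes)
    if original_size <= 0 or total < original_size:
        raise ValueError("capsules must be at least as large as the original data")
    return (total - original_size) / original_size * 100


# Usage: recompute three rows of the table, rounded to one decimal place.
for original, capsules in [
    (128 * KB, [256 * KB]),
    (200 * KB, [256 * KB]),
    (900 * KB, [256 * KB] * 4),
]:
    print(original // KB, "KB:", round(overhead_percent(original, capsules), 1), "%")
```

For 500 KB stored in 2 × 256 KB the raw figure is 2.4%, which is why the table reports the 5% minimum for that row.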
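Processing Performance
- Padding: O(p) - Linear in the number of padding bytes generated
- Chunking: O(n) - Linear in data size
- Extraction: O(1) to locate the data when the original size is read from the footer; copying the slice out is O(n)
- Verification: O(1) when the padding marker is checked at the offset given by the size footer; scanning for the marker is O(n)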
Use Cases
Private Content Storage
- Medical images broken into unidentifiable fragments
- Photos appear as generic data chunks
- Financial documents indistinguishable from media files
- Personal files blend with public content
Network Infrastructure
- DIG Nodes operate without content knowledge
- Simplified caching strategies
- Uniform resource allocation
Censorship Resistance
- Content type obfuscation
- Size-based filtering defeated
- Plausible deniability for operators
Integration with Plots
CapsuleSets are the primary input for plot creation:
    # Transform data into capsules
    capsule_set = transform_to_capsules(user_data)

    # Create plot from capsule set
    plot = create_plot(
        capsule_set=capsule_set,
        public_key=owner_key,
        difficulty=target_difficulty
    )
Security Considerations
Cryptographic Randomness
- Padding must use secure random sources
- Prevents padding analysis attacks
- Maintains privacy guarantees
- Minimum 5% padding ensures consistent privacy properties
- Block height entropy seed ensures temporal uniqueness of padding
Data Encryption
- All original data is encrypted using the data store ID as the encryption key
- Encryption occurs before padding is applied
- Ensures data confidentiality even if padding is removed
Size Side Channels
- Fixed sizes prevent exact size leakage
- Multiple valid representations possible
- Timing attacks mitigated by constant operations
Related Documentation
- Plots - How capsules are stored permanently
- Cart Format - Transporting capsules between nodes
- Privacy Model - Complete privacy architecture
- Plot Format - Binary format details