Sajjad Iranmanesh,
Department of Software and Information Technology Engineering
18/12/2024
https://syncify.ca
1. Introduction
2. Technical Background
3. System Architecture
4. Implementation
5. Evaluation
6. Conclusion
Key Requirements:
Motivating Scenario: Edge Robotics:
Edge Computing & IoT: A Rapidly Growing Paradigm
Network Challenges:
Data Management Issues:
Device Constraints:
Key Building Blocks in Distributed Storage:
Main Categories of Current Solutions:
Why is Syncing So Hard?
Unreliable Ordering
Eventual Consistency
Why We're Using HLC:
1. Consistency in distributed systems
2. Efficient conflict resolution
3. User-friendly versioning (e.g., 01-01-2024)
Lamport Clocks:
• Logical clock for ordering events
• Scalar value increases monotonically
• Establishes partial ordering of operations
• Preserves causal relationships
Limitations:
• Abstract integers (1, 2, 3...) as versions
• No correlation to real date-time
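A minimal Lamport clock sketch (illustrative only, not Syncify's actual code): a scalar counter that advances on local events and jumps past incoming timestamps on receive, so causal order is preserved.

```kotlin
// Minimal Lamport clock sketch (illustrative; not Syncify's actual code).
class LamportClock(private var time: Long = 0) {
    // Local event or send: advance the scalar counter monotonically.
    fun tick(): Long = ++time

    // Receive: jump past the sender's timestamp, so if event A
    // happened-before event B, A's timestamp is smaller than B's.
    fun receive(remote: Long): Long {
        time = maxOf(time, remote) + 1
        return time
    }
}

val sender = LamportClock()
val receiver = LamportClock()
val msgTs = sender.tick()             // sender's clock: 1
val localTs = receiver.receive(msgTs) // receiver's clock: max(0, 1) + 1 = 2
```

Note the limitation above: the resulting versions are abstract integers with no connection to wall-clock time.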
Hybrid Logical Clocks (HLC):
• Combines physical and logical time
• Preserves causality with real-time correlation
• Versions correlate to physical time
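The HLC update rules can be sketched as follows (function names and the `Long`-millisecond representation are assumptions for illustration, not Syncify's actual API): physical time is used whenever it advances, and a logical counter breaks ties otherwise.

```kotlin
// Sketch of HLC update rules (names and Long millis are assumptions).
data class HLC(val wallMillis: Long, val counter: Int, val node: String) {
    // Encoded form, e.g. "001728518867187:00001:b7e8083dc5b2d32c";
    // zero-padding makes string comparison match timestamp order.
    fun encode(): String = "%015d:%05d:%s".format(wallMillis, counter, node)

    companion object {
        // Local/send event: follow physical time when it advances,
        // otherwise bump the logical counter.
        fun tick(last: HLC, nowMillis: Long): HLC =
            if (nowMillis > last.wallMillis) last.copy(wallMillis = nowMillis, counter = 0)
            else last.copy(counter = last.counter + 1)

        // Receive event: merge with the remote timestamp so causality
        // is preserved while staying close to physical time.
        fun merge(last: HLC, remote: HLC, nowMillis: Long): HLC {
            val t = maxOf(nowMillis, last.wallMillis, remote.wallMillis)
            val c = when {
                t == last.wallMillis && t == remote.wallMillis ->
                    maxOf(last.counter, remote.counter) + 1
                t == last.wallMillis -> last.counter + 1
                t == remote.wallMillis -> remote.counter + 1
                else -> 0
            }
            return HLC(t, c, last.node)
        }
    }
}
```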
class HybridLogicalClock(
    val timestamp: Timestamp = Timestamp.now(Clock.System),
    val counter: Int = 0,
    val node: NodeID = NodeID.mint(),
)
Example encoded version: 001728518867187:00001:b7e8083dc5b2d32c
Conflict-free Replicated Data Types (CRDTs)
Definition: Data structures for replication across multiple computers
Key properties:
Use cases: Collaborative software, shared documents, databases
Advantages:
What problems does CRDT solve?
Key Properties of CRDTs: Commutative, Associative, and Idempotent
Commutative
Definition: The order of operations doesn't affect the final result
Example: a + b = b + a
Associative
Definition: Grouping of operations doesn't affect the final result
Example: (a + b) + c = a + (b + c)
Idempotent
Definition: Applying an operation multiple times has the same effect as applying it once
Example: max(max(a, b), b) = max(a, b)
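The three properties can be checked concretely with `max()` as a toy merge function (a hypothetical example, not Syncify code). Any merge with these properties lets replicas apply updates in any order, grouped any way, any number of times, and still converge.

```kotlin
// Toy merge function: max() is commutative, associative, and idempotent.
fun mergeMax(a: Int, b: Int): Int = maxOf(a, b)

val commutative = mergeMax(3, 7) == mergeMax(7, 3)                       // order doesn't matter
val associative = mergeMax(mergeMax(1, 5), 9) == mergeMax(1, mergeMax(5, 9)) // grouping doesn't matter
val idempotent = mergeMax(mergeMax(4, 8), 8) == mergeMax(4, 8)           // repeats don't matter
```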
Eventual Consistency (EC):
Three key properties:
1. Eventual Delivery
Updates reach all correct replicas
2. Convergence
Same updates lead to same final state
3. Termination
All methods complete in finite time
May require conflict resolution
Strong Eventual Consistency (SEC):
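A last-writer-wins map (LWW-Map, the structure used in the example that follows) is one way to get SEC. The sketch below uses illustrative names, not Syncify's actual API; because the encoded HLC timestamps are zero-padded, plain string comparison orders them correctly.

```kotlin
// Sketch of an LWW-Map: per key, keep the entry with the greatest timestamp.
data class Entry(val value: Int, val timestamp: String)

fun merge(a: Map<String, Entry>, b: Map<String, Entry>): Map<String, Entry> =
    (a.keys + b.keys).associateWith { k ->
        listOfNotNull(a[k], b[k]).maxByOrNull { it.timestamp }!!
    }

val updates = listOf(
    mapOf("y" to Entry(73, "001728520334640:00000:a89653d0ab9fa9fc")),
    mapOf("x" to Entry(8, "001728520334641:00001:a89653d0ab9fa9fc")),
    mapOf("z" to Entry(114, "001728520334642:00002:a89653d0ab9fa9fc")),
    mapOf("x" to Entry(300, "001728520334648:00000:a7008ecf53def814")),
)
// Applying the same updates in any order converges to the same state (SEC).
val forward = updates.fold(emptyMap<String, Entry>(), ::merge)
val backward = updates.reversed().fold(emptyMap<String, Entry>(), ::merge)
```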
Updates (newest first):
{ x: 300, timestamp: "001728520334648:00000:a7008ecf53def814" }
{ z: 114, timestamp: "001728520334642:00002:a89653d0ab9fa9fc" }
{ x: 8, timestamp: "001728520334641:00001:a89653d0ab9fa9fc" }
{ y: 73, timestamp: "001728520334640:00000:a89653d0ab9fa9fc" }
Applied to an empty LWW-Map in that order, the state evolves:
{ } → { x: 300 } → { x: 300, y: 73 } → { x: 300, y: 73, z: 114 }
Applied in a different order, the state evolves:
{ } → { x: 8 } → { x: 8, y: 73 } → { x: 300, y: 73 } → { x: 300, y: 73, z: 114 }
Both orders converge: for each key, the entry with the greatest timestamp wins.
Why Tree CRDTs?
Challenges in Implementing Tree CRDTs
Taken from [1]
[1] https://loro.dev/blog/movable-tree
Key Operation in Movable Tree CRDTs
Our Approach
[1] Kleppmann et al., "A highly-available move operation for replicated trees" (2022)
Benefits and Performance
Conclusion
Concurrent moves of same node
Taken from https://madebyevan.com/algos/crdt-mutable-tree-hierarchy/
Your Computer
Syncify
The Web
file://path/to/index.html
http://domain.com/path/to/index.html
mtwirsqawjuoloq2gvtyug2tc3jbf5htm2zeo4rsknfiv3fdp46a
domain.com => IP Address
A Distributed Hash Table (DHT) is a two-column key/value table maintained collectively by multiple peers.
Each row is stored on the peers whose IDs are most similar to the key; this similarity measure is called the distance.
The DHT provides:
How it Works:
When content is added:
A unique CID is generated; the content itself is stored locally
The DHT stores only "provider records" (CID → provider nodes)
Content Discovery Flow:
Request content by CID
DHT returns provider list
Direct peer-to-peer transfer
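The flow above can be modeled in a few lines (a toy sketch with hypothetical names): peer closeness is XOR distance between key and peer ID, as in Kademlia-style DHTs, and the table maps a CID to provider records rather than to the content itself.

```kotlin
import java.math.BigInteger

// XOR distance between a key and a peer ID (Kademlia-style metric).
fun distance(key: BigInteger, peerId: BigInteger): BigInteger = key.xor(peerId)

// Provider records: CID -> peers that can serve the content.
val providers = mutableMapOf<String, MutableSet<String>>()

// When content is added, the node announces itself as a provider.
fun provide(cid: String, peer: String) {
    providers.getOrPut(cid) { mutableSetOf() }.add(peer)
}

// Content discovery: look up providers, then fetch directly peer-to-peer.
fun findProviders(cid: String): Set<String> = providers[cid] ?: emptySet()
```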
Floodsub
Pros:
Cons:
Features:
Taken from [1]
[1] docs.libp2p.io/concepts/pubsub/overview/
GossipSub: Key Features
Taken from [1]
Subscribing and Unsubscribing
Taken from [1]
When a node joins
Over time
Demo
Superior Performance
Scalability Advantage
Consistency & Reliability
Architecture Benefits:
✓ DHT-based content addressing minimizes bandwidth usage
✓ Pub/sub mechanism reduces update overhead
✓ CRDT enables parallel operations
Key Findings:
Choose k based on data criticality:
Setup:
Key Findings:
Test Setup:
Key Findings:
Key Findings:
Implications:
Bottom Line: The system shows good scalability for typical edge workloads but needs optimization for extreme scenarios (high concurrency, large files, and high replication combined)
Excellent Scalability:
Strong Performance Under Stress
Important Trade-offs Identified
Real-world Implications
Test Parameters:
Key Findings:
Performance Benefits:
Practical Implications:
File chunking makes synchronization dramatically more efficient, especially for large files with small changes, which is exactly what edge environments need
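The idea can be sketched as follows (illustrative only: a fixed chunk size, and `List.hashCode` standing in for a real content hash). Only chunks whose hash differs from the peer's copy need to be transferred.

```kotlin
// Split data into fixed-size chunks and hash each one.
fun chunkHashes(data: ByteArray, chunkSize: Int = 4): List<Int> =
    data.toList().chunked(chunkSize).map { it.hashCode() }

// Indices of chunks that differ from the remote copy.
fun changedChunks(local: List<Int>, remote: List<Int>): List<Int> =
    local.indices.filter { i -> remote.getOrNull(i) != local[i] }

val v1 = chunkHashes("hello world, edge sync!".toByteArray())
val v2 = chunkHashes("hello world, edge sync?".toByteArray()) // one byte changed
// Only the chunk containing the change is re-sent, not the whole file.
val toSend = changedChunks(v2, v1)
```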
Efficient Scaling:
Network Partition Recovery:
Conclusion:
Future Works:
Thank You for Your Attention!
For any questions or further discussion, feel free to contact me at: