
http://eprints.eemcs.utwente.nl/22286/01/imc140-drago.pdf
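The observations in this post are based on the paper above.
Not everyone is familiar with Dropbox, so here is a quick primer: Dropbox is an online storage application that synchronizes local files to the network. It syncs automatically across multiple computers and operating systems, and it can also be used as a large-capacity network drive.
Before going further, one question: why should we care about Dropbox? As cloud computing frameworks increasingly come into view for developers and users, the demands on file and data synchronization keep growing in both volume and strictness. It is worth analyzing, and borrowing from, the data synchronization protocols that are popular in the industry.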

Because the Dropbox protocol is not public, the paper analyzes it by intercepting SSL traffic. The most important points are recorded one by one below.
Distance has a significant effect on Dropbox performance
We highlight that Dropbox performance is mainly driven by the distance between clients and storage data-centers.
In addition, short data transfers combined with a per-chunk acknowledgment mechanism hurt throughput badly
In addition, short data transfer sizes coupled with a per-chunk acknowledgment mechanism impair transfer throughput, which is as little as 530 kbit/s on average.
How the TLS/SSL traffic was analyzed
A Linux PC running the Dropbox client was instructed to use a Squid proxy server under our control. On the latter, the module SSL-bump was used to terminate SSL connections and save decrypted traffic flows. The memory area where the Dropbox application stores trusted certificate authorities was modified at run-time to replace the original Dropbox Inc. certificate by the self-signed one signing the proxy server.
Every uploaded chunk is answered with one acknowledgment message
Each chunk store operation is acknowledged by one OK message.
Dropbox has three control protocols
(i) Notification, (ii) meta-data administration, and (iii) system-log servers.
Notification Protocol
The client keeps a long-lived TCP connection to notifyX.dropbox.com; the notification connection is not encrypted. HTTP Comet, i.e. long polling, runs over this persistent TCP connection.
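The long-polling pattern described above can be sketched without any network at all. This is a minimal in-process model, not Dropbox's implementation: `NotificationServer`, `publish`, and `poll` are hypothetical names, and a `queue.Queue` stands in for the HTTP request that the server holds open until something changes or a timeout expires.

```python
import queue
import threading


class NotificationServer:
    """Toy stand-in for a notification server (hypothetical, in-process)."""

    def __init__(self):
        self._events = queue.Queue()

    def publish(self, event):
        # Called when something changes on the server side.
        self._events.put(event)

    def poll(self, timeout_s):
        # Long poll: block until an event arrives or the timeout expires.
        # None models an empty response, after which the client polls again.
        try:
            return self._events.get(timeout=timeout_s)
        except queue.Empty:
            return None


if __name__ == "__main__":
    server = NotificationServer()
    # Simulate a change that happens while the client is already waiting.
    threading.Timer(0.1, server.publish, args=("changed",)).start()
    print(server.poll(timeout_s=2.0))
```

The client pays no polling traffic while nothing changes, yet learns about a change as soon as the held request returns.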
Meta-data Information Protocol
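A typical synchronization starts with meta-data messages sent to the meta-data server, followed by a batch of store or retrieve operations carried out through the Amazon servers. Once the chunks have been exchanged successfully, the client sends a message to the meta-data server to complete the transaction.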
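The message order of such a transaction can be sketched as follows. This is an illustrative toy, assuming in-memory stand-ins for both servers; the class and method names (`MetaDataServer.begin`, `MetaDataServer.finish`, `StorageServer.store`) are hypothetical and are not the real Dropbox commands. The point is only the ordering: meta-data first, then one acknowledged store per chunk, then meta-data again.

```python
class MetaDataServer:
    """Hypothetical meta-data server that only records what it receives."""

    def __init__(self, trace):
        self.trace = trace

    def begin(self, chunk_ids):
        self.trace.append("meta: begin")

    def finish(self, chunk_ids):
        self.trace.append("meta: finish")


class StorageServer:
    """Hypothetical storage server; every store is answered with OK."""

    def __init__(self, trace):
        self.trace = trace
        self.chunks = {}

    def store(self, chunk_id, data):
        self.trace.append(f"store {chunk_id}")
        self.chunks[chunk_id] = data
        self.trace.append(f"OK {chunk_id}")
        return "OK"


def synchronize(chunks, meta, storage):
    """Upload chunks (a dict of id -> bytes) one at a time, then close."""
    chunk_ids = list(chunks)
    meta.begin(chunk_ids)
    for chunk_id in chunk_ids:
        # The next chunk is not sent until the previous one is acknowledged.
        if storage.store(chunk_id, chunks[chunk_id]) != "OK":
            raise RuntimeError(f"store failed for {chunk_id}")
    meta.finish(chunk_ids)


if __name__ == "__main__":
    trace = []
    synchronize({"c1": b"aa", "c2": b"bb"},
                MetaDataServer(trace), StorageServer(trace))
    print(trace)
```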
The synchronization protocol tends to produce small transfers
(i) the synchronization protocol sending and receiving file deltas as soon as they are detected; (ii) the primary use of Dropbox for synchronization of small files constantly changed, instead of periodic (large) backups.
The analysis shows that TCP slow start and acknowledgments have the largest impact on performance
Moreover, flows achieve lower throughput as the number of chunks increases. TCP start-up times and application-layer sequential acknowledgments are two major factors limiting the throughput, affecting flows with a small amount of data and flows with a large number of chunks, respectively. In both cases, the high RTT between clients and data-centers amplifies the effects.
Flows carrying a small amount of data are limited by TCP slow start-up times.
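To see why small flows are bound by slow start, here is a rough sketch that counts the round trips needed to push a payload while the congestion window doubles every RTT. The MSS of 1460 bytes and the initial window of 3 segments are assumptions made for illustration, not values from the paper, and the model ignores losses as well as the TCP and SSL handshakes.

```python
import math


def slow_start_rtts(payload_bytes, mss=1460, init_cwnd_segments=3):
    """Round trips to send payload_bytes with cwnd doubling each RTT.

    Simplified model (assumed parameters): no loss, no receiver window
    limit, no handshake time.
    """
    segments = math.ceil(payload_bytes / mss)
    cwnd = init_cwnd_segments
    sent = 0
    rtts = 0
    while sent < segments:
        sent += cwnd   # one window of segments per round trip
        cwnd *= 2      # slow start: the window doubles every RTT
        rtts += 1
    return rtts


def effective_kbps(payload_bytes, rtt_s):
    """Average throughput in kbit/s when only slow start limits the flow."""
    rtts = slow_start_rtts(payload_bytes)
    return payload_bytes * 8 / 1000 / (rtts * rtt_s)


if __name__ == "__main__":
    for size in (10_000, 100_000, 1_000_000):
        print(size, slow_start_rtts(size), round(effective_kbps(size, 0.1)))
```

In this model the achieved rate depends on the RTT and the payload size, not on the link capacity, which is why distance hurts short flows so much.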
Flows with more than 1 chunk have the sequential acknowledgment scheme (Fig. 1) as a bottleneck, because the mechanism forces clients to wait one RTT (plus the server reaction time) between two storage operations.
Flows with more than 50 chunks, for instance, always last for more than 30s, regardless of their sizes. Considering the RTT in Campus 2, up to one third of that (5-10s) is wasted while application-layer acknowledgments are transiting the network.
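The 5-10s figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes one RTT of idle time per acknowledged chunk; the RTT values of 100 ms and 200 ms are illustrative assumptions (these notes do not state the Campus 2 RTT), and the function names are made up for this example.

```python
def ack_wait_seconds(num_chunks, rtt_s, server_reaction_s=0.0):
    """Idle time spent waiting for one acknowledgment per chunk."""
    return num_chunks * (rtt_s + server_reaction_s)


def ack_wait_share(num_chunks, rtt_s, flow_duration_s, server_reaction_s=0.0):
    """Fraction of the flow duration spent waiting for acknowledgments."""
    waited = ack_wait_seconds(num_chunks, rtt_s, server_reaction_s)
    return waited / flow_duration_s


if __name__ == "__main__":
    # 50 chunks in a 30 s flow, with assumed RTTs of 100 ms and 200 ms.
    for rtt in (0.1, 0.2):
        waited = ack_wait_seconds(50, rtt)
        share = ack_wait_share(50, rtt, 30.0)
        print(f"RTT {rtt:.1f}s: {waited:.1f}s waiting, {share:.0%} of the flow")
```

With these assumed RTTs, 50 chunks cost between 5 and 10 seconds of pure waiting, which is up to a third of a 30-second flow.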
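Finally, the authors give their recommendations for optimizing Dropbox transfers,
namely:
1. Enforce a minimum amount of data per storage operation, cutting down on synchronizing many small chunks
2. Use delayed acknowledgments, pipelining chunks to reduce the network idle time caused by sequential acknowledgments
3. Place storage closer to users to reduce transfer latency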
Our measurements clearly indicate that the application-layer protocol in combination with large RTT penalizes the system performance. We identify three possible solutions to remove the identified bottlenecks:
1. Bundling smaller chunks, increasing the amount of data sent per storage operation. Dropbox 1.4.0, announced in April 2012, implements a bundling mechanism, which is analyzed in the following;
2. Using a delayed acknowledgment scheme in storage operations, pipelining chunks to remove the effects of sequential acknowledgments;
3. Bringing storage servers closer to customers, thus improving the overall throughput.
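The first recommendation can be illustrated with a simple greedy grouping: chunks are packed into bundles up to a size limit, so that each acknowledgment covers more data. This is only a sketch of the idea, not Dropbox's actual bundling algorithm; `bundle_chunks` and the 4,000,000-byte limit in the demo are assumptions chosen for the example.

```python
def bundle_chunks(chunk_sizes, max_bundle_bytes):
    """Greedily group consecutive chunk sizes into bundles.

    A bundle is closed when adding the next chunk would exceed
    max_bundle_bytes. A single oversized chunk gets its own bundle.
    """
    bundles = []
    current = []
    current_bytes = 0
    for size in chunk_sizes:
        if current and current_bytes + size > max_bundle_bytes:
            bundles.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        bundles.append(current)
    return bundles


if __name__ == "__main__":
    chunks = [100_000] * 50                      # 50 small chunks
    bundles = bundle_chunks(chunks, 4_000_000)   # assumed bundle limit
    # One acknowledgment per bundle instead of one per chunk.
    print(len(chunks), "acks without bundling,", len(bundles), "with bundling")
```

Each bundle still costs one round trip for its acknowledgment, so fewer bundles mean less idle time on a high-RTT path.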