Network Tomography of Binary Network Performance Characteristics

[故障链路][2006][ToIT, IEEE][Nick Duffield] Network Tomography of Binary Network Performance Characteristics


  • I. Introduction

    A. Motivation

    The methods based on multicast and unicast probing require development and administrative costs associated with deploying probing and data collection software, which has motivated the goal of reducing the costs by developing inference methods that can work with readily available end-to-end measurements.

    B. The Need for Correlation Measurements

    Independent measurement of the transmission rates from the root node to each of the leaf nodes is insufficient to determine the link transmission probabilities φ_i uniquely. The φ_i are not statistically identifiable from the data, meaning that different sets of parameters exist that give rise to the same statistical distribution of data.

    C. Measurement and Packet-Level Correlation

    When a given multicast probe is observed at multiple endpoints, the contribution to the packet performance from the common portion of the packets' paths is identical.One approach is to give weight to the stripes of closely spaced unicast packets to confirm the strongest correlation, another is through intruducing more parameters into the link model, then to reduce back by coupling the parameters in order to render the model identifiable.

    D. Inference in the Absence of Packet Level Correlations

    The way, through observing the TCP retransmissions, does not assume or attempt to exploit any packet-level correlations. Packets are only assumed to have the same probability of being lost on given link, independent of the path that they take through. By understanding the structure properties that underpin these methods, we aim to develop a quick and simple estimators for the worst performing links for a range of performance characteristics.

    E. Performance Level Correlation

    By observing the packet streams' loss rate or mean delay, we can calculate a summary statistic that could be mapped down to a binary performance(good/bad), which is separable that makes it easy to infer the distribution of link badness from the multiple loss inference. When packet streams of a class experienced good or bad performance on a link, we say the link is good or bad. We say the path is bad if and only if at least one link on the path is bad.

    F. Contribution and Summary

    Develop a framework outlined and show how it can be used to infer the locations of badly performing links.

    1. Section II defines the notion of separability of performance measures and introduce the notion of weak separability.

    2. Section III describes a static algorithm — the smallest consistent failure set(SCFS) algorithm which uses a single measurement of the good/bad status of each source-to-leaf path to infer the location of bad links in the routing tree.

    3. Section IV derives the false positive rate and the detective rate of the SCFS algorithm to identify bad links under the assumption of strict separability, and show that false positive rate is very small for a likely range of possibilities for a link to be bad.

    4. Section V describes an algorithm which can decrease the amount by which the number of candidate bad links exceed the actual number of bad links.

    5. Section VI extends the analysis of Section IV to the general weakly separable case, in which defines the notion of critical link and derives the bounds of false possitive rate and detective rate of critical link.

    6. Section VII compares the performance of SCFS with the algorithms proposed in other literature, showing that SCFS has noticeably better false positive rate than the other methods.

    7. Section VIII outline how time series of path measurements can be used to infer the possibilities of link badness when these vary according to a stochastic proccess.

    8. Proofs of the theorems are deferred to the Appendix.

    G. Related Work (omit)