Epidemic and Cascading Survivability of Complex Networks

Our society is nowadays governed by complex networks: power grids, telecommunication networks, biological networks, and social networks are some examples. It has become of paramount importance to understand and characterize the dynamic events (e.g., failures) that might occur in these networks. For this reason, in this paper we propose two measures to evaluate the vulnerability of complex networks in two different dynamic multiple failure scenarios: epidemic-like and cascading failures. Firstly, we present \emph{epidemic survivability} ($ES$), a new network measure that describes the vulnerability of each node of a network under a specific epidemic intensity. Secondly, we propose \emph{cascading survivability} ($CS$), which characterizes how potentially injurious a node is in a given cascading failure scenario. Then, we show that the distributions of $ES$ and $CS$ values can be used to describe the vulnerability of a given network. We consider a set of 17 different complex networks to illustrate the suitability of our proposals. Lastly, results reveal that distinct types of complex networks may react differently under the same multiple failure scenario.

computing, e-Health, the Internet of Things, MANETs, etc. Consequently, the period of time for which a user can operate terminals without network connectivity is becoming very short, and if a large-scale failure occurred, it would impact a significant percentage of the world's population. Another example is online social networks such as Twitter or Facebook. In August 2013 a single tweet of a billionaire investor made Apple shares rise over $500 [2], showing how a single message can spread and reach millions of users within hours. These two examples depict how important it is to understand the events that might occur on complex networks. From now on, in this work we use the term failure to refer to any event that disrupts the normal functioning of a complex network.
Many different protection and restoration techniques for single failures have been extensively analyzed in recent decades (e.g., see [3]). Furthermore, multiple failures such as natural disasters or physical attacks have also been studied [4]. According to the taxonomy introduced in [5], there are two types of multiple failures. While static multiple failures are essentially one-off failures that affect one or more elements (nodes or links) simultaneously at a given point in time, dynamic failures have a temporal dimension. In this paper we consider dynamic multiple failures, which we implement through epidemic and cascading failures. On the one hand, an epidemic-like failure propagation occurs when, at a given time, a node or a group of nodes starts spreading an infection. In this case the failure (i.e., the infection) propagates through physical neighbors. On the other hand, cascading failures occur when a node (or a group of nodes) fails and, as a consequence, other parts of the network fail as well due to capacity overload. Cascading failures do not necessarily propagate through physical contact: the failure of one node can cause the failure of a non-adjacent node because of network load balancing.
In contrast with single failures, in the case of multiple failures it is not viable to define proper reactive strategies. Thus, since the reasonable approach to address such large-scale failures takes place at the design phase of a network, it has become of paramount importance to define new metrics able to evaluate the vulnerability of networks in multiple failure scenarios. Appropriate metrics can help network engineers and operators to detect the most critical parts of a network. Although a new generic metric to accurately evaluate robustness in static multiple failure scenarios has recently been presented in [5], to the best of our knowledge there are no metrics able to evaluate robustness under dynamic multiple failure scenarios.
In our previous work [6] we presented a metric called epidemic survivability. In this paper we go one step further and extend that work by considering a broader class of failures: dynamic multiple failures. In addition, we extend the number of networks considered for testing the failure scenarios to 17, compared to 6 in the previous work.
Consequently, here we consider 2 telecommunication networks, 2 Internet Autonomous Systems (AS) networks, 5 synthetically generated networks, 1 biological network, 3 social networks and 4 power grid networks. Our aim is to take into account a wide range of different types of complex networks and evaluate them under dynamic multiple failure scenarios. Within this context, the main contributions of this paper are: 1) a new network measure called epidemic survivability (ES), which describes the vulnerability of each node of a network under a specific epidemic scenario.
2) a new network measure called cascading survivability (CS), which characterizes how potentially injurious a node is according to a specific cascading failure scenario.
We believe that our proposals can be used by the network research community to evaluate the criticality of nodes of a network under failure propagation scenarios. In addition, our metrics can be used to amplify general recovery metrics such as [7].
The remainder of this work is organized as follows: Section II presents the set of network topologies considered in this paper. In Section III we (a) introduce the state of the art related to epidemic failures; (b) review the most well-known epidemic models; (c) present our new network measure called epidemic survivability; and (d) show a practical example of how our proposal could be used. Then, Section IV (a) provides background on cascading failures; (b) presents several remarkable cascading failure models; (c) defines our new metric called cascading survivability; and (d) illustrates how to use the metric. Finally, Section V concludes this work, reviewing its main contributions and findings.

II. NETWORK TOPOLOGIES
In this section we present the set of seventeen network topologies considered in our work. These networks have been chosen in order to represent a wide variety of complex network topology types. Generating representative synthetic topologies is a difficult task (and it is not the objective of this paper). Thus, we have conducted an extensive investigation and we have obtained seventeen networks from several sources, which are described next (the name of each network includes the number of nodes): 1) abilene93 (Fig. 1a): a small network that has been chosen because of its underlying AS topology structure.
2) cogentco197 (Fig. 1b): a real telecommunications network that has been taken from the repository provided in [8].
12) fb4039 (Fig. 2d): this network represents circles (friends lists) of the popular social network Facebook [14]. 13) wspg4941 (Fig. 2e): a topology of the Western States Power Grid of the United States [15]. 14) pgieee118 and pgieee300 (Fig. 2f): two power grid test topologies. It is worth noting that some of the networks were not connected, and a post-processing step has been applied in order to obtain the largest connected component. Table I shows the networks that have been post-processed because they were disconnected. Furthermore, Table II and Table III present the topological metrics of the networks. The AS25357 network shows the highest mean degree of first neighbors (d) and the highest maximum degree (k_max); i.e., in AS25357 there is an AS that is connected to 3781 other ASes, some of which have a high node degree as well. A high k_max is an indicator of vulnerability, since the removal of such a node could seriously damage the network. Networks with high values of the largest eigenvalue of the adjacency matrix (or spectral radius, λ1) and of the algebraic connectivity (µ_{N−1}) are more robust. In this case, the fb4039 network shows the highest spectral radius and er400 presents the highest algebraic connectivity. For this reason, these two networks are expected to be more robust than the rest in the case of failures.
Regarding the average shortest-path length (l), two power grid networks (europg1494 and wspg4941) have the highest values and are consequently more vulnerable. This is due to the fact that, traditionally, power grid networks have a tree-like structure. Furthermore, the average node betweenness centrality (b) of cost37, cogentco197 and abilene93 shows that these three topologies concentrate centrality on a few nodes, indicating vulnerability under targeted failures. The absence of 3-cycles in the clustering coefficient (C) measurements reveals that homoge400 and cost37 lack two-hop paths to re-route traffic in case of failure of one of a node's neighbors. Finally, networks with negative values of assortativity (r) have an excess of radial links, i.e., links connecting nodes of dissimilar degrees; such a property is typical of technological networks [19]. This initial analysis of the considered set of topologies reveals that none of the networks can be considered the most robust for all of the metrics. Besides, the vulnerability of the networks will differ depending on the considered type of multiple failure. As a consequence, it is necessary to define new metrics able to characterize how robust a network is in a specific scenario. The following two sections present two new measures to evaluate network vulnerability in the case of epidemic-like and cascading failures.

III. EPIDEMIC-LIKE FAILURES
Throughout the history of mankind there have been many diseases that have spread quickly, becoming an epidemic or even a pandemic. As a result, many epidemic outbreaks have ravaged human civilizations from the Middle Ages until today. For instance, the devastating Influenza epidemic of 1918 (the third greatest plague in history) claimed 21 million lives and affected over half the world's population [20].
Epidemic models are used to model the spreading of events (e.g. failures) in several types of complex networks.
These models have been used in a wide variety of research fields. For instance, in [21] the authors used characteristics of epidemic spreading to model fire propagation in a forest. In [22], the authors used epidemic models to show that emotional states spread like infectious diseases across social networks. In [23] it was shown that certain network structures facilitate the propagation of new ideas, behaviors or technologies. In recent years, online social networks (OSNs) have also been the focus of study. For instance, in [24] the authors studied how to control virus propagation in OSNs. Finally, although no commercial references (or reports) have been found with respect to the propagation of failures in telecommunication networks, several works have analyzed the consequences of epidemic attacks on the services provided by such networks [25], [26], [27]. Additionally, a framework to eradicate epidemic failures has recently been proposed in [28]. Nonetheless, to the best of our knowledge, no methods to detect the most vulnerable nodes of a complex network in the case of epidemic failures have been proposed. Therefore, a first step is to define network measures that characterize all nodes under such failure scenarios.

A. Epidemic Models
Epidemic dynamics in complex networks have undergone extensive research [29], [30], [31], [32], [33]. As a consequence, many epidemic models have been proposed and several families are described in the literature (see Chapter 8 in [34], Chapter 17 in [35] and Chapter 14 in [36]). The first family, called Susceptible-Infected (SI), considers individuals as being either susceptible (S) or infected (I). This family assumes that infected individuals remain infected forever, and so it can be used to model worst-case propagation (S → I). Another family is the Susceptible-Infected-Susceptible (SIS) group, which considers that a susceptible individual can become infected on contact with another infected individual, and then recovers with some probability, becoming susceptible again.
Therefore, individuals will change their state from susceptible to infected, and vice versa, several times (S ⇄ I).
The Susceptible-Exposed-Infected-Susceptible (SEIS) model is based on the SIS model and additionally takes into consideration an exposed (E) state, in which an individual has been infected but is not yet infectious. Regarding communication networks, an extension of the SIS model called Susceptible-Infected-Disabled-Susceptible (SIDS) was proposed in [25] in order to overcome the limitations of the SIS model with respect to optical transport networks. The SIDS model (Susceptible→Infected→Disabled→Susceptible) is one of the first models to consider the features of real telecommunication networks, and it relates each state to a functionality of the network devices. In addition, other epidemic models have been proposed for wireless telecommunication networks [37].
In this paper we propose a new network measure based on the SIS model, which is characterized by two probabilities: (a) β, the probability of being infected by an already infected node; and (b) δ, the probability of an infected node recovering and becoming susceptible again. However, our proposal can also be applied to any other epidemic model, and we plan to do so in the future.
Furthermore, according to [33], the epidemic intensity s of a network is given by the following equation:

s = β λ1 / δ    (1)

where λ1 is the largest eigenvalue of the adjacency matrix of the network, which has typically been used to predict network robustness. When s > 1 an epidemic survives and the spread of the infection might never die out. Thus, in order to obtain comparable results between networks with respect to our proposal (epidemic survivability), s must be a parameter of our new measure.
In this work we have fixed s = 3 for all networks, in order to obtain comparable results, and we have obtained a specific β value for each network from the equation β = sδ/λ1.
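As an illustration, driving every network at the same epidemic intensity can be sketched as follows. This is a minimal pure-Python sketch, not the paper's implementation: the 4-node ring and the parameter values are illustrative, and the helper names are ours.

```python
# Sketch: obtaining the per-network infection probability beta from a fixed
# epidemic intensity s, using s = beta * lambda_1 / delta.

def largest_eigenvalue(adj, iters=1000, tol=1e-10):
    """Estimate the spectral radius lambda_1 of a symmetric 0/1 adjacency
    matrix (list of rows) via power iteration. Assumes a connected graph."""
    n = len(adj)
    v = [1.0] * n
    lam_prev = 0.0
    lam = 0.0
    for _ in range(iters):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)     # max-norm of the iterate
        v = [x / lam for x in w]         # renormalize
        if abs(lam - lam_prev) < tol:
            break
        lam_prev = lam
    return lam

def beta_for_intensity(adj, s, delta):
    """beta = s * delta / lambda_1, so that every network is driven at the
    same epidemic intensity s."""
    return s * delta / largest_eigenvalue(adj)

# 4-node ring: lambda_1 = 2, so with s = 3 and delta = 0.2 we get beta ~ 0.3
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
print(beta_for_intensity(ring, s=3, delta=0.2))  # ~0.3
```

Power iteration suffices here because adjacency matrices are non-negative; for the large networks of the paper a sparse eigensolver would be the natural choice.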

B. Epidemic Survivability
Here we present our new network measure called epidemic survivability (ES). We define it as the probability of each node of a given network to be eventually infected (i.e., after a large enough number of time steps), given a specific epidemic intensity s. According to both simulations and theoretical models, this probability asymptotically reaches a stationary state. Epidemic survivability can thus be described as the proportion of time for which each node of a given network has been infected, for a given s and a large enough period of time, as shown in Eq. 2:

ES_i(s) = (time for which node i has been infected) / (total time),   i = 1, ..., N    (2)

where N is the number of nodes of the network. As a result, ES has a value between 0 and 1 for each node: the higher the value, the more vulnerable the node is under the specified epidemic scenario. Formally, from the SIS model, epidemic survivability can be computed with the following equation:

ES*_i(s) = (β Σ_{j∼i} ES*_j(s)) / (δ + β Σ_{j∼i} ES*_j(s)),   i = 1, ..., N    (3)

where * denotes the stationary state and j ∼ i denotes the set of neighbors of node i. Here it is assumed that δ and s are given as parameters, and β is obtained from the equation β = sδ/λ1. Thus, it can be observed that Eq. 3 is a recursive formula and must be initialized with a value. We define this initialization of the probabilities in Eq. 4:

ES_i(s) = 1 − 1/s,   i = 1, ..., N    (4)

which corresponds to the solution of Eq. 3 for the case of a homogeneous (regular) network. Moreover, a procedure for computing epidemic survivability is provided in Algorithm 1. As can be observed, the method requires five parameters: the network G and four constants (s, δ, k and tol). The first two steps (lines 3 and 4) compute the largest eigenvalue of the given network and from it the β value of the epidemic model. Then, all probabilities are initialized as stated in Eq. 4 (lines 5 to 7). Next, in the main loop of line 8, the new probability of each node is computed as defined in Eq. 3 (lines 9 to 11).
After that, the absolute error is checked (lines 12 to 14): if it is lower than the given tolerance (tol), the algorithm ends and returns the array containing the epidemic survivability of each node of the network; otherwise, another iteration is performed.
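The fixed-point iteration of Algorithm 1 can be sketched as follows. This is a minimal illustration under the SIS stationary-state formulation of Eq. 3 with the Eq. 4 initialization; the function name and the 4-node ring are ours, not the paper's, and β is assumed to have been set to sδ/λ1 beforehand.

```python
# Sketch of Algorithm 1: iterate Eq. 3 until the maximum absolute change
# between iterations drops below a tolerance.

def epidemic_survivability(neighbors, s, delta, beta, tol=1e-9, max_iter=10000):
    """neighbors: dict node -> list of adjacent nodes.
    Returns dict node -> stationary infection probability ES_i(s)."""
    es = {i: 1.0 - 1.0 / s for i in neighbors}         # Eq. 4 initialization
    for _ in range(max_iter):
        new_es = {}
        for i, nbrs in neighbors.items():
            pressure = beta * sum(es[j] for j in nbrs)
            new_es[i] = pressure / (delta + pressure)  # Eq. 3 update
        err = max(abs(new_es[i] - es[i]) for i in neighbors)
        es = new_es
        if err < tol:                                  # converged
            break
    return es

# 4-node ring (lambda_1 = 2, so beta = 3 * 0.2 / 2 = 0.3). Being regular,
# the ring's fixed point is exactly the Eq. 4 value, ES = 1 - 1/s = 2/3.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
es = epidemic_survivability(ring, s=3, delta=0.2, beta=0.3)
```

For a regular network the initialization is already the fixed point, so the loop exits immediately; for heterogeneous topologies the iteration redistributes probability toward high-degree nodes.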

C. The distribution
When computing the epidemic survivability of the nodes of a network, according to a specified set of parameters, it is interesting to analyze the distribution of the ES values. If these values are sorted, for example in descending order, the comparison between network topologies becomes straightforward when considering the same failure propagation scenario for all of them. This approach is illustrated in Fig. 3, which displays the epidemic survivability distribution (the ES of each node) for the 17 networks in a specific epidemic scenario. As can be observed, the two AS networks (AS25357 and AS26475) together with the two collaboration networks (col4158 and col8638) show the lowest ES distributions, demonstrating that such networks are more robust than the rest in the case of an epidemic-like failure with epidemic intensity s = 3. It is worth noting that different types of complex networks show different ES distribution curves. While AS and collaboration networks show power-law-like curves, power grids, telecommunication networks, synthetic networks and the biological network depict smoother, slowly decreasing curves. On one hand, curves showing a rapid decrease (i.e., a power-law-like profile) would be expected in complex networks related to critical infrastructures. This is due to the fact that only a small portion of the nodes of the network would be highly vulnerable and, consequently, less effort (e.g., economic) would be required from the network engineer or operator to protect it. On the other hand, regarding social networks one could expect different curve profiles depending on the purpose of the social network (e.g., a country's government interested in controlling its social networks would prefer flatter curves, because there would not be any node with a high spreading potential).
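The comparison described above can be sketched in a few lines. The ES arrays below are made-up placeholders, not values from Fig. 3, and the threshold summary is our own crude illustration of "how much of the network is highly vulnerable".

```python
# Sketch: comparing networks by their sorted ES distributions (Section III-C).

def es_distribution(es_values):
    """Sort ES values in descending order for curve comparison."""
    return sorted(es_values, reverse=True)

def vulnerable_fraction(es_values, threshold=0.5):
    """Fraction of nodes whose ES exceeds a threshold: a crude scalar
    summary of how many nodes are highly vulnerable."""
    return sum(1 for v in es_values if v > threshold) / len(es_values)

net_a = [0.81, 0.10, 0.75, 0.05, 0.60]   # steep, power-law-like profile
net_b = [0.20, 0.22, 0.18, 0.21, 0.19]   # flat profile

dist_a = es_distribution(net_a)          # descending ES curve of net_a
frac_a = vulnerable_fraction(net_a)      # 3 of 5 nodes above 0.5
frac_b = vulnerable_fraction(net_b)      # no node above 0.5
```

A steep sorted curve (net_a) means protection effort can focus on the few top-ranked nodes, which is the observation made above for critical infrastructures.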
IV. CASCADING FAILURES

Cascading failures have affected power grids in America [40] and Europe [41]. However, cascading failures are not limited to power grids: they can occur in any load/capacity-related complex network. For example, the authors of [42] stated that two types of cascading failures can occur in backbone telecommunication networks. Other works such as [43] and [44] have focused on the IP layer and the optical layer of communication networks, respectively. Moreover, cascading failures have also been studied in socio-technological networks [45]. Other examples of cascading failures include biological, electronic and financial networks.

Although the authors of [46] proposed a robustness metric for power grid networks in the case of targeted attacks, to the best of our knowledge there is no metric that can be generally applied to any kind of cascading failure or complex network. Therefore, with the purpose of providing the network scientific community with such a measure, in this section we define cascading survivability.

A. Cascading Failure Models
Cascading failures have been extensively studied in the literature; some of the most well-known models are presented next. One of the first cascading failure models, focused on random complex networks, was presented in [47]. Around the same time, the authors of [48] presented a simple but functional model. Later on, this model was enhanced in [49] by keeping an auxiliary cost matrix related to the efficiency metric [50], [51]. Furthermore, in [52] the authors proposed an analytically tractable loading-dependent cascading failure model. In [53] an AC blackout model representing most of the interactions observed in cascading failures was presented. Recently, a cascading failure model for inter-domain routing systems was presented in [54]; its authors also proposed two metrics to assess the impact of a cascading failure: the proportion of failed nodes and the proportion of failed links.
As previously stated, our objective is to define a metric able to characterize the vulnerability of the elements of a network (in this case, nodes) under cascading failures. To do so, we have chosen the model presented in [48]. According to this model, each node j is associated with a load L_j. The load of each node is its betweenness centrality, i.e., the number of shortest paths passing through the node. Then, the capacity of node j is defined as a value proportional to its initial load L_j, as denoted by Eq. 5:

C_j = (1 + α) L_j,   j = 1, ..., N    (5)

where N is the number of nodes of the network and α, the tolerance parameter of the model, is a constant that must satisfy α ≥ 0. This parameter is related to the concept of capacity dimensioning of a network, which is of paramount importance at the design phase of a network (e.g., a critical infrastructure such as a power grid). An appropriate level of over-dimensioning can prevent cascading failures. However, a higher α typically involves a higher economic budget; therefore, network engineers must seek a trade-off between these two factors.
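The capacity dimensioning of Eq. 5 amounts to a one-line over-provisioning rule. The sketch below uses illustrative load values rather than betweenness computed from a real topology; the helper names are ours.

```python
# Sketch of Eq. 5: capacity as a (1 + alpha) over-dimensioning of the
# initial load (the node betweenness centrality in the model of [48]).

def capacities(loads, alpha):
    """Capacity C_j = (1 + alpha) * L_j for every node j."""
    if alpha < 0:
        raise ValueError("the tolerance parameter alpha must be >= 0")
    return {j: (1.0 + alpha) * load for j, load in loads.items()}

def overloaded(new_load, capacity):
    """Overload condition on a node: its post-failure load exceeds
    its dimensioned capacity."""
    return new_load > capacity

# Illustrative initial loads for four nodes, dimensioned with alpha = 0.05
initial_loads = {0: 12.0, 1: 3.5, 2: 0.0, 3: 7.25}
caps = capacities(initial_loads, alpha=0.05)
```

With these numbers node 0 can absorb only a 5% load increase (capacity 12.6) before it overloads, which is why small α values leave little slack for redistributed traffic.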
As defined by the model in [48], we focus on cascades triggered by the removal of a single node. Such an event, in general, changes the distribution of shortest paths. As a result, after an initial node failure, the new load L'_j of a node might differ from its initial load L_j. Then, for each node j, if the expression of Eq. 6 is satisfied:

L'_j > C_j    (6)

node j overloads and fails, which might cause subsequent overloading failures in the rest of the nodes of the network.
Finally, we note that in the results presented further in this section we have assumed an α = 0.05 for all networks, with the purpose of allowing comparison among them.

B. Cascading Survivability
Our new network measure called cascading survivability (CS) is presented below. Cascading survivability evaluates how potentially injurious a node is according to a specific cascading failure scenario. In other words, CS can be described as shown in Eq. 7:

CS_i(α) = (number of nodes that fail if node i initially fails) / (N − 1),   i = 1, ..., N    (7)

where N is the number of nodes of the network. As observed, α is a parameter of CS, which means that different α values might yield different CS values. Cascading survivability takes values between 0 and 1 for each node: the higher the value, the more harmful the node is under the specific cascading failure scenario.
We have defined a procedure to compute the cascading survivability of the nodes of a network, which is presented in Algorithm 2. As shown, the method requires two parameters: the network G and the tolerance parameter α. First of all, the initial load and capacity of each node are computed (lines 4 to 7). Then, an initial failure is triggered for each one of the nodes of the given network, one at a time (lines 8 to 20). For each initial failure (line 9), and at each step of the spreading of the cascade (lines 10 to 19), the new load of the remaining nodes of the network is computed (line 13). If the new load of a node becomes higher than its capacity at any step, the cascading survivability of the node that initially triggered the failure is increased (lines 14 to 17). Finally, the CS of each node is normalized (lines 21 to 23).
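A compact sketch of this procedure is given below. It is our own illustration under stated assumptions: load is taken as Brandes betweenness counted over ordered source-target pairs (loads and capacities use the same convention, so the factor of two cancels in the Eq. 6 comparison), only overloaded nodes count as cascade failures, and the diamond-with-tail test graph is ours.

```python
from collections import deque

def betweenness(neighbors, alive):
    """Unweighted node betweenness (Brandes' algorithm) restricted to the
    surviving nodes in `alive`; failed neighbors are skipped."""
    bc = {v: 0.0 for v in alive}
    for s in alive:
        stack, pred = [], {v: [] for v in alive}
        sigma = {v: 0.0 for v in alive}; sigma[s] = 1.0
        dist = {v: -1 for v in alive}; dist[s] = 0
        queue = deque([s])
        while queue:                         # BFS from source s
            v = queue.popleft()
            stack.append(v)
            for w in neighbors[v]:
                if w not in dist:            # failed node: skip
                    continue
                if dist[w] < 0:              # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # w reached via a shortest path
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in alive}
        while stack:                         # dependency accumulation
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

def cascading_survivability(neighbors, alpha):
    """CS_i(alpha): fraction of the other N-1 nodes that overload and fail
    when node i is initially removed (Eqs. 5-7)."""
    nodes = frozenset(neighbors)
    n = len(nodes)
    cap = {v: (1.0 + alpha) * l                       # Eq. 5
           for v, l in betweenness(neighbors, nodes).items()}
    cs = {}
    for i in nodes:
        alive = set(nodes) - {i}                      # initial failure
        while True:
            load = betweenness(neighbors, alive)      # redistributed load
            over = {j for j in alive if load[j] > cap[j]}   # Eq. 6
            if not over:
                break
            alive -= over                             # cascade step
        cs[i] = (n - 1 - len(alive)) / (n - 1)        # Eq. 7
    return cs

# Diamond with a tail: 0-1, 0-2, 1-3, 2-3, 3-4. Removing node 1 reroutes
# all of its traffic through node 2, which then overloads for alpha = 0.05.
G = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
cs = cascading_survivability(G, alpha=0.05)
```

Recomputing betweenness at every cascade step is O(N·M) per step, so for the 4941-node grid a production implementation would need incremental updates; the sketch favors clarity over speed.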

C. The distribution
When computing the cascading survivability of the nodes of a network for a specific α, it is worth analysing the distribution of the CS values, as previously illustrated for epidemic survivability in Section III-C.
By sorting the CS values in descending order it is possible to compare different networks according to a specific cascading failure scenario denoted by α. Fig. 4 shows the CS distribution of 15 of the networks considered in this work, in the case of a cascading failure with α = 0.05. It is interesting to note that most of the networks show a bimodal CS distribution. This means that the nodes of such networks can be clearly divided into two groups: harmful and not significant in the case of a cascading failure. This behavior has been observed in other works such as [55].

V. CONCLUSIONS

Firstly, we have proposed a new network measure called epidemic survivability (ES), which describes the vulnerability of each node of a network under a specific epidemic-like failure propagation scenario. In addition, a procedure to compute our novel measure has been provided. By sorting the ES values of all nodes of a network in descending order, it is possible to analyze which nodes would be more vulnerable in the case of an epidemic failure. Furthermore, using this ES distribution, network vulnerability can be compared for a specific epidemic scenario.
Secondly, we have presented a new network measure called cascading survivability (CS), which characterizes how potentially dangerous a node is according to a specific cascading failure scenario. In addition, we have provided a procedure to compute CS. Then, as for the epidemic survivability metric, we have noted the inherent usability related to the CS distribution.
Lastly, we have computed ES and CS for the set of networks considered in this work, each measure being dependent on a specific failure scenario. Results have shown that distinct types of complex networks might react differently under the same dynamic multiple failure. In addition, results have revealed that a complex network might be more or less vulnerable depending on the specific type of multiple failure scenario (i.e., epidemic-like or cascading failures). For instance, while the cogentco197 network shows a smoothly decreasing ES curve, the same network shows a bimodal CS distribution, where about 25% of the nodes are not dangerous in the case of cascading failures.
To conclude, the methodology that we have followed to evaluate the vulnerability of the nodes of a network in the case of dynamic multiple failures might be used in further investigations, considering other types of failures or models. This methodology is defined below: 1) Define the set of networks to be analysed.
2) Determine the failure scenario.