Modelling of Reservoir Operations Using Fuzzy Logic and Artificial Neural Networks

by H.M. Coerver

Technische Universiteit Delft

in partial fulfillment of the requirements for the degree of Master of Science in Civil Engineering at the Delft University of Technology, to be defended publicly on Wednesday April 24, 2015 at 4:00 PM.

Thesis committee:
Prof. Dr. ir. N. C. van de Giesen, Watermanagement
Dr. ir. M. M. Rutten, Watermanagement
Dr. ir. L. Iannini, Remote Sensing

This thesis is confidential and cannot be made public until December 31, 2015.

An electronic version of this thesis is available at http://repository.tudelft.nl/.

CONTENTS

1 Modelling of Reservoir Operations using Fuzzy Logic and Artificial Neural Networks
  1.1 Introduction
    1.1.1 Fuzzy Logic
  1.2 Methodology
    1.2.1 Adaptive-Network-Based Fuzzy Inference Systems
    1.2.2 Data
    1.2.3 Settings
  1.3 Results
  1.4 Discussion
  1.5 Conclusion
2 Modelling of Reservoir Operations using Remotely Sensed Data
  2.1 Introduction
    2.1.1 Determining time-series with Remote Sensing products
  2.2 Methodology
    2.2.1 Fuzzy Logic and ANFIS
    2.2.2 Settings
  2.3 Results
  2.4 Discussion
  2.5 Conclusion
Bibliography

1 MODELLING OF RESERVOIR OPERATIONS USING FUZZY LOGIC AND ARTIFICIAL NEURAL NETWORKS

1.1. INTRODUCTION

Today, almost 40,000 large reservoirs, containing approximately 6,000 km³ of water and inundating an area of almost 400,000 km², can be found on earth [1]. These reservoirs have been constructed because of the many benefits they provide. They facilitate flood control, domestic and industrial water use, hydro-power generation and irrigation. However, they often also cause lasting adverse effects on people living in affected communities, cultural heritage, rivers, watersheds and aquatic ecosystems [2].

A recent note by the Food and Agriculture Organization [3] gives an example of how these different aspects compete over water use in the Red River basin (in Northern Vietnam). A series of reservoirs in this catchment regulates flows and supplies much of the electricity needed for Vietnam's modernisation and industrialisation. But the same system also provides local farmers with water for their rice paddies, which is critical to social stability and food security. As water is becoming more scarce, the allocation of water is becoming more difficult and water resource management more important.
To support water resource managers in their decision making, global hydrological models (GHMs) are being developed. PCRaster GLOBal Water Balance (PCR-GLOBWB) is such a model: a global, grid-based terrestrial hydrology model that can be used to study water availability and adaptation strategies [4]. Since large reservoirs around the world contain more than three times as much water as stored in river channels and almost one-sixth of the global annual river discharge, they have a significant impact on river discharges [5]. In order for global models like PCR-GLOBWB to function properly, a good understanding of how a reservoir is being managed is required.

Managing a reservoir is an imprecise and vague undertaking. Operators always face uncertainties about inflows, evaporation, seepage losses and the various water demands to be met. They often base their decisions on experience and on available information, like reservoir storage and the previous period's inflow [6]. It would thus be interesting to develop a method that could link this information with their decisions. A popular method to model decision-making processes is fuzzy logic, as introduced by Zadeh [7].

1.1.1. FUZZY LOGIC

Fuzzy logic models a process with rules of the form "IF x is A AND y is B, THEN z is C", where x, y, z are linguistic variables (e.g. water level, inflow, release) and A, B, C are linguistic values (e.g. very high, low, very low). These rules consist of a premise and a consequence part and are believed to be able to capture the reasoning of a human working in an environment with uncertainty and imprecision [8].

Russell and Campbell [6] and Panigrahi and Mujumdar [9] use fuzzy logic to make their complex reservoir optimization models more appealing to operators, who are reluctant to use procedures they do not fully understand. They use optimization techniques, such as deterministic and stochastic dynamic programming, to acquire optimal release decisions with hindsight. Subsequently they create fuzzy rules which link information available to operators to these optimal release decisions. Shrestha et al. [8] also use fuzzy logic to model operation rules in order to evaluate system performance.

Fuzzy reasoning is the process in which fuzzy rules are used to transform input into output and consists of four steps: (1) the input variables are fuzzified, (2) the firing strength of each rule is determined, (3) the consequence of each rule is resolved and (4) the consequences are aggregated. Figure 1.1 visualises these steps with an example, in which a storage of 520 Mm³ and an inflow of 123 Mm³/month are chosen. In this example, the storage can be fuzzified through the membership functions as either "low" or "high" and the inflow as "low", "medium" or "high". Note that the shape of the membership functions is triangular here, but many shapes are possible. For the given membership functions, the storage is only classified as "high"; the inflow, however, is both "medium" and "high" (implying that, in practice, some operators would classify this inflow as "medium" and some as "high"). This means two fuzzy rules are relevant for the given input, i.e.:

• IF storage is high AND inflow is medium, THEN outflow is Z1
• IF storage is high AND inflow is high, THEN outflow is Z2

The storage has been fuzzified, i.e. it belongs to the membership function "high" and its associated membership value is 0.8. Similarly, the membership values for a "medium" and a "high" inflow can be determined. They are 0.6 and 0.4 respectively. Now the firing strength of each rule needs to be determined (i.e. how relevant each rule is). This can be done in many ways; in this example the membership values are multiplied with each other.
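The first two steps of fuzzy reasoning described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `triangular` and `firing_strength` and the breakpoints passed to `triangular` are assumptions made for this example, not taken from the thesis. The membership values 0.8, 0.6 and 0.4 are the ones quoted in the text.

```python
def triangular(x, left, peak, right):
    """Triangular membership function; returns a value in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def firing_strength(*memberships):
    """Product rule: multiply the membership values of all premises of a rule."""
    strength = 1.0
    for value in memberships:
        strength *= value
    return strength


# Step 1 (fuzzification) on a hypothetical normalized axis, for illustration only.
example_membership = triangular(0.25, 0.0, 0.5, 1.0)

# Membership values quoted in the text for storage = 520 Mm^3, inflow = 123 Mm^3/month.
mu_storage_high = 0.8
mu_inflow_medium = 0.6
mu_inflow_high = 0.4

# Step 2 (firing strength) for the two relevant rules.
w1 = firing_strength(mu_storage_high, mu_inflow_medium)
w2 = firing_strength(mu_storage_high, mu_inflow_high)
print(round(w1, 2), round(w2, 2))
```

Other operators (for example the minimum of the membership values) could be swapped into `firing_strength` without changing the rest of the sketch.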
When looking at the first rule, the "high" storage has a membership value of 0.8, while the "medium" inflow has a membership value of 0.6. The firing strength of this rule is thus $W_1 = 0.8 \cdot 0.6 = 0.48$. In the same manner it follows that the firing strength of the second rule is $W_2 = 0.8 \cdot 0.4 = 0.32$. Many possibilities exist to describe the consequences of rules; in this example they are linear combinations of the input variables as described by Takagi and Sugeno [10], i.e.:

$$Z = p \cdot \mathit{storage} + q \cdot \mathit{inflow} + r \tag{1.1}$$

in which $\{p, q, r\}$ are parameters to be determined when setting up the fuzzy rule base. Finally, the consequences can be aggregated by using a weighted average:

$$\mathit{release} = \frac{W_1 \cdot Z_1 + W_2 \cdot Z_2}{W_1 + W_2} \tag{1.2}$$

A big drawback of fuzzy logic is the need to assess a fuzzy rule base. Manually transforming human knowledge or behaviour into a representative set of rules is a complicated task. In the presented example six rules need to be defined (two membership functions for storage and three for inflow give six possible combinations), but as the number of input variables and membership functions increases, the total number of required rules quickly becomes very large. Jang [11] dealt with this problem by developing a method called Adaptive-Network-based Fuzzy Inference System (ANFIS), which constructs a set of fuzzy if-then rules with appropriate membership functions using an Artificial Neural Network (ANN). ANNs are computational models inspired by biological neural networks (e.g. a brain); they are capable of learning and generalising from examples [12]. Jang [11] successfully tested his method on several highly non-linear functions and used it to predict future values of chaotic time-series. ANFIS only recently found its way into applications for reservoir management.
Chang and Chang [13] demonstrate the possibilities of using ANFIS to forecast the water level of Shihmen reservoir in Taiwan during the typhoon season, using water levels from five gauge stations upstream of the reservoir to predict water levels 3 hours ahead. Mousavi et al. [14] use ANFIS to derive operational rules from releases generated with optimization methods like dynamic programming, much like the aforementioned studies on fuzzy logic.

The aim of this paper is to derive operational rules from historical time-series of inflow, storage and release using ANFIS, in order to model how reservoirs are currently being operated.

Figure 1.1: An example showing the four steps of fuzzy reasoning (fuzzification, firing strength, consequence, aggregation). Inputs: storage x = 520 Mm³ and inflow y = 123 Mm³. Premise part: w1 = 0.8 · 0.6, w2 = 0.8 · 0.4. Consequent part: z1 = p1·x + q1·y + r1, z2 = p2·x + q2·y + r2. Output: release = (w1·z1 + w2·z2) / (w1 + w2).

Figure 1.2: The five layers of ANFIS (1. Membership, 2. Firing Strength, 3. Normalization, 4. Implication, 5. Output) for a network with two input variables and two membership functions per variable.

1.2. METHODOLOGY

1.2.1. ADAPTIVE-NETWORK-BASED FUZZY INFERENCE SYSTEMS

ANFIS is a specific ANN which can deal with the linguistic expressions used in fuzzy logic, but at the same time contains the self-learning and self-improving capabilities of ANNs. The network structure is capable of adjusting the shape of the membership functions and the consequence parameters seen in equation 1.1 by minimizing the difference between output and provided targets.
ANFIS is a feed-forward neural network with five layers, as seen in Figure 1.2; note that square nodes contain parameters while circular nodes are fixed. Jang [11] proposes four training methods in his study, one of which is called the Hybrid Learning Rule (HLR). This method combines gradient descent learning and a least squares estimator (LSE) to update the network parameters. It has an advantage over the other methods because it converges fast and is less likely to become trapped in local minima, which is a common problem when using solely the gradient descent method. The training consists of two passes, which are discussed in more detail below. The network has two parameter sets, the premise and the consequence parameters, situated in the "Membership" and "Implication" layers respectively. The consequence parameters are updated in the forward pass with the LSE, while the premise parameters are updated in the backward pass by gradient descent learning.

FORWARD PASS

In the forward pass, the output of each layer for a given input is calculated and the consequence parameters are adjusted with the LSE before the final output is generated. Each layer is discussed individually below.

1. The first layer is called the membership layer; the input is put through a membership function to determine its membership value:

$$O_i^1 = \mu_{A_j}(x) \tag{1.3}$$

where $A_j$ is the linguistic label associated with the node (i.e. equation 1.3 is the membership function of $A_j$), $x$ is the input to the $i$th node and $\mu$ defines the shape of the membership function, here chosen as:

$$\mu_{A_j}(x) = \frac{1}{1 + \left[\left(\frac{x - c_i}{a_i}\right)^2\right]^{b_i}} \tag{1.4}$$

where $\alpha = \{a_i, b_i, c_i\}$ are the premise parameters. They determine the shape of the membership function, as can be seen in Figure 1.3b.

2. The circular nodes in this layer are marked with a Π in Figure 1.2. This layer determines the firing strength for all possible combinations of inputs and their associated membership functions, e.g.:

$$O_i^2 = w_i = \mu_{A_j}(x) \cdot \mu_{B_k}(y) \tag{1.5}$$

3. In the third layer, the firing strengths of all the nodes are normalized with respect to each other, i.e.:

$$O_i^3 = \bar{w}_i = \frac{w_i}{\sum_{i=1}^{n} w_i} \tag{1.6}$$

where $n$ is the total number of nodes in the third layer.

4. The fourth layer is called the implication layer. The consequence of each rule is multiplied by its associated normalized firing strength:

$$O_i^4 = \bar{w}_i \cdot f_i = \bar{w}_i \cdot \left(p_i x + q_i y + r_i\right) \tag{1.7}$$

in which $\{p_i, q_i, r_i\}$ are the consequence parameters to be updated by the LSE.

5. In the fifth layer all the incoming signals are summed to compute the final output:

$$O^5 = \sum_{i=1}^{n} \left(\bar{w}_i \cdot f_i\right) \tag{1.8}$$

LEAST SQUARES ESTIMATOR

Before the final output is calculated, however, the consequence parameters need to be updated. The final output can also be written as:

$$O^5 = (\bar{w}_1 x)\, p_1 + (\bar{w}_1 y)\, q_1 + (\bar{w}_1)\, r_1 + \ldots + (\bar{w}_n x)\, p_n + (\bar{w}_n y)\, q_n + (\bar{w}_n)\, r_n \tag{1.9}$$

If $P$ training samples are provided to the network, the output for each sample is given by:

$$\begin{bmatrix} O_1^5 \\ \vdots \\ O_P^5 \end{bmatrix} = A \cdot X = \begin{bmatrix} \bar{w}_1 x_1 & \bar{w}_1 y_1 & \bar{w}_1 & \cdots & \bar{w}_n x_1 & \bar{w}_n y_1 & \bar{w}_n \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ \bar{w}_1 x_P & \bar{w}_1 y_P & \bar{w}_1 & \cdots & \bar{w}_n x_P & \bar{w}_n y_P & \bar{w}_n \end{bmatrix} \begin{bmatrix} p_1 \\ q_1 \\ r_1 \\ \vdots \\ p_n \\ q_n \\ r_n \end{bmatrix} \tag{1.10}$$

in which the dimensions of $A$ and $X$ are $(P \times M)$ and $(M \times 1)$ respectively, with $M$ the total number of consequence parameters; the normalized firing strengths in each row are evaluated for the sample of that row. Equation 1.10 needs to be equal to the targets, $B$, provided by each sample, i.e.:

$$A \cdot X = B \tag{1.11}$$

This is an overdetermined problem, which generally does not have an exact solution.
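The five layers and the row structure of matrix A in equation 1.10 can be illustrated with a small pure-Python sketch. It assumes two inputs; the function names `bell` and `forward` and all parameter values are hypothetical and chosen only for illustration, they are not the values used in this study.

```python
def bell(x, a, b, c):
    """Bell-shaped membership function of equation 1.4."""
    return 1.0 / (1.0 + (((x - c) / a) ** 2) ** b)


def forward(x, y, premise_x, premise_y, consequents):
    """Layers 1 to 5 for two inputs; returns the output and one row of A."""
    # Layer 1: membership values of x and y (equation 1.3).
    mu_x = [bell(x, a, b, c) for a, b, c in premise_x]
    mu_y = [bell(y, a, b, c) for a, b, c in premise_y]
    # Layer 2: firing strength of every combination (equation 1.5).
    w = [mx * my for mx in mu_x for my in mu_y]
    # Layer 3: normalized firing strengths (equation 1.6).
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # One row of A (equation 1.10): [w_bar*x, w_bar*y, w_bar] for each rule.
    row = []
    for wb in w_bar:
        row.extend([wb * x, wb * y, wb])
    # Layers 4 and 5: weighted consequences, summed (equations 1.7 and 1.8).
    output = sum(wb * (p * x + q * y + r)
                 for wb, (p, q, r) in zip(w_bar, consequents))
    return output, row


# Hypothetical premise parameters (a, b, c): two functions per input on [0, 1].
premise = [(0.5, 2.0, 0.0), (0.5, 2.0, 1.0)]
# Hypothetical consequence parameters (p, q, r), one triple per rule.
consequents = [(1.0, -1.0, 0.1), (0.5, 0.5, 0.0), (0.2, 0.3, 0.1), (1.0, 1.0, -0.2)]

out, row = forward(0.7, 0.3, premise, premise, consequents)
# The same output follows from the row of A times the parameter vector X.
flat_x = [value for rule in consequents for value in rule]
out_from_row = sum(a_ij * x_j for a_ij, x_j in zip(row, flat_x))
print(out, out_from_row)
```

Because the output is linear in the consequence parameters once the premise parameters are fixed, stacking one such row per training sample gives the linear system of equation 1.11.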
A least squares estimate is therefore sought with the sequential formulas [15]:

$$\begin{aligned} X_{i+1} &= X_i + S_{i+1} \cdot a_{i+1} \cdot \left(b_{i+1}^T - a_{i+1}^T \cdot X_i\right) \\ S_{i+1} &= S_i - \frac{S_i \cdot a_{i+1} \cdot a_{i+1}^T \cdot S_i}{1 + a_{i+1}^T \cdot S_i \cdot a_{i+1}} \end{aligned} \qquad i = 0, 1, \cdots, P - 1 \tag{1.12}$$

with:

$X_0 = 0$
$S_0 = \gamma \cdot I$
$\gamma$ = a large positive number
$I$ = identity matrix with dimension $(M \times M)$
$a_i^T$ = $i$th row vector of matrix $A$
$b_i^T$ = $i$th element of $B$

So during every forward pass, the consequence parameters (i.e. $X$) are updated. Note that for one update, only one row of matrix $A$ and only one target value are needed. After the parameters of layer 4 have been updated with equation 1.12, equation 1.8 is used to calculate the output. Finally, the error rate can be calculated with

$$E_p = (T_p - O_p)^2 \tag{1.13}$$

in which $T_p$ is the target value and $O_p$ the output value for the $p$th sample. After the error rate has been determined, the forward pass is finished and the error rate is propagated back through the network in order to update the premise parameters with the gradient descent method.

BACKWARD PASS

During the backward pass, the error associated with the sample under consideration is propagated backward through the network in order to acquire the gradient of the error with respect to each individual premise parameter. So, $\alpha$ is updated according to:

$$\Delta\alpha = -\eta \cdot \frac{\partial E_p}{\partial \alpha} \tag{1.14}$$

in which $\eta$ is the learning rate, which is chosen heuristically and determines the speed of convergence. The derivative is defined as:

$$\frac{\partial E_p}{\partial \alpha} = \frac{\partial E_p}{\partial O^5} \frac{\partial O^5}{\partial O^4} \frac{\partial O^4}{\partial O^3} \frac{\partial O^3}{\partial O^2} \frac{\partial O^2}{\partial O^1} \frac{\partial O^1}{\partial \alpha} \tag{1.15}$$

The first term on the right side of equation 1.15 can easily be determined from equation 1.13:

$$\frac{\partial E_p}{\partial O^5} = -2\,(T_p - O_p) \tag{1.16}$$

The final term of equation 1.15 is threefold, since $\alpha = \{a_i, b_i, c_i\}$. These derivatives are derived from equation 1.4 as:

$$\frac{\partial O^1}{\partial \alpha} = \begin{cases} \dfrac{\partial O^1}{\partial a} = \dfrac{2b\,(x - c)^2 \left(\frac{(x - c)^2}{a^2}\right)^{b - 1}}{a^3 \left(1 + \left(\frac{(x - c)^2}{a^2}\right)^{b}\right)^2} \\[3ex] \dfrac{\partial O^1}{\partial b} = -\dfrac{\left(\frac{(x - c)^2}{a^2}\right)^{b} \ln\left[\frac{(x - c)^2}{a^2}\right]}{\left(1 + \left(\frac{(x - c)^2}{a^2}\right)^{b}\right)^2} \\[3ex] \dfrac{\partial O^1}{\partial c} = \dfrac{2b\,(x - c) \left(\frac{(x - c)^2}{a^2}\right)^{b - 1}}{a^2 \left(1 + \left(\frac{(x - c)^2}{a^2}\right)^{b}\right)^2} \end{cases} \tag{1.17}$$

The other terms in equation 1.15 can easily be derived from equations 1.5-1.8.

Summarizing: first the input of a sample is used to activate the network and, together with the target of the same sample, the consequence parameters are updated using the LSE. Next the output error is calculated with equation 1.13 and propagated backwards through the network with equation 1.15, after which equation 1.14 is used to adjust the premise parameters. After the backward pass has been completed, the next sample is used to start again, until the error rate converges.

1.2.2. DATA

In order to determine whether or not ANFIS is capable of deriving a useful fuzzy rule base which captures the characteristics of how a dam is operated, 11 reservoirs have been investigated. Table 1.1 lists the considered dams, which are located in the United States, Vietnam and several Central Asian states. The size of the dams is quite varied, with dam heights ranging between 25 and 300 meters. The purpose of the reservoirs is diverse as well: several hydro-power, several irrigation and two multi-purpose (i.e. irrigation/flood control/hydropower/recreation) reservoirs are considered.

Table 1.1: Overview of all considered reservoirs, data from Lehner et al. [16] unless otherwise mentioned.

Dam name            Country        Period     Purpose           Inflow [10^8 m³/yr]  Height [m]  Lon., Lat. [DD]
Andijan (AJ)        Uzbekistan     2001-2010  Hydropower        42.0                 115         73.06, 40.77
Bull Lake (BL)      United States  2001-2013  Multipurpose (a)  2.07                 25 (a)      -109.04, 43.21
Canyon Ferry (CF)   United States  2001-2013  Multipurpose (a)  38.1                 69 (a)      -111.73, 46.65
Chardara (CD)       Kazakhstan     2001-2010  Irrigation        185                  29          67.96, 41.25
Charvak (CV)        Uzbekistan     2001-2010  Hydropower        70.6                 168         69.97, 41.62
Kayrakkum (KR)      Tajikistan     2001-2010  Hydropower        207                  32          69.82, 40.28
Nurek (NR)          Tajikistan     2001-2010  Irrigation        209                  300         69.35, 38.37
Seminoe (SN)        United States  1951-2013  Irrigation        12.0                 90          -106.91, 42.16
Toktogul (TT)       Kyrgyzstan     2001-2010  Hydropower        140                  215         72.65, 41.68
Tuyen Quang (TQ)    Vietnam        2007-2011  Hydropower        97.2                 92          105.40, 22.36
Tyuyamuyun (TM)     Turkmenistan   2001-2010  Irrigation (b)    30.7                 -           61.40, 41.21

(a) U.S. Bureau of Reclamation
(b) Schlüter et al. [17]

As can be seen in Table 1.1, the periods of available data are around 10 years in length for most dams; only Tuyen Quang has a significantly shorter period of available data (i.e. 5 years), and for Seminoe dam in the United States 62 years of data are available. The data of the Central Asian reservoirs has been converted from a three-monthly to a monthly time-scale, while the data series of the reservoirs in the United States and Vietnam have been converted from daily to monthly data. This has been done in order to allow an equal comparison between all reservoirs.

Table 1.2: The MSE (10^-3) [-] for all dams for different time-ranges and with different time-lags.

Range  Lag  AJ    BL    CF    CD    CV    KR    NR    SN    TT    TQ    TM
1      0    8.67  35.9  16.9  24.7  9.03  14.1  8.57  13.6  24.4  11.9  2.05
2      0    3.92  10.4  2.04  3.48  4.53  5.79  1.48  2.67  5.82  5.59  0.692
2      1    21.7  44.1  13.4  21.0  10.3  24.7  15.8  11.9  15.8  26.1  11.8
2      2    28.5  80.5  41.2  31.7  16.2  37.9  22.5  18.2  26.7  22.8  21.1

Figure 1.3: (a) Example showing the initial bell shapes for a variable consisting of two membership functions (membership value against input) and (b) a bell shape with an indication of the physical meaning of its parameters (centre c, crossing points at c − a and c + a, slope = −b/2a).

1.2.3. SETTINGS

When training a network, the first 75% of the dataset is used to train the parameters (i.e. the training-set), while the last 25% is used to validate the solution (i.e. the validation-set). During an epoch all samples in a training-set are passed forward and backward through the network once. The training is stopped when the mean square error (MSE) of the simulation with respect to the validation-set has not decreased for at least five epochs.

Several set-ups of the data samples will be used to train the network. Starting with the simplest one, the input to the network will consist of two parameters, i.e. Storage (S) and Inflow (Q), and the target will be the Release (R), all at the current time (t):

$$\mathit{Input} = \{S(t), Q(t)\}, \quad \mathit{Target} = \{R(t)\} \tag{1.18}$$

This sample type has a time-lag of zero: the output of the network will be the release of a reservoir for the same month as the input provided. The time-range of this sample is equal to one, because the input uses data at one time-level only. Figure 1.5 shows this and other sample types in a schematic way. A somewhat more complicated data sample is the following:

$$\mathit{Input} = \{S(t), S(t-1), Q(t), Q(t-1)\}, \quad \mathit{Target} = \{R(t)\} \tag{1.19}$$

which has a time-range of two and again a time-lag of zero. With this set-up, the release at time t is determined using the storage and inflow at times t and t−1. Note that since there are now four input parameters, the complexity of the network increases. If, for instance, two membership functions are used per input parameter, 8 membership functions are needed in total. With three variables per function (see equation 1.4), the membership layer contains 24 parameters. Furthermore, $2^4 = 16$ different rules can be derived with this input. Since the consequence of every rule contains as many parameters as the length of the input array plus one (see equation 1.7), the implication layer will thus contain 5 · 16 = 80 parameters.
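The sample set-ups and the parameter counts above can be reproduced with a short sketch. The helper names `build_samples` and `anfis_parameter_counts` are hypothetical, and the indexing of time-range and time-lag is an assumed reading of equations 1.18 and 1.19 and of the description of Figure 1.5, not code from this study.

```python
def build_samples(storage, inflow, release, time_range=1, time_lag=0):
    """Pair lagged storage and inflow values with the release at time t."""
    samples = []
    first_t = time_lag + time_range - 1  # earliest t with a complete input
    for t in range(first_t, len(release)):
        steps = [t - time_lag - k for k in range(time_range)]
        inputs = [storage[s] for s in steps] + [inflow[s] for s in steps]
        samples.append((inputs, release[t]))
    return samples


def anfis_parameter_counts(n_inputs, n_mf_per_input, n_premise_per_mf=3):
    """Count membership functions, rules and parameters of the network."""
    n_mf = n_inputs * n_mf_per_input
    n_rules = n_mf_per_input ** n_inputs
    return {
        "membership_functions": n_mf,
        "premise_parameters": n_mf * n_premise_per_mf,
        "rules": n_rules,
        "consequence_parameters": (n_inputs + 1) * n_rules,
    }


# Toy series, for illustration only.
S = [0, 1, 2, 3]
Q = [10, 11, 12, 13]
R = [20, 21, 22, 23]

range_two = build_samples(S, Q, R, time_range=2, time_lag=0)  # as in equation 1.19
counts = anfis_parameter_counts(n_inputs=4, n_mf_per_input=2)
print(range_two[0], counts)
```

For the four-input case this gives 8 membership functions, 24 premise parameters, 16 rules and 80 consequence parameters, matching the numbers in the text; the rule count grows exponentially with the number of inputs, which is why additional inputs quickly become expensive.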
Figure 1.4: The plot at the top shows the actual release of Andijan dam used for training and validation (normalized outflow against time in months) and the release as calculated by the trained network. The bottom plot shows the normalized inflow and storage.

Figure 1.5: Diagrams showing different sample set-ups: (a) time-range one, time-lag zero; (b) time-range two, time-lag zero; (c) time-range two, time-lag one; (d) time-range one, time-lag one.

Finally, a third variable will be used, giving another sample set-up. The Time-of-Year (ToY) is expected to be able to account for the seasonality in dam operation. For example, with the following set-up:

$$\mathit{Input} = \{S(t-1), Q(t-1), \mathit{ToY}(t)\}, \quad \mathit{Target} = \{R(t)\} \tag{1.20}$$

a fuzzy rule could be "IF Time-of-Year is dry AND Storage is low AND Inflow is high THEN Outflow is Z". Also note that this set-up has a time-lag of one: the release is predicted one month ahead.

Finally, in order to use back-propagation, initial values for the parameters of the membership layer need to be set. These are set such that for each input parameter the sum of the membership functions equals one; an example for a variable with two membership functions can be seen in Figure 1.3a.

1.3. RESULTS

Figures 1.4 and 1.6 show the results for two representative reservoirs (Andijan and Charvak) for a network with a simple configuration as in equation 1.18. In the top graph, both the measured and the simulated release are shown. The left part of the top graph is the training data; these values were provided to the network as targets during training.
The right part is used for validation; the MSEs shown in Table 1.2 are based on this data (note that since all used data has been normalised, the MSEs are dimensionless). Finally, the dashed line is calculated using the trained network and the input parameters, inflow and storage, which are shown in the bottom graph.

Figure 1.6: The plot at the top shows the actual release of Charvak dam used for training and validation (normalized outflow against time in months) and the release as calculated by the trained network. The bottom plot shows the normalized inflow and storage.

Figure 1.7: Results of Andijan Dam. Plots (a) and (b) show the membership functions ("low" and "high") of the inflow and storage, respectively, after the network has been trained. Plot (c) shows the change of the MSE with respect to the training and validation set.

Figure 1.8: Results of Charvak Dam. Plots (a) and (b) show the membership functions ("low" and "high") of the inflow and storage, respectively, after the network has been trained. Plot (c) shows the change of the MSE with respect to the training and validation set.

Figure 1.9: The consequence parameters of all reservoirs, separated per rule in a boxplot (showing Q1, Q3, the IQR, the mean, whiskers at Q1 − 1.5·IQR and Q3 + 1.5·IQR, and outliers). The parameter 'p' is multiplied with the inflow and 'q' with the storage, after which they are summed with 'r' to determine the release.

When looking at the storage of Andijan between the 70th and 100th month, the storage seems to be in a depression; the inflow during these months is also relatively low. Charvak, on the other hand, shows a more constant pattern in its inflow and storage. The simulated release follows the test data nicely for both reservoirs, with MSEs of 8.67·10^-3 and 9.03·10^-3 for Andijan and Charvak respectively. The peaks match very closely, even during the drier period observed at Andijan.

Both the membership functions for inflow and storage and the convergence curve for Andijan are shown in Figure 1.7. The "high" inflow membership function is almost constant at 1, while the "low" inflow function is nearly unchanged (compare with Figure 1.3). For "low" inflows (i.e. below approximately 0.3), both membership functions will give a strong firing strength to their respective rules, meaning that the simulated release during low inflows will always be based upon more than one fuzzy rule. The convergence curves are quite long, with over 400 epochs needed to converge completely. A striking feature of this graph is that the validation curve drops below the training curve, i.e. the model performs better on the validation data than on the data it is being trained on.

Figure 1.8 shows the membership functions and the convergence curve for Charvak. The membership functions for the inflow are very similar to the initial functions. Horizontally, the division between low and high inflow is still around 0.5; the intersection point has only shifted slightly downwards. The membership functions for the storage show bigger changes, however: the intersection point shifted all the way to the top and slightly to the right.
For storage higher than 0.6, both membership functions will thus create a relatively large firing strength, meaning the release will be a product of more than one fuzzy rule. The convergence is very fast for this reservoir; after the first epoch not much seems to change any more.

For both reservoirs, the membership functions representing the storage show a clear distinction between storage below and above approximately 0.7. Apparently, the release regime differs so much on either side of this value that two different consequence functions are needed to calculate the release. For the inflow, this distinction is less clear for Andijan than for Charvak.

At some reservoirs, the distinction between a low and a high inflow or storage is completely removed during the training. In the case of Chardara, for example, the membership function representing the low inflow is zero for the whole input range, while the other membership function is constant at one. This means that the consequences associated with low inflow are never used.

Figure 1.10: Graphs showing (a) the effect of adding a Time-of-Year variable to the network (Bull Lake, MSE = 0.0141) and (b) the effect of using a larger time-range (Seminoe, MSE = 0.0027). Both panels show the normalised outflow of the test data and the simulation against time in months.

Figure 1.9 shows the trained consequence parameters (also see equation 1.7) in a boxplot for all reservoirs. Most of the outliers belong to two reservoirs (Andijan and Bull Lake), and when ignoring those, some patterns are visible in this graph. For example, when looking at the consequence rules for a low inflow and a low storage, the parameter associated with inflow (p) has an order of magnitude between 0 and 2, while q (associated with the storage) is of order 0 to -2.
Thus, the release is calculated by subtracting the storage from the inflow and finally adding an independent value (r). A similar pattern is seen when inflow is low and storage is high. When inflow and storage are both high, the release is calculated by summing the inflow and storage and subtracting an independent value.

The MSEs for all reservoirs are shown in Table 1.2. Although Andijan and Charvak are not the best performing simulations, they are definitely on the better side of the spectrum, which ranges between 2.05·10^-3 (Tyuyamuyun) and 35.9·10^-3 (Bull Lake). The simulations compared to the observed test data for the other nine reservoirs are shown in Figure 1.12.

Bull Lake is the worst performing reservoir. Its results make clear that the network is not capable of dealing with the very low (i.e. near zero) release rates in between peaks. Chardara and Toktogul are next in line. Although the peaks for Chardara seem to be of the right magnitude, the timing is less fortunate. The results of Toktogul can be explained when taking a closer look at its training and validation data: the first peak (between 0 and 10 months) occurs during a year in which the storage is extraordinarily low compared to all other years.

Canyon Ferry, Kayrakkum, Nurek and Tuyen Quang perform satisfactorily, i.e. the shape of the simulations clearly follows the actual release. Although Canyon Ferry shows a large overestimation of a peak-flow, the actual peak is very uncommon for Canyon Ferry and a longer time-series might improve its performance on this kind of occurrence. For Tuyen Quang it is important to note once more that the dataset is very short and the validation is done over a 14 month period.

Seminoe has the largest dataset tested and shows a similar problem to Bull Lake. The network seems incapable of dealing with the very low flows and the average peak-flows, while the high peaks are simulated quite accurately.
The performance could also be hampered by the sheer length of the dataset, a period over which the operating regulations may well have changed. This would mean the fuzzy rules are trying to describe two different modes of operation. Finally, Tyuyamuyun is the best performing reservoir with a very close fit. This result becomes less impressive, though, when comparing the release with the inflow, which shows a very strong linear correlation.

Figure 1.11: Graphs showing the effect of adding time-lag to the network for (a) Andijan and (b) Charvak. (Axes: normalised outflow against time in months; legend: test data, time-lag = 1, time-lag = 2.)

ADDING VARIABLES

As described in the previous section, adding a variable describing the ToY is expected to increase performance because it could allow a network to deal better with seasonal patterns. For Bull Lake this is definitely the case: Figure 1.10a shows the results with a data-type as described in equation 1.20. The addition of this extra variable has quite an impressive effect on the result: the low flows are now simulated much more accurately and two peak flows show a near perfect fit. The first peak, however, is now simulated worse. The results for Seminoe, which seemed to struggle with similar problems, were less spectacular, since no significant change occurred. Figure 1.10b shows simulations done for Seminoe with a network trained with a data-type as described in equation 1.19. Using this data-type greatly improves the results: most peaks now match nicely and the low flow regime is captured as well. The simulation now uses 16 rules to describe the operation of the reservoir. Of these rules, only 8 are actually used. The others are never activated.
This explains the improvement in performance over the 4-rule network: there are now 4 extra rules describing more specific situations. Even more so than for the previously discussed Andijan and Charvak reservoirs, the membership functions for this network overlap strongly, meaning that for many samples more than one rule is activated and the release is thus the result of multiple consequences. For completeness, the results for other reservoirs when using a larger time-range are also shown in Figure 1.12. As can be seen in Table 1.2, the MSEs for all reservoirs greatly improve and range between 0.692·10⁻³ (for Tyuyamuyun) and 10.4·10⁻³ (Bull Lake).

ADDING TIME-LAG

Figure 1.11 shows results for Andijan and Charvak again, but this time with a time-range of two and a time-lag of one and two month(s) (and no variable for the ToY). For a time-lag of one, Andijan shows a nice fit with accurate timing and magnitude of peaks. The same goes for Charvak. When the time-lag is increased further, the fits become less accurate, as expected. Table 1.2 shows the errors for simulations with time-lag for the other reservoirs. The errors range between 10.3·10⁻³ (Charvak) and 44.1·10⁻³ (Bull Lake) for a time-lag of one month and between 16.2·10⁻³ (Charvak) and 80.5·10⁻³ (Bull Lake) for a time-lag of two months.

1.4. DISCUSSION

As shown in Figures 1.4, 1.6 and 1.12, ANFIS is capable of deriving fuzzy rules that describe the release regime for most reservoirs. For some of the considered reservoirs, however, the four fuzzy rules inherent to a network built for two parameters with two membership functions each (as described in 1.18) are not sufficient. The low and peak flows for these reservoirs are consequently over- and underestimated. As shown in Figure 1.10a, adding more input variables like the ToY or inflow and storage at another time-level can help to simulate the low and peak flows more accurately.
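To make the rule structure concrete, the sketch below evaluates a small first-order rule base of the kind discussed above. It is an illustration only, not the implementation used in this study. It assumes the generalised bell function as the form of equation 1.4 and a product for the AND operator, and all names (`bell`, `sugeno_release`) and parameter values are hypothetical.

```python
import itertools
import numpy as np

def bell(x, a, b, c):
    """Generalised bell membership function (assumed form of equation 1.4)."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def sugeno_release(inputs, mfs, consequents):
    """Weighted-average output of a first-order rule base.

    inputs      : sequence of n normalised input values
    mfs         : mfs[k] lists the (a, b, c) tuples of input k, one per linguistic value
    consequents : array (n_rules, n + 1); per rule the linear coefficients of the
                  inputs followed by a constant term
    """
    # Membership grade of every input for each of its linguistic values.
    grades = [[bell(x, *params) for params in mfs[k]] for k, x in enumerate(inputs)]
    # One rule per combination of linguistic values; firing strength by product (assumption).
    firing = np.array([np.prod(combo) for combo in itertools.product(*grades)])
    # Linear consequence of each rule, then the firing-strength-weighted average.
    outputs = consequents[:, :-1] @ np.asarray(inputs) + consequents[:, -1]
    return float(firing @ outputs / firing.sum()), firing

# Hypothetical parameters: two inputs (inflow, storage), each "low" and "high".
mfs = [
    [(0.30, 2.0, 0.0), (0.30, 2.0, 1.0)],   # inflow
    [(0.35, 2.0, 0.0), (0.35, 2.0, 1.0)],   # storage
]
# Rule order follows itertools.product: (low, low), (low, high), (high, low), (high, high).
consequents = np.array([
    [1.0, -1.0, 0.2],
    [1.0, -0.5, 0.3],
    [0.8,  0.5, 0.0],
    [1.0,  1.0, -0.5],
])
release, firing = sugeno_release([0.2, 0.8], mfs, consequents)
print(len(firing), "rules, normalised release:", round(release, 3))
```

With two inputs and two membership functions each, this grid partition yields 2² = 4 rules; passing four inputs (a time-range of two) yields 2⁴ = 16 rules, each with one extra consequence parameter per added input.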
Figure 1.12: Results for nine reservoirs when simulated with a time-range of either one or two and zero time-lag. Panels: (a) Bull Lake, (b) Canyon Ferry, (c) Chardara, (d) Kayrakkum, (e) Nurek, (f) Seminoe, (g) Toktogul, (h) Tuyen Quang, (i) Tyuyamuyun. (Axes: normalised outflow against time in months; legend: test data, time-range = 1, time-range = 2.)

For the case of adding a ToY variable, it is easy to see why this could help improve performance. Reservoir management often anticipates the occurrence of dry and wet seasons by applying different modes of operation. The addition of this variable thus allows the fuzzy rules to make a clear distinction between seasons when this distinction is not clearly present in the input parameters. Although adding the ToY variable helped improve performance for some reservoirs, for others it did not, or even worsened results. This can be explained by the fact that ANFIS processes each variable in the same way, while the ToY would actually need a different approach. The consequence (as described in equations 1.1 and 1.7) of each rule is the sum of all input variables multiplied by their associated parameters. By adding the ToY, the consequence of each rule thus acquires an extra parameter.
But it is actually undesirable that the ToY has such a direct influence on the release (i.e. adding 'time' to 'volume' does not make sense). It would be better to use the ToY solely in the premise part of the fuzzy rules, in order to make a clear distinction between seasons, without letting it interfere directly with the consequence part of a rule. This way, the number of fuzzy rules would double (if two membership functions for the ToY are used), while the number of variables in the consequence part of a rule would remain the same.

Figure 1.13: Each bar shows the share each rule contributes during a full simulation, i.e. every block in a bar represents 1 of 16 rules. (Vertical axis: share in %; horizontal axis: location, AJ, BL, CF, CD, CV, KR, NR, SN, TT, TQ, TM; legend: Rule 1 to Rule 16.)

Using a larger time-range for the input variables improves the performance of the model for every reservoir. For a time-range of two, the release is based not only on the inflow and storage of the current month, but also on those of the month before. In this case, a total of 16 rules are used to describe the reservoir's behaviour. It turns out, though, that a large part of these rules remains unused for many reservoirs, as can be seen in Figure 1.13. For example, the results of Bull Lake make use of four rules, and Tuyen Quang uses two dominant rules besides two less important ones. Since 16 rules are more than needed for most reservoirs, it would be interesting to try network configurations resulting in fuzzy rule bases consisting of fewer than 16 but more than 4 rules. For example, using 3 membership functions per variable and a time-range of one would create a network with 9 rules with consequences containing 3 parameters.
Alternatively, increasing only the time-range for inflow while keeping a range of one for storage would result in 8 rules with 4-parameter consequences.

Furthermore, there is a large overlap between membership functions, leading to activation of multiple fuzzy rules for a single sample. This is undesirable because it undermines the basic principle of fuzzy logic: input is translated to linguistic labels and processed by fuzzy rules which represent human behaviour and knowledge. When samples are processed by multiple rules, the logical interpretation of a network becomes much harder. For example, if the inflow and storage are each classified as both 'low' and 'high', then four rules are used to calculate the release. From a logical point of view this does not make sense: an inflow cannot be low and high at the same time. Of course, there must be some overlap between membership functions, since the whole range of input values needs to be sufficiently covered. But, for example, the membership functions for inflow below 0.3 at Andijan (see Figure 1.7a) overlap far more than necessary. Nearly all simulations using a time-range of two showed this illogical behaviour. Wismer and Chattergy [18] propose a method called constrained gradient descent, in which some limitations with regard to the bell-shaped function (see equation 1.4) are formulated. Considering $\{a_i, b_i, c_i\}$ and $\{a_{i+1}, b_{i+1}, c_{i+1}\}$ and setting $c_i + a_i = c_{i+1} - a_{i+1}$ ensures that the sum of two consecutive membership functions never exceeds 1.

Figure 1.9 gives a strong indication that the consequences of a fuzzy rule have a very similar set-up for different reservoirs, especially when considering that nearly all of the outliers belong to only two reservoirs (Andijan and Bull Lake). Apparently, for the remaining 9 reservoirs, the relation between inflow and storage on the one hand and release on the other is similar.
Obviously, the number of reservoirs studied here is small, especially when taking into account the different types of reservoirs studied. More reservoirs would need to be investigated, and attention should be paid to their main purpose (e.g. irrigation or hydro-power). Possibly, certain types of reservoirs exhibit specific parameter ratios for a distinct situation (e.g. low storage and low inflow). Defining these ratios would allow the application of fixed consequences to different types of reservoirs.

A drawback of the proposed method is the need to acquire in-situ time-series, which is often problematic as a result of multilateral mistrust [19]. Over the last decade, the possibilities of observing reservoirs from space using altimeters and radar and optical imagery have grown fast, and this trend is expected to continue as more satellites are scheduled for launch. Combining the method proposed here with remotely sensed time-series could open possibilities for GHMs, by allowing the derivation of operational rules for large numbers of reservoirs all around the world.

1.5. CONCLUSION

It has been shown that, using fuzzy logic and ANFIS, operational rules of existing reservoirs can be derived without much prior knowledge about the reservoir. Their validity was tested by comparing actual and simulated releases. The rules can be incorporated into hydrological models like PCR-GLOBWB (or more regional models) that struggle with reservoir outflow forecasting. After a network for a specific reservoir has been trained, the inflow calculated by the hydrological model can be combined with the release and an initial storage to calculate the storage for the next time-step using a mass balance. Subsequently, the release can be predicted one time-step ahead using the inflow and storage. Especially when a time-range larger than one is used, outflow can be predicted accurately.
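The coupling described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the coupling used in any hydrological model: `release_rule` is a hypothetical stand-in for a trained network, precipitation and evaporation are ignored, and bounding the release by the available water is an added assumption.

```python
def simulate_reservoir(inflows, initial_storage, release_rule):
    """Step a reservoir forward in time with a mass balance.

    inflows         : inflow per time-step (e.g. from a hydrological model)
    initial_storage : storage at the start of the first time-step
    release_rule    : callable (inflow, storage) -> release; hypothetical
                      placeholder for the trained fuzzy rules
    """
    storage = initial_storage
    storages, releases = [], []
    for inflow in inflows:
        # Release for this time-step from the current inflow and storage.
        release = release_rule(inflow, storage)
        # Assumption: release is non-negative and cannot exceed the water available.
        release = min(max(release, 0.0), storage + inflow)
        # Mass balance: storage for the next time-step.
        storage = storage + inflow - release
        storages.append(storage)
        releases.append(release)
    return storages, releases

def linear_rule(inflow, storage):
    """Hypothetical release rule, used only to make the sketch runnable."""
    return 0.5 * inflow + 0.1 * storage

storages, releases = simulate_reservoir([1.0, 1.0, 0.2], 2.0, linear_rule)
print(storages, releases)
```

In this set-up, the errors of the predicted release feed back into the simulated storage, so the quality of the release rule also determines how well storage is tracked over longer periods.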
As stated before, however, the membership functions for a variable sometimes overlap too much, leading to multiple rules with significant firing strengths, which is undesirable because it takes the logic out of fuzzy logic. Finally, there are indications that the consequences of fuzzy rules are applicable across reservoirs, but more research on more reservoirs is needed.

2 MODELLING OF RESERVOIR OPERATIONS USING REMOTELY SENSED DATA

2.1. INTRODUCTION

Many reservoirs can be found all around the world: an estimated 40,000. They inundate roughly 400,000 km² and have a total storage capacity of more than 6,000 km³ [1]. They provide societies with many benefits like flood control, domestic and industrial water use, hydro-power generation and irrigation. On the other hand, they also cause serious negative effects on people living in affected communities, cultural heritage, rivers, watersheds and aquatic ecosystems [2]. As economies grow and societies develop, the allocation of water becomes harder as a result of increased water demand, while the amount of water available remains about the same. This situation demands action from water resource managers. PCRaster GLOBal Water Balance (PCR-GLOBWB) is a global-scale, grid-based terrestrial hydrology model which supports water resource managers in their decision making. It is used to study water availability and adaptation strategies [4].

Since reservoirs are so ubiquitous, they have a significant impact on river discharges. They contain more than three times the amount of water stored in river channels and almost one-sixth of the global annual river discharge [5]. This implies that without a proper understanding of how reservoirs are being managed, global hydrological models (GHMs), like PCR-GLOBWB, cannot function properly. Managers of reservoirs always face uncertainties about inflows, evaporation, seepage losses and various water demands to be met, making the management an imprecise and vague undertaking.
Dam operators' decisions are often based on their experience and the information available to them at the decisive moment, like current storage and the last period's inflow [6]. Data-driven methods show great potential in predicting these decisions. Chapter 1 showed the possibility of linking information like storage and inflow to the operator's decision resulting in a specific release by using fuzzy logic [7] and an Adaptive-Network-Based Fuzzy Inference System (ANFIS) [11]. A big drawback of data-driven methods, however, is the need to acquire extensive amounts of data, which are generally hard to come by as a result of multilateral mistrust [19] and, from the writer's own experience, distrust between different stakeholders within a state. When considering the number of reservoirs included in a model like PCR-GLOBWB (i.e. 513 [4]), it becomes clear that it would be a huge undertaking to collect data for all these reservoirs in a traditional way. Therefore, it would be interesting to see if the method proposed in Chapter 1 would work when using storage time-series derived solely from remote-sensing (RS) products like altimeters and optical and radar imagery, inflow time-series derived from hydrological models and release time-series derived from a water balance. This less conventional way of monitoring reservoirs will introduce uncertainty (compared to monitoring with in-situ data) and will, in addition, result in irregular sampling in time (as a result of erratic availability of remote-sensing products). This study provides a theoretical sensitivity analysis with regard to the last-mentioned implication of using RS products instead of in-situ data for determining the time-series needed to model reservoir operations as described in Chapter 1.

Figure 2.1: Schematization showing the process of converting different remote sensing products into reservoir storage time-series.

2.1.1. DETERMINING TIME-SERIES WITH REMOTE SENSING PRODUCTS

Modelling operational rules of reservoirs using fuzzy logic in combination with ANFIS as described in Chapter 1 requires storage, inflow and release time-series to train the fuzzy rules. Deriving these time-series from RS products requires the combination of multiple products and techniques in order to achieve a sufficient frequency of measurements. The main principle behind deriving reservoir storage using radar or optical imagery is linking reservoir extent (A) to storage (S) through a reservoir-specific A-S relation. With altimeters, the water level (H) can be measured, which is then linked to storage through a reservoir-specific H-S relation. This whole process is schematized in Figure 2.1. A-S and H-S relations can be derived using Digital Elevation Models (DEMs) created during missions like, for example, the Shuttle Radar Topography Mission (SRTM) [20] [21]. Some studies also show the possibility of directly linking A to H to determine changes in S (ΔS) [22]. Inflow can be determined with hydrological models, after which the release can be derived from a water balance.

STORAGE

Baup et al. [22] investigated the possibilities of estimating the volume changes of small lakes using satellite images and altimetry by studying a 52 ha lake (with a maximum storage of 4.1 hm³) in the south-west of France using data from Envisat's RA-2 altimeter, radar images from TerraSAR-X and RadarSAT-2 and optical images from FormoSAT-2. The authors found a root-mean-square error (RMSE) of 0.06 hm³ for ΔS compared with in-situ measurements (and R² = 0.98). They see great potential in the monitoring of small reservoirs, but state that at the moment the poor density of altimetry tracks and their low temporal resolution greatly limit the possibilities. This is expected to change, though, with the launch of several new altimeters (Sentinel-3, Jason-CS, GFO2 and SWOT) in the near future.
Another factor that restricts the use of their method is the presence of dense vegetation over the free water. This vegetation hinders the derivation of the surface water area from optical images (and to a lesser extent radar images) and also influences the altimetry data. Annor et al. [23] also encountered this last-mentioned problem. In their study, the authors investigate the possibility of delineating small reservoirs from radar images (Envisat's ASAR) and conclude that this can be done accurately. In some reservoirs, however, the area is underestimated as a result of reed vegetation in the tail-ends of reservoirs. Liebe et al. [24] also investigated the delineation of water surface areas from radar images (also from Envisat's ASAR) and found two more factors influencing the results. Firstly, the contrast between land and water is critical. Their research area was situated in a semiarid region, where during the dry season (when there is little vegetation) the land-water contrast becomes problematic for the delineation. Secondly, the wind sometimes significantly interferes with the radar signal. Eilander et al. [25] recently published a study looking into these problems and proposed a new Bayesian approach for handling Synthetic-Aperture Radar (SAR) images. The developed algorithm is able to delineate surface water even when there is a low land-water contrast.

Table 2.1: Overview of satellites carrying instruments for optical and radar imagery.

| Satellite | Instrument | Type | Spatial Res. [m] | Temp. Res. [days] | Start | End | Agency | Availability | Reference |
|---|---|---|---|---|---|---|---|---|---|
| Landsat 7 | ETM+ | Optical | 30 | 16 | 1993 | Active | NASA | Free | Duan and Bastiaanssen [26] |
| Envisat | ASAR | Radar | 30 | 35 | 2002 | 2012 | ESA | Commercial | Liebe et al. [24], Annor et al. [23] |
| FormoSAT-2 | N/A | Optical | 2-8 | N/A | 2004 | Active | NSPO | Commercial | Baup et al. [22] |
| ALOS | AVNIR-2 | Optical | 10 | N/A | 2006 | 2011 | JAXA | Commercial | - |
| TerraSAR-X | SAR | Radar | 1-18 | N/A | 2007 | Active | DLR | Commercial | Baup et al. [22] |
| RadarSAT-2 | SAR | Radar | 1-100 | 24 | 2008 | Active | CSA | Commercial | Eilander et al. [25], Baup et al. [22] |
| Landsat 8 | OLI | Optical | 30 | 16 | 2013 | Active | NASA | Free | - |
| Sentinel-1 | SAR | Radar | 5-100 | 12 | 2014 | Active | ESA | Free | - |
| Sentinel-2 | MSI | Optical | 10-60 | 3-5 | 2015 | N/A | ESA | Free | - |

Table 2.2: Overview of satellites carrying altimeters.

| Satellite | Instrument | Temp. Res. [days] | Start | End | Agency | Reference |
|---|---|---|---|---|---|---|
| ERS1 | RA | 35 | 1991 | 2000 | ESA | - |
| ERS2 | RA | 35 | 1995 | 2003 | ESA | Santos da Silva et al. [27] |
| Topex-Poseidon | Topex-Poseidon 1 | 10 | 1992 | 2006 | NASA | - |
| GFO | GFO-RA | 17 | 1998 | 2008 | US Navy | - |
| ICESat | GLAS | 91 | 2003 | 2010 | NASA | Duan and Bastiaanssen [26], Phan et al. [28] |
| Envisat | RA2 | 35 | 2002 | 2012 | ESA | Santos da Silva et al. [27], Baup et al. [22] |
| Jason-1 | Poseidon 2 | 10 | 2001 | 2013 | NASA | - |
| Jason-2 | Poseidon 3 | 10 | 2008 | Still Active | NASA | - |
| Jason-CS | SRAL 2 | - | 2017 | N/A | NASA | - |
| GFO2 | RA | - | 2016 | N/A | US Navy | - |
| Sentinel-3 | SRAL | 27 | 2015 | N/A | ESA | - |
| SWOT | KaRIN | 7 | 2016 | N/A | NASA | - |

Table 2.3: Overview of all considered reservoirs, data from Lehner et al. [16] and the U.S. Bureau of Reclamation.

| Dam Name | Country | Period | Purpose | Height [m] | Inflow [10⁸ m³/yr] | Lon., Lat. [DD] |
|---|---|---|---|---|---|---|
| Bull Lake | United States | 2001-2013 | Multipurpose | 25 | 2.07 | -109.04, 43.21 |
| Canyon Ferry | United States | 2001-2013 | Multipurpose | 69 | 38.1 | -111.73, 46.65 |
| Seminoe | United States | 1951-2013 | Irrigation | 90 | 12.0 | -106.91, 42.16 |
| Tuyen Quang | Vietnam | 2007-2011 | Hydropower | 92 | 97.2 | 105.40, 22.36 |

Table 2.1 shows an overview of recent, current and future satellites carrying instruments for earth observation by either radar or optical imagery. The temporal resolutions of the radar instruments make clear that they can provide a time-scale smaller than a month, which could be refined further by adding measurements from optical images (which are, however, often hampered by cloud cover).
Duan and Bastiaanssen [26] determine storage for three lakes (Mead, Tana and IJssel) with an average capacity of 23 km³ using, besides altimetry data, optical images from Landsat 7's ETM+. They conclude that this product is suitable for determining storage and find values agreeing with observations (with R = 0.95 to 0.99). Additionally, Table 2.2 shows an overview of altimeters. Although there is currently only one active altimeter, several launches are scheduled in the near future. Using their measurements, the time-scale of storage time-series derived from radar images can be decreased even more, as shown by Santos da Silva et al. [27] and Phan et al. [28]. The first-mentioned study used the ERS2 and Envisat altimeters to determine river stages and found results agreeing with in-situ measurements with an error of around 40 cm. The width of rivers is usually much smaller than the width of lakes, which makes it harder to distinguish between actual measurements of the water level and measurements of the river shore. Accuracy for lakes is thus expected to be higher. Furthermore, Phan et al. [28] estimated water levels for 154 lakes on the Tibetan plateau using ICESat's GLAS altimeter and observed decimetre accuracy (similar to results by Duan and Bastiaanssen [26]).

RELEASE

Besides storage, release time-series are also needed for reservoir operation modelling. This information can be extracted by considering the water balance equation of the reservoir under consideration, as shown by Muala et al. [29].
In their study, the authors show how reservoir release for Roseires Dam (in Sudan) and Aswan High Dam/Lake Nasser (in Egypt) can be determined using limited in-situ and mostly satellite altimetry and imagery data. They use the water balance of the reservoir to determine the outflow, neglecting groundwater flows:

\[ Q_{out} = Q_{in} + A \cdot (P - E) - \frac{dS}{dt} \tag{2.1} \]

in which $Q_{out}$ is the release from the reservoir, $Q_{in}$ is the flow into the reservoir, $A$ is the reservoir's extent, $P$ is the precipitation, $E$ is the open water evaporation and $dS/dt$ is the change in storage (positive when storage increases). The inflow is obtained from in-situ measurements; storage change is determined using remote sensing data as described in the previous section. Precipitation and evaporation were acquired from the International Water Management Institute (IWMI) On-line Climate Summary Service Model. For the outflow of Roseires Dam, their research shows that the calculated discharge agreed with the in-situ measured discharge with a normalised root-mean-square error (NRMSE) of 18%. For Lake Nasser, however, the results were not as good, with an NRMSE of 70%. The causes of this high NRMSE are only speculated upon: there could have been unaccounted outflows or an overestimation of the inflow into the lake.

Thus, time-series of storage, release and inflow of reservoirs can be derived from RS products with time-scales smaller than monthly. This research investigates whether this frequency is sufficient and what the effect of increasing the number of measurements is on the capability of ANFIS to capture the rules under which a reservoir is operated.

2.2. METHODOLOGY

2.2.1. FUZZY LOGIC AND ANFIS

Chapter 1 uses fuzzy logic and ANFIS to find a relation between reservoir inflow and storage on the one hand and release on the other.

Figure 2.2: Diagram showing the set-up of a sample. (The diagram marks Q, S and R at the time levels t-2, t-1 and t.)

Fuzzy logic uses rules of the form "IF x is A AND y is B, THEN z is C", where {x,y,z} are linguistic variables (e.g.
storage, inflow and release) and {A,B} are linguistic values (e.g. low or high). C is a so-called crisp consequence in the form of a linear combination of the input variables [10]. Input variables (like storage and inflow) are first fuzzified with membership functions (i.e. it is determined whether, and to what degree, a variable can be classified as "low" or "high"), after which the relevant fuzzy rules are selected and their consequences calculated. Finally, the consequences are aggregated using a weighted average.

The tricky part of applying fuzzy logic is the assessment of the fuzzy rules. Determining how to fuzzify a variable and what the consequence is supposed to be for a certain combination of variables and their linguistic values is hard. It requires either expert knowledge or a method to derive these definitions from a set of observations. ANFIS is such a method. It uses the self-learning and generalising aspects of Artificial Neural Networks (ANNs) to decide how the input variables can be fuzzified most effectively and what the consequence of each rule should be, by minimising the difference between target values and computed output using a hybrid learning rule (HLR). As the name implies, the HLR combines two methods, namely the gradient descent method (GD) [30] and the least squares estimate (LSE). The GD is used to update the parameters that define the membership functions by calculating the derivative of the mean square error (MSE) (between the target and output values) with respect to each trainable parameter. The parameters are then adjusted in the direction in which the error decreases. The parameters describing the linear combination of input variables which make up the consequence of a rule are trained with an LSE. The relation between these parameters (X), the input (A) and the aggregated output (B) can be written in matrix notation as:

\[ A \cdot X = B \tag{2.2} \]

This is an overdetermined problem which usually does not have an exact solution.
Therefore, an LSE of X is sought to minimize $\|A \cdot X - B\|^2$, in which B is set equal to the target values. The sequential method is employed to calculate the LSE [15]:

\[
\begin{aligned}
X_{i+1} &= X_i + S_{i+1} \cdot a_{i+1} \cdot \left( b_{i+1}^T - a_{i+1}^T \cdot X_i \right) \\
S_{i+1} &= S_i - \frac{S_i \cdot a_{i+1} \cdot a_{i+1}^T \cdot S_i}{1 + a_{i+1}^T \cdot S_i \cdot a_{i+1}}, \qquad i = 0, 1, \cdots, P - 1
\end{aligned}
\tag{2.3}
\]

with $X_0 = 0$; $S_0 = \gamma \cdot I$; $\gamma$ a large positive number; $I$ the identity matrix; $a_i^T$ the $i$-th row vector of matrix $A$; and $b_i^T$ the $i$-th element of $B$.

The two methods (i.e. the GD and LSE) are used in sequence for all available samples in the set of observations. After all samples have been used once, an epoch has passed and a new epoch is started using the first sample again. The training is finished after the MSE has converged. A sample consists of four input variables, namely the storage (S) and inflow (Q) at two different time levels, while the target is the release (R) at the current time (also see Figure 2.2):

\[ \text{Input} = \{ S(t), S(t-1), Q(t), Q(t-1) \}, \qquad \text{Target} = \{ R(t) \} \tag{2.4} \]

2.2.2. SETTINGS

To see if one measurement (from either imagery or altimetry products) per month is sufficient to capture the operational rules of a reservoir, and to see what the effect of having more measurements is, four reservoirs for which daily in-situ data (on inflow, release and storage) are available are investigated. These reservoirs, listed in Table 2.3, are located in the United States and Vietnam. The length of the period of observations varies a lot, ranging from 5 to 62 years.

First, a measurement error is introduced to the in-situ data to mimic the errors that would be introduced when using RS data, a hydrological model and the water balance. As described before, many different products and techniques are available to acquire information about the status of a reservoir's storage, inflow and release. All these techniques introduce some error (one more than another) with regard to the actual value.
To account for this uncertainty, all three time-series are adjusted with an offset which results in a dimensionless mean-square error (MSE) of 0.5·10⁻³. This value has been chosen at the author's discretion, since no consistent error can be found in the aforementioned research. Figure 2.3 shows a (small) part of the time-series for Tuyen Quang with an artificially added error compared to the actual data for all three variables.

Next, from these time-series with added error, measurements are selected with four different frequencies. A measurement can be thought of as representing a satellite passing over a reservoir and thus acquiring a value for the storage, which can then be used in combination with inflow (from a hydrological model) to determine the release (via the water balance). Figure 2.4 shows a part of the storage with added error for Tuyen Quang, selected measurements with different frequencies and a linear interpolation of these measurements. This interpolation is necessary because ANFIS requires a constant time-step between measurements. As can be seen in Figure 2.4, the interpolations fit the data better as the time-scale (the number of measurements per month) increases. From these interpolations, values are selected on a monthly scale, i.e. the values for storage, inflow and release on the 15th of every month are selected and provided to train the parameters determining the membership and consequence functions. Finally, the simulated release (on a monthly time-scale) is compared to the actual daily and the average monthly release by calculating the MSE. The described steps are repeated 100 times for each time-scale (i.e. from 1 to 4 measurements per month) and reservoir, after which the MSEs are averaged and the variance of the MSEs is calculated.

2.3. RESULTS

Figure 2.6 shows results for the four considered reservoirs. Each graph consists of two parts, a training and a validation part.
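The data preparation of Section 2.2.2 (adding an error, selecting measurements, interpolating and sampling on the 15th of each month) can be sketched as follows. This is a minimal sketch, not the code used for this study. It assumes zero-mean Gaussian noise for the added error, measurement days drawn at random within fixed 30-day months, and a synthetic storage series; the function names are hypothetical.

```python
import numpy as np

def add_error(series, target_mse, rng):
    """Add zero-mean noise, rescaled so that the MSE against the original equals target_mse."""
    noise = rng.normal(size=series.shape)
    noise *= np.sqrt(target_mse / np.mean(noise ** 2))
    return series + noise

def measure_and_interpolate(series, per_month, rng, days_per_month=30):
    """Draw `per_month` random measurement days in each month and interpolate linearly."""
    days = np.arange(series.size)
    picked = []
    for start in range(0, series.size, days_per_month):
        month = days[start:start + days_per_month]
        count = min(per_month, month.size)
        picked.extend(rng.choice(month, size=count, replace=False))
    picked = np.sort(np.asarray(picked))
    # np.interp holds the first and last measured value constant outside the measured range.
    return np.interp(days, picked, series[picked])

rng = np.random.default_rng(seed=1)
days = np.arange(360)
storage = 0.5 + 0.4 * np.sin(2 * np.pi * days / 360)   # synthetic normalised storage
storage_rs = add_error(storage, 0.5e-3, rng)            # series with added error
interpolated = measure_and_interpolate(storage_rs, per_month=2, rng=rng)
monthly = interpolated[14::30]                          # value on the 15th of each 30-day month
print(monthly.round(3))
```

Repeating the last three steps with a fresh random draw of measurement days, and training on the resulting monthly samples, would give one of the 100 runs per time-scale; the same treatment would be applied to the inflow and release series.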
Furthermore three lines can be distinguished, one showing the actual daily release, another one the monthly average release and lastly one for the simulated release. This simulation is the output of ANFIS, which uses storage and inflow as input variables. The release used for training has been provided to the network as target values, while the validation data is used to evaluate the MSE of the output. It is important to note here, that the training is based on a interpolation of the daily values of days on which a measurement was made and the monthly average release is thus only given for reference and comparison. Moreover, the MSE from daily values (MSEd ai l y ), is based on the error for each day, meaning also the days on which no measurement was made are evaluated by means of a linear interpolation. The MSE based on monthly averages (MSEmont hl y ) can be calculated directly from the networks output without a need for interpolation. The results shown in Figure 2.6 are based on a time-scale of two measurements per month. Similar simulations have been made using frequencies of one to four measurements per month. Each of these simulations have been repeated a 100 times (each time with randomly selected dates for the measurements) and the average MSEs and their variance can be seen in Figure 2.5. When considering the average MSEs with respect to daily values, every reservoir shows a significant improvement when increasing the measurement frequency from one to two. Adding a third or a fourth measurement per month hardly has any effect however and for some reservoirs the results even become slightly worse. MSEmont hl y for Bull Lake, Seminoe and Tuyen Quang all show a better performance when compared to MSEd ai l y , the MSEmont hl y of Canyon Ferry shows some unexpected behaviour, when adding a third or fourth measurement per month, the results worsen significantly. 2.3. 
Figure 2.3: Showing the error added to in-situ measurements to account for uncertainty in data acquired by remote sensing. Panels: (a) Storage, (b) Inflow, (c) Release; each panel shows the actual series and the series with error (MSE = 0.0005) as normalised values against time in days.

Figure 2.4: Showing the effect of increasing the frequency of measurements and the linear interpolation done with them. Panels: (a) 1 measurement/month, (b) 2 measurements/month, (c) 3 measurements/month; each panel shows the storage with error, the measurements and the interpolation as normalised storage against time in days.

Figure 2.5: The average (·10^-3) and variance (·10^-5) of the MSEs for 100 runs. Panels: (a) Bull Lake, (b) Canyon Ferry, (c) Seminoe, (d) Tuyen Quang; each panel shows daily and monthly values against the time-scale in measurements/month.
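The error-adding, measurement-selection and interpolation steps illustrated in Figures 2.3 and 2.4 can be sketched as follows. This is a sketch under stated assumptions rather than the implementation used here: the Gaussian form of the error, the fixed 30-day month and all function names are hypothetical, and `interpolation_stats` only measures how well the interpolation fits the series, not the error of the ANFIS output.

```python
import numpy as np

MONTH_LEN = 30  # assumption: fixed 30-day months instead of calendar months

def add_error(series, target_mse, rng):
    """Add Gaussian noise rescaled so that its MSE equals target_mse."""
    noise = rng.normal(size=len(series))
    noise *= np.sqrt(target_mse / np.mean(noise ** 2))
    return np.asarray(series, float) + noise

def sample_days(n_days, per_month, rng):
    """Pick per_month distinct random days inside every month."""
    picked = []
    for start in range(0, n_days, MONTH_LEN):
        block = np.arange(start, min(start + MONTH_LEN, n_days))
        k = min(per_month, block.size)
        picked.extend(rng.choice(block, size=k, replace=False))
    return np.sort(np.array(picked))

def interpolate_daily(series, days):
    """Linear interpolation of the measurements onto every day."""
    series = np.asarray(series, float)
    return np.interp(np.arange(len(series)), days, series[days])

def mid_month_values(daily):
    """Values at day index 15 of every month, used as monthly samples."""
    mid = np.arange(MONTH_LEN // 2, len(daily), MONTH_LEN)
    return mid, daily[mid]

def interpolation_stats(series, per_month, runs=100, seed=0):
    """Mean and variance, over repeated runs, of the interpolation MSE."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, float)
    errors = []
    for _ in range(runs):
        days = sample_days(len(series), per_month, rng)
        daily = interpolate_daily(series, days)
        errors.append(np.mean((daily - series) ** 2))
    return float(np.mean(errors)), float(np.var(errors))

# Hypothetical normalised storage series of one year.
rng = np.random.default_rng(1)
t = np.arange(360)
storage = 0.5 + 0.3 * np.sin(2 * np.pi * t / 360)
noisy = add_error(storage, 0.5e-3, rng)
days = sample_days(len(noisy), 2, rng)
mid, values = mid_month_values(interpolate_daily(noisy, days))
```

Calling `interpolation_stats(noisy, m)` for `m` from 1 to 4 gives a quick impression of how the fit of the interpolation and its spread change with the measurement frequency.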
Figure 2.6: Example simulations of release for the four considered reservoirs with a time-scale of two. Panels: (a) Bull Lake (MSE_monthly = 0.0038, MSE_daily = 0.0104), (b) Canyon Ferry (MSE_monthly = 0.0033, MSE_daily = 0.0052), (c) Seminoe (MSE_monthly = 0.0015, MSE_daily = 0.0020), (d) Tuyen Quang (MSE_monthly = 0.0025, MSE_daily = 0.0039); each panel shows the actual daily, actual monthly average and simulated normalised release against time in days, split into a training and a validation part.

The variance of the MSEs for Canyon Ferry also differs from the results found at the other reservoirs, i.e. the variance of the errors is much larger. Nevertheless, adding an extra measurement per month has a stabilizing effect on the results: the spread of the errors becomes smaller. At Bull Lake the variance increases when adding a third or fourth measurement per month, while Seminoe and Tuyen Quang show the expected development of the variance.

2.4. DISCUSSION

When increasing the number of measurements used per month, one would expect the accuracy of the results to increase and the variance to decrease. This is the case for Seminoe and Tuyen Quang. Bull Lake and Canyon Ferry show some unexpected results, however. The MSE_monthly of Canyon Ferry increases when using more than two measurements per month; this can be explained by taking a closer look at Figure 2.6b. The release peak for this reservoir in the validation part (around day 3750) shows a large difference between the actual daily and actual monthly average release.
As the frequency of measurements increases, the chance of capturing this peak also increases, which results in a large error with respect to monthly averages. Also note that the errors are squared, implying that one large error has a more severe impact on the MSE than several small errors.

The variance of the MSEs of Canyon Ferry is large compared to the variances found at the other reservoirs. This can also be explained by the large peak around day 3750: missing this peak (because no measurement was taken during its occurrence) results in a very significant error, while observing it results in an MSE that is much better than the average.

The variance of both the daily and monthly errors for Bull Lake increases as the measurement frequency increases from two to three, and from three to four. As can be seen in Figure 2.6a, the release of Bull Lake contains very sudden and intense peaks with a duration of only several days. As the frequency of measurements increases, the chance of observing these peaks also increases. When these peaks are observed, the simulated release will be further away from the average monthly release, and since the output of the network is on a monthly scale, the 15 days before and after this peak will produce a large error.

A possible solution is to increase the time-scale of the neural network. Currently, the difference between two time levels ($\Delta t$), as visualized in Figure 2.2, is one month. As more measurements per month ($M$) are available, $\Delta t$ could be defined as:

$$\Delta t = (M - 1)^{-1} \quad \text{with } M > 1 \tag{2.5}$$

allowing a finer time-scale of the output. Using a smaller $\Delta t$ might prove not to be that straightforward, though. New sample set-ups may be required, since simply applying the set-up used in this study would decrease the time-horizon on which the operator bases his rules.
For example, for $\Delta t = 0.5$ month, a sample set-up should be:

$$\text{Input} = \{S(t),\ S(t-2),\ Q(t),\ Q(t-2)\}, \quad \text{Target} = \{R(t)\} \tag{2.6}$$

This way the application of a rule is still based on the current storage and inflow and the storage and inflow of a month ago. Given the current and near-future state of remotely sensed measurements, it is unlikely that time-scales much finer than half a month will be achievable, though.

2.5. CONCLUSION

It has been shown that, with limited data available, the method proposed in Chapter 1 is still applicable. When considering the averaged MSE with regard to monthly averages for a time-scale of one, the result outperforms the results found in Chapter 1 for Bull Lake, Seminoe and Tuyen Quang, while Canyon Ferry performs equally well. It has to be noted, however, that the results presented in this study depend on chance. If the measurements miss an important event at a reservoir, the results can quickly worsen. The variance of the errors for more detailed time-scales, however, shows that this problem can be overcome: overall, results become more stable as the frequency of measurements increases. This study therefore gives a proof of concept for the modelling of reservoir outflow with fuzzy logic and ANFIS using remotely sensed data.
