
First published on Saturday, Mar 29, 2025 and last modified on Thursday, Apr 10, 2025 by François Chaplais.

Optimal Sensor Placement in Power Transformers Using Physics-Informed Neural Networks
arXiv
Published version: 10.48550/arXiv.2502.00552

Sirui Li Division of Decision and Control Systems, Department of Intelligent Systems, KTH Royal Institute of Technology, Stockholm, Sweden

Federica Bragone Division of Computational Science and Technology, Department of Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden

Matthieu Barreau

Tor Laneryd Hitachi Energy Research, Västerås, Sweden

Kateryna Morozovska

Keywords: physics-informed neural networks, optimal sensor placement, power components, convex optimization, thermal modelling

Abstract

Our work aims at simulating and predicting the temperature conditions inside a power transformer using Physics-Informed Neural Networks (PINNs). The predictions obtained are then used to determine the optimal placement for temperature sensors inside the transformer under the constraint of a limited number of sensors, enabling efficient performance monitoring. The method consists of combining PINNs with Mixed Integer Optimization Programming to obtain the optimal temperature reconstruction inside the transformer. First, we extend our PINN model for thermal modelling of power transformers to solve the heat diffusion equation from 1D to 2D space. Finally, we construct an optimal sensor placement model inside the transformer that can be applied to problems in 1D and 2D.

1 Introduction

Temperature monitoring of power components is an important factor to ensure their longevity and optimize operation and maintenance needs. Traditional methods for temperature evaluation are often limited by long computing times and large memory requirements for solving the problem numerically. Therefore, researchers have been looking into using Physics-Informed Neural Networks (PINNs) [1] for estimating the thermal performance of power components. Most of these works are oriented towards power transformers, as they play a major role in power distribution [2, 3, 4].
Earlier works with the application of PINN in the energy domain are often related to applications in areas like power systems, high voltage components, aging estimation, and heat exchangers. For example, authors of works [5, 6, 7] and [8] explore how power systems engineers can benefit from using PINNs and deep learning to optimize power delivery and system performance. A few works explore the benefits of using PINNs for lifetime estimation of renewable power plants [9], power transformers [10], and even electrical insulation [11]. However, there is a lack of existing general knowledge on using PINNs for decision-making and control of energy systems and individual components, with only local solutions presented for specific problems like system identification [12]. Therefore, we aim to find a generalizable strategy for integrating PINNs into the decision-making process in energy engineering. The presented study focuses on temperature monitoring in power transformers, which can be extended further for similar problems in energy engineering.
Temperature monitoring is a key factor for the safe operation and maintenance of power transformers. The common methods for thermal analysis use computational fluid dynamics to model, for example, the mineral-oil-immersed transformer windings [13, 14] and the thermal circuit of the transformer [15, 16, 17]. These models allow the calculation of the critical temperature and can also be used to simulate and analyze the internal temperature of the transformer given predefined weather and load conditions. However, the computational complexity of numerical methods increases exponentially with the complexity of the model, and their accuracy depends on the suitability of the grid discretization. Therefore, in order to reduce the model complexity and adapt transformer temperature monitoring to real-time decision-making, data-driven methods, specifically PINNs, are explored in more detail in [4, 18, 19]. While the proposed PINN models make simulation and prediction of the internal temperature of the transformer faster than numerical methods, they still require substantial computational resources and time to converge.
In our work, we aim to address the limitation of computational complexity when solving heat diffusion problems in power transformers with PINNs by introducing new temperature sensors. By adding additional data points inside the domain, we can reduce the problem’s size and ensure faster response times to temperature changes. In order to find the best locations and the optimal number of sensors for temperature monitoring, we integrate the PINN solution into a mixed integer linear programming (MILP) model. On this basis, we introduce a novel approach to decision-making and data collection in power components by integrating the PINN solution at the component design stage to determine the minimum number of the most stable high-temperature points, which can effectively serve as a basis for a faster and more reliable real-time solution of the transformer thermal model. In addition to the 1D spatial model from [4], we extend the analysis to the 2D spatial model to validate the results of the sensor placement solution.

2 Methods

This section introduces the methods used for the study, including a description of the model for the temperature distribution in power transformers. We also introduce the PINN structure in 1D and 2D and describe the three proposed optimization models for the optimal sensor location for temperature detection.

2.1 Heat Diffusion Problem

The shape of the actual power transformer is relatively complex. For convenience of study, we have simplified its shape. In previous studies, the transformer was described as a line \( \mathbf{x} = (x) \in [0, 1]\) , and it was assumed that it was immersed in oil as a coolant. When the spatial dimension of the model is extended to 2D space \( \mathbf{x} = (x,y) \in [0, 1]^2\) , it is assumed that the coolant remains unchanged, and the transformer shape is described as a square, see Figure 1. We define \( \Omega\) as the space domain, \( \Omega = [0, 1]\) in 1D and \( \Omega = [0, 1]^2\) in 2D.

Figure 2. Example of a simple transformer.
Figure 3. Graphically simplified transformer shape.
Figure 1. On the left, a transformer placed in oil, where \( I_p\) , \( I_s\) , \( V_p\) , and \( V_s\) are the primary current, secondary current, primary voltage, and secondary voltage, respectively. On the right, simplification of the transformer structure for models in 1D and 2D.

The general form of the heat diffusion equation for the 1D and 2D model is given by:

\[ \begin{equation} \rho c_p \frac{\partial u}{\partial t} = k \Delta_{\mathbf{x}} u + q \end{equation} \]

(1)

where \( \rho\) is the density, \( c_p\) is the heat capacity, \( k\) is the thermal conductivity, and \( \Delta_{\mathbf{x}}\) is the Laplace operator. The term \( q\) represents the heat source that, for this problem, we define as:

\[ \begin{align} & q = q(\mathbf{x},t) = (P_0 + P_K(\mathbf{x},t) - h(u(\mathbf{x},t) - T_a(t))),\\ & P_K(\mathbf{x},t) = P_K^t(t)P_K^\mathbf{x}(\mathbf{x}), \\\end{align} \]

(2)

where \( P_0\) is the no-load loss, \( P_K(\mathbf{x},t)\) is the load loss, \( h\) is the convective heat transfer coefficient, and \( T_a(t)\) is the ambient temperature. The load loss has a component dependent on time, \( P_K^t(t)\) , and one depending on space, \( P_K^{\mathbf{x}}(\mathbf{x})\) , which differ for the 1D and 2D problems. Their forms are given by:

\[ \begin{align} & P_K^t(t) = \nu K(t)^2 \\ & P_K^{\mathbf{x}}(\mathbf{x}) = \left\{ \begin{array}{ll} 0.5\sin(3 \pi x) + 0.5, & \text{ if } \Omega = [0, 1], \\ 1, & \text{ if } \Omega = [0, 1]^2. \end{array}\right. \\\end{align} \]

(3)

where \( K(t)\) is the load factor, and \( \nu\) is the rated load loss. The boundary conditions for the 1D problem are defined as:

\[ \begin{align} & u(0,t) = T_a, \\ & u(1,t) = T_o, \\\end{align} \]

(4)

while for the 2D problem are:

\[ \begin{align} & u(0,y,t) = T_a, \\ & u(1,y,t) = T_o, \\ & u(x,0,t) = u(x,1,t) = \frac{T_a + T_o}{2} = T_{av} \\\end{align} \]

(5)

where \( T_a\) is the ambient temperature, \( T_o\) is the top oil temperature, and \( T_{av}\) is defined as the average temperature between \( T_a\) and \( T_o\) . Table 1 shows the values used for the parameters of the heat diffusion equation in 1D and 2D.

Table 1. Physical parameters and corresponding values of the heat diffusion equation in 1D and 2D.
Parameter | Unit | 1D | 2D
Thermal conductivity, \( k\) | \( W/(m\cdot K)\) | \( 50\) | \( 50\)
Density, \( \rho\) | \( kg/m^3\) | \( 900\) | \( 900\)
Heat capacity, \( c_p\) | \( J/(kg\cdot K)\) | \( 2000\) | \( 2000\)
Heat transfer coefficient, \( h\) | \( W/(m^2\cdot K)\) | \( 1000\) | \( 2000\)
No-load loss, \( P_0\) | \( W\) | \( 1500\) | \( 1500\)
Rated load loss, \( \nu\) | \( W\) | \( 83000\) | \( 83000\)
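To make the source term concrete, the following is a minimal NumPy sketch of the heat source in Eqs. (2)-(3) for the 1D case, using the Table 1 parameter values. The scalar inputs `u`, `K`, and `T_a` are hypothetical stand-ins for the temperature field and the measured time series.

```python
import numpy as np

# Parameter values from Table 1 (1D case).
P0 = 1500.0    # no-load loss [W]
NU = 83000.0   # rated load loss [W]
H_1D = 1000.0  # convective heat transfer coefficient [W/(m^2 K)]

def load_loss_space_1d(x):
    """Spatial load-loss profile P_K^x(x) on Omega = [0, 1], Eq. (3)."""
    return 0.5 * np.sin(3 * np.pi * x) + 0.5

def heat_source_1d(x, u, K, T_a):
    """q(x, t) = P0 + P_K(x, t) - h (u(x, t) - T_a(t)), Eqs. (2)-(3),
    with P_K(x, t) = nu K(t)^2 * P_K^x(x)."""
    P_K = NU * K**2 * load_loss_space_1d(x)
    return P0 + P_K - H_1D * (u - T_a)

# At the domain midpoint, sin(3*pi/2) = -1, so the spatial profile vanishes.
q_mid = heat_source_1d(x=0.5, u=60.0, K=0.8, T_a=20.0)
```

Note how at \( x=0.5\) the cooling term \( -h(u-T_a)\) dominates, since the spatial load-loss profile is zero there.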

Figure 4 represents the data used for the problem. It consists of the ambient temperature \( T_a\) [\( ^\circ \) C], the top oil temperature \( T_o\) [\( ^\circ\) C], and the load factor \( K\) [p.u.], corresponding to the red, blue, and grey lines in the plot, for the first 100 hours of the dataset that we consider.

Figure 4. Ambient temperature, top oil temperature, and load factor measurements during the first \( 100\) hours.

2.2 PINNs

The structure of the PINN model is shown in Figure 5. It consists of a neural network part approximating the solution \( u\) from the input values, and a residual side where the partial derivatives of the considered equation are evaluated using automatic differentiation [20].

Figure 5. Structure of the PINN model.

We define the residual \( f\) to be our heat diffusion equation:

\[ \begin{equation} f(\mathbf{x},t) = \rho c_p\frac{\partial u}{\partial t} - k\Delta_{\mathbf{x}} u - (P_0 + P_K(\mathbf{x},t) - h(u(\mathbf{x},t) - T_a(t))). \end{equation} \]

(6)
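As a sanity check, the residual of Eq. (6) can also be evaluated on a space-time grid with finite differences; this is an illustrative sketch, not the PINN implementation (which uses automatic differentiation), and `u`, `x`, `t`, `q` are hypothetical grid arrays.

```python
import numpy as np

def residual_fd(u, x, t, q, rho=900.0, cp=2000.0, k=50.0):
    """Finite-difference evaluation of the 1D residual of Eq. (6):
    f = rho c_p u_t - k u_xx - q, with u and q of shape [n_t, n_x]
    sampled on uniform grids t and x."""
    u_t = np.gradient(u, t[1] - t[0], axis=0)           # du/dt
    u_x = np.gradient(u, x[1] - x[0], axis=1)           # du/dx
    u_xx = np.gradient(u_x, x[1] - x[0], axis=1)        # d^2u/dx^2
    return rho * cp * u_t - k * u_xx - q

# Consistency check: for u(x, t) = t the exact residual is rho*c_p - q,
# so choosing q = rho*c_p makes the residual vanish identically.
x = np.linspace(0.0, 1.0, 11)
t = np.linspace(0.0, 1.0, 11)
u = np.outer(t, np.ones(x.size))
res = residual_fd(u, x, t, q=900.0 * 2000.0 * np.ones((t.size, x.size)))
```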

The overall loss function of the PINN model is defined as the weighted sum of the mean squared error assigned to the boundary conditions, MSE\( _u\) , and the mean squared error of the residual, MSE\( _f\) :

\[ \begin{equation} MSE = \lambda_u MSE_u + \lambda_f MSE_f, \end{equation} \]

(7)

where

\[ \begin{align} & MSE_u = \frac{1}{N_u} \sum_{i=1}^{N_u} |\hat{u}(\mathbf{x}_u^i, t_u^i)-u^i|^2, \end{align} \]

(8)

\[ \begin{align} & MSE_f = \frac{1}{N_f} \sum_{i=1}^{N_f} |f(\mathbf{x}_f^i, t_f^i)|^2. \end{align} \]

(9)

From the equations, \( \{\mathbf{x}_u^i, t_u^i, u^i\}_{i=1}^{N_u}\) corresponds to the training data for the boundary conditions; \( \hat{u}\) is the approximation of the solution \( u\) at the training boundary coordinates \( \mathbf{x}_u\) and \( t_u\) ; \( \{\mathbf{x}_f^i, t_f^i\}_{i=1}^{N_f}\) are the collocation points of the residual \( f\) ; \( N_u\) is the number of boundary training points; \( N_f\) is the number of collocation points; \( \{\lambda_u, \lambda_f\}\) are the weights assigned to the corresponding MSEs.
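The composite loss of Eqs. (7)-(9) can be sketched in a few lines; the arrays below are hypothetical batches of boundary predictions, boundary data, and residual values, and the default weights follow Table 2.

```python
import numpy as np

def pinn_loss(u_hat, u_data, f_res, lam_u=1.0, lam_f=10000.0):
    """Weighted PINN loss of Eq. (7): lam_u * MSE_u + lam_f * MSE_f."""
    mse_u = np.mean((np.asarray(u_hat) - np.asarray(u_data)) ** 2)  # Eq. (8)
    mse_f = np.mean(np.asarray(f_res) ** 2)                         # Eq. (9)
    return lam_u * mse_u + lam_f * mse_f

# Toy example: two boundary points and two collocation residuals.
loss = pinn_loss(u_hat=[1.0, 2.0], u_data=[1.0, 3.0], f_res=[0.01, 0.03])
```

The large \( \lambda_f\) makes even small residual values contribute on the same scale as the data misfit.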

The structure of the PINN model consists of one input layer with \( 4+\delta\) neurons, where \( \delta\) corresponds to the spatial dimension of the problem, which is either 1 or 2 in this case. There are four hidden layers with 50 neurons each and one output layer with one neuron corresponding to the solution \( u\) . The input values are standardized to ensure the model’s stability and efficiency during training according to [21]. Then, output values are normalized. Furthermore, the parameters of the residual function are scaled using a fixed factor \( \beta=1000\) introduced in previous studies using the same model [19, 18, 4]. Other hyperparameters used in the PINN model are defined in Table 2.

Table 2. Hyperparameters for 1D and 2D PINN models.
Parameter | 1D | 2D
Number of hidden layers | \( 4 \) | \( 4 \)
Number of neurons per hidden layer | \( 50\) | \( 50\)
Number of neurons of the input layer | space dimension \( + 4\) | space dimension \( + 4\)
Number of neurons of the output layer | \( 1 \) | \( 1 \)
Activation function | tanh | tanh
Weight initialization | Xavier | Xavier
Optimizer | Adam, L-BFGS-B | Adam, L-BFGS-B
Epochs per training [Adam, L-BFGS-B] | \( [5000, 5000] \) | \( [5000, 5000] \)
Adam learning rate | \( 1e-6 \) | \( 1e-4 \)
Adam epsilon | \( 1e-5 \) | \( 1e-5 \)
L-BFGS-B maximum evaluations | \( 20000 \) | \( 20000 \)
L-BFGS-B max corrections | \( 50 \) | \( 50 \)
L-BFGS-B max line search steps | \( 50 \) | \( 50 \)
L-BFGS-B tolerance | \( 1e-6 \) | \( 1e-3\)
Number of collocation points \( N_f\) | \( 20000 \) | \( 40400\)
Number of training points \( N_u\) | \( 100\) | \( 20200\)
\( \lambda_u\) | \( 1\) | \( 1\)
\( \lambda_f\) | \( 10000 \) | \( 10000 \)
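The network architecture described above (input layer of \( 4+\delta\) neurons, four tanh hidden layers of 50 neurons, one linear output, Xavier initialization) can be sketched with plain NumPy; this is an illustrative forward pass only, not the training code.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier(n_in, n_out):
    """Xavier/Glorot uniform initialization for a dense layer."""
    bound = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-bound, bound, size=(n_in, n_out))

def make_pinn(dim):
    """Layer sizes from Table 2: (dim + 4) inputs, four hidden layers of 50,
    and a single output neuron for u."""
    sizes = [dim + 4] + [50] * 4 + [1]
    return [(xavier(a, b), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, z):
    """tanh MLP approximating u; the output layer is linear."""
    for W, b in params[:-1]:
        z = np.tanh(z @ W + b)
    W, b = params[-1]
    return z @ W + b

params = make_pinn(dim=1)                 # 1D problem: 5 input neurons
u = forward(params, np.zeros((3, 5)))     # batch of 3 input points
```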

2.3 Sensors’ Optimization Models

We use a mixed integer optimization model to find the optimal sensor placement inside power transformers to detect the temperature’s stable points. A stable point is a point where the temperature changes the least over time; it is defined as the location where the absolute value of the time-averaged first-order spatial derivative \( \nabla u\) is at its minimum. For the 1D case, we consider \( \frac{\partial u}{\partial x}\) , while for the 2D case, we take the sum of the two partial derivatives with respect to \( x\) and \( y\) , i.e., \( \frac{\partial u}{\partial x} + \frac{\partial u}{\partial y}\) . The first-order partial derivatives are obtained when calculating the residual loss function MSE\( _f\) . The goal is to place sensors at the stable points of the transformer. We set a minimum and a maximum number of sensors, \( n_{min}\) and \( n_{max}\) , respectively, that can be placed inside the transformer.

In our study, we analyze three optimization models, which we will refer to throughout the paper as Model 1, Model 2, and Model 3 for simplicity.

Model 1 is defined in the following way:

\[ \begin{equation} \begin{aligned} \min_{\mathbf{s}} ~ & \mathbb{E}_{t \in D} \left| \nabla \cdot u( \mathbf{x}, t) \right| \cdot \mathbf{s}, \\ \text{s.t.} ~ & \mathbf{s} = [s_1, s_2, \ldots, s_{N_x\cdot N_y}], \\ & s_i \in \{0,1\}, \\ & n_{\min} \leq \sum_i s_i \leq n_{\max}, \end{aligned} \end{equation} \]

(10)

where \( \nabla \cdot\) is the divergence operator and \( \mathbb{E}_{t \in D}\) denotes the mean over time, with \( D\) being a discrete set of time points. We also define a grid \( \bar{\mathbf{x}}\) over \( \bar{\Omega}_d\) as:

\[ \bar{\Omega}_d = \{ \mathbf{x} \in \Omega|\text{distance to the boundary of } \Omega \text{ is more than } d \} \]

with \( N_x\) columns and \( N_y\) rows. The binary variable \( s_i\) indicates whether there is a sensor at the corresponding position \( \mathbf{x}_i\) . More explicitly,

\[ s_i = \left\{ \begin{array}{lll} \begin{aligned} 1, && \text{if there is a sensor at \( \mathbf{x}_i\) ,} \\ 0, && \text{otherwise}. \end{aligned}\end{array}\right. \]
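Because the objective weights \( \mathbb{E}_{t\in D}|\nabla\cdot u|\) are non-negative and Model 1 only bounds the sensor count, its optimum simply selects the \( n_{min}\) grid points with the smallest time-averaged gradient magnitude. A minimal NumPy sketch, where `grad_u` is a hypothetical array of gradient samples with shape [time, grid point]:

```python
import numpy as np

def model1_placement(grad_u, n_min):
    """Optimal selection for Model 1 (Eq. 10) without coupling constraints:
    pick the n_min grid points with the smallest time-averaged |grad u|."""
    score = np.mean(np.abs(grad_u), axis=0)   # E_t |grad u| per grid point
    chosen = np.argsort(score)[:n_min]
    s = np.zeros(score.size, dtype=int)
    s[chosen] = 1
    return s

# Three grid points with time-averaged scores [3, 1, 2]: the two most
# stable points (indices 1 and 2) receive sensors.
s = model1_placement(np.array([[3.0, 1.0, 2.0], [3.0, 1.0, 2.0]]), n_min=2)
```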

Model 1 is the basic optimization model, and it might cause the sensors to be clustered, not giving a good overall temperature representation inside the transformer. Therefore, an additional parameter is introduced to represent the minimum distance between two sensors, enforcing more sparsity. We define it as Model 2, and it is expressed as follows:

\[ \begin{equation} \begin{aligned} \min_{\mathbf{s}} ~ & \mathbb{E}_{t \in D} ( \left |\nabla \cdot u( \mathbf{x}, t) \right| )\cdot \mathbf{s} \\ \text{s.t.} ~ & \mathbf{s} = [s_1, s_2, \ldots, s_{N_x\cdot N_y}], \\ & s_i \in \{0,1\}, \\ & s_i + s_j \leq 1, ~ \text{if } \|\mathbf{x}_i-\mathbf{x}_j\| < d ~ \text{and} ~ \forall i, j,j \neq i, \\ & n_{\min} \leq \sum_i s_i \leq n_{\max}, \end{aligned} \end{equation} \]

(11)

The setting of \( d\) depends on the user’s experience. Since sensors cannot be placed at a distance less than \( d\) , this constraint may cause the sensors to miss important information if \( d\) is set too large. To ensure that the sensors are placed in positions that can monitor temperature at key locations inside the transformer while collecting sufficient information, an additional distance parameter \( d_1\) is included in the optimization model to ensure that the sensors are spread out to a certain extent. The distance \( d\) therefore describes the minimum separation that must be maintained between sensors, which also depends on the sensor size. The distance \( d_1\) is a second threshold between two sensors: when two sensors lie closer than \( d_1\) , their measurements overlap, and part of the collected information is wasted. The corresponding waste for a pair of sensors is defined as the cost \( c_i^j\) , that is,

\[ \begin{equation} c_i^j = \left\{ \begin{array}{ll} \mathbb{E}_{t \in D} \left(\nabla \cdot u (\mathbf{x}, t) \right) \left( d_1 - \|\mathbf{x}_i - \mathbf{x}_j\| \right) & \text{if } j \neq i, \\ 0 & \text{if } i = j. \end{array}\right. \end{equation} \]

(12)

With the help of the big-M formulation, the final optimization model, which we define as Model 3, becomes:

\[ \begin{equation} \begin{aligned} \min_{\mathbf{s}} ~ & \mathbb{E}_t ( \left |\nabla_{\mathbf{x}} \cdot u( \mathbf{x}, t) \right| )\cdot \mathbf{s} + \sum_i \mathcal{L}_i \\ \text{s.t. } ~ & \mathbf{s} = [s_1, s_2, \cdots, s_{N_x \cdot N_y}], \\ & s_i +s_j \leq 1, ~ \text{if } \| \mathbf{x}_i - \mathbf{x}_j \|<d ~ \& ~ \forall i, j,j \neq i, \\ & n_{min} \leq \sum_i s_i \leq n_{max}, \\ & s_i \in \{0,1\}, ~ \text{for } i = 1, \cdots, N_x \cdot N_y, \\ & c_i = \sum_{j=1}^{N_x \cdot N_y} s_j c_i^j, ~ \text{for } i = 1, \cdots, N_x \cdot N_y, \\ & c_i - M(1 - s_i) \leq \mathcal{L}_i \leq c_i + M(1 - s_i), ~ \text{for } i = 1, \cdots, N_x\cdot N_y, \\ & \mathcal{L}_i \leq M s_i, ~ \text{for } i = 1, \cdots, N_x \cdot N_y, \\ \end{aligned} \end{equation} \]

(13)

where \( \mathcal{L}_i\) represents the cost of placing a sensor at position \( \mathbf{x}_i\) , and the penalty coefficient \( M\) is selected as \( 1000\) .
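The distance constraint \( s_i + s_j \leq 1\) shared by Models 2 and 3 can be illustrated with a greedy heuristic: repeatedly take the most stable remaining grid point whose distance to every chosen sensor is at least \( d\). This is an illustrative sketch only; a MILP solver would return the certified optimum of Eqs. (11) and (13).

```python
import numpy as np

def greedy_spread_placement(score, coords, n_min, d):
    """Greedy sketch of the distance-constrained selection in Models 2-3:
    score[i] is the time-averaged |grad u| at grid point i, coords[i] its
    position; sensors closer than d to an already-chosen one are skipped."""
    order = np.argsort(score)                 # most stable points first
    chosen = []
    for i in order:
        if len(chosen) == n_min:
            break
        if all(np.linalg.norm(coords[i] - coords[j]) >= d for j in chosen):
            chosen.append(i)
    s = np.zeros(len(score), dtype=int)
    s[chosen] = 1
    return s

# Points 0 and 1 are both stable but only 0.02 apart; with d = 0.05 the
# second sensor is pushed to the farther point 2.
s = greedy_spread_placement(score=np.array([0.1, 0.2, 0.9]),
                            coords=np.array([[0.0], [0.02], [0.5]]),
                            n_min=2, d=0.05)
```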

3 Results

This section reports the results obtained to model the temperature inside a power transformer in 1D and 2D and the corresponding optimal sensor placement model. The 1D model is run for 20000 epochs, 10000 with the Adam optimizer and 10000 with L-BFGS-B, taking approximately 45 minutes on GPUs from Google Colab [22]. The hyperparameters used for the model are listed in Table 2.

Figure 6 shows the solution for the 1D problem for the first 100 hours of the dataset. In particular, Figure 7 represents the reference solution calculated using Comsol, while Figure 8 shows the results obtained with PINNs.

Figure 7. Comsol solution.
Figure 8. PINN solution.
Figure 6. Solution of the first 100 hours for the 1D problem using Comsol and PINN.

We can notice that the PINN solution is already very close to the reference one. We can compare the results more clearly by looking at the plots in Figure 9, where we consider five specific times, i.e., \( t=15\) , \( t=30\) , \( t=50\) , \( t=65\) , and \( t=80\) . The blue lines represent the reference solution using Comsol, and the red-dotted lines represent the PINN solution. PINNs capture the solution almost perfectly, especially for the first time steps. For \( t=80\) , there is a slight shift from the reference solution, with a minimal difference.

Figure 9. Comparison of the solution obtained by Comsol (blue line) and PINN (red-dotted line) for the 1D problem at several time points.

Figure 10 shows the evolution of the loss function over the epochs. The evolution of the relative \( L_2\) errors is shown in Figure 11. Figure 12 represents the relative \( L_2\) error between the model’s temperature prediction and the reference after each epoch. Figure 13 shows the same but only for the top-oil temperature \( T_o\) . For both cases, the errors decrease smoothly, reaching a value of \( 1.306\cdot10^{-1}\) for the overall temperature distribution and \( 5.532\cdot10^{-3}\) for \( T_o\) .
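The relative \( L_2\) error reported here is the standard ratio of norms; a one-line sketch, where `u_pred` and `u_ref` are hypothetical arrays of PINN and Comsol temperatures on the same grid:

```python
import numpy as np

def rel_l2(u_pred, u_ref):
    """Relative L2 error ||u_pred - u_ref|| / ||u_ref||."""
    u_pred, u_ref = np.asarray(u_pred), np.asarray(u_ref)
    return np.linalg.norm(u_pred - u_ref) / np.linalg.norm(u_ref)

err = rel_l2([2.0, 0.0], [1.0, 0.0])
```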

Figure 10. Loss functions for the 1D PINN model.
Figure 12. Overall temperature distribution.
Figure 13. Top-oil temperature \( T_o\) .
Figure 11. Relative \( L_2\) errors for the \( 1D\) problem between the reference solution given by Comsol and the PINN solution.

We now analyze the results obtained by the optimization model to find the optimal positions for multiple sensors. We set the minimum and maximum number of sensors as \( n_{min}=5\) and \( n_{max}=10\) , respectively. The results are shown in Figure 14. Each plot shows the time-averaged temperature results (the blue lines) and the time-averaged first-order spatial derivatives of the temperature (the green-dotted lines) for a more explicit representation of the sensor placement.

Figure 15. Model 1.
Figure 16. Model 2 with \( d=0.05\) .
Figure 17. Model 3 with \( d=0.05, d_1=0.2\) .
Figure 14. The optimal sensor placement for the 1D problem, with \( n_{min}=5\) and \( n_{max}=10\) .

In particular, Figure 15 shows the sensor placement using Model 1, and Figures 16 and 17 represent the results for Model 2 and Model 3, respectively. For Model 2, the distance used is \( d=0.05\) ; similarly, Model 3 uses \( d=0.05\) with the additional distance parameter value \( d_1=0.2\) . The sensors are mainly located around the stable points, i.e., the positions where the temperature has the least change over time. In Model 1, the sensors are all placed close to each other, meaning that their information overlaps as they do not cover enough space range. This problem is overcome by the other two optimization models, where the sensors are more spread out within the region of interest with the help of the distance parameters and the inclusion of the penalty parameter for Model 3. The sensor placement of the two optimized models is still concentrated around the stable points; however, Model 3 has a better distribution of the sensors, keeping more distance between them, which makes it more practical to use when detecting the temperature distribution inside a power transformer.

Moving to the 2D problem, the complexity increases, making the training substantially more expensive and slower than in the 1D case. With the same hyperparameters, training the 2D model takes approximately 60 minutes. The hyperparameters utilized are reported in Table 2.

To show the results for the 2D problem, we pick three arbitrary time points: \( t=10\) , \( t=50\) , and \( t=80\) . Figure 18 shows the plots for these time points. In particular, Figures 19, 21 and 23 show the reference solution obtained with Comsol, while Figures 20, 22 and 24 display the PINN solutions. We can notice that the model predicts the temperature distribution quite accurately for all three time steps compared to the Comsol solution. In Figure 25, we can look at a closer comparison between the reference solution (blue lines) and the PINN solution (red-dotted lines). The comparisons are for the three time points, and we took four random locations for the \( y\) values, i.e., \( y=0.3\) , \( y=0.5\) , \( y=0.7\) , and \( y=0.9\) . In particular, Figures 26, 27, and 28 show the results for \( t=10\) , \( t=50\) , and \( t=80\) , respectively. We can notice slight discrepancies between the solutions, especially for the first three \( y\) locations at \( t=10\) and \( t=50\) . Overall, the solutions are already good, given the amount of training and the number of training and collocation points used. However, to obtain more accurate and reliable estimations with PINNs, it is necessary to train longer and use more training points.

Figure 19. Comsol, \( t=10\) .
Figure 20. PINN, \( t=10\) .
Figure 21. Comsol, \( t=50\) .
Figure 22. PINN, \( t=50\) .
Figure 23. Comsol, \( t=80\) .
Figure 24. PINN, \( t=80\) .
Figure 18. Solution for the 2D problem using Comsol (left) and PINNs (right) for \( t=10\) , \( t=50\) , and \( t=80\) .
Figure 26. \( t=10\) .
Figure 27. \( t=50\) .
Figure 28. \( t=80\) .
Figure 25. Comparison of the solution obtained by Comsol (blue line) and PINNs (red-dotted line) for the 2D problem at specific \( y\) locations and \( t=10\) , \( t=50\) , and \( t=80\) .

We can look at Figure 29, where the loss functions are plotted, to see that the training might not be enough at this stage. Similarly to the 1D case, we use 10000 epochs with the Adam optimizer and 10000 epochs with L-BFGS-B. From the plot, we can notice that the training with Adam has still not properly converged after 10000 epochs; therefore, longer training is required to converge to the optimal solution. Figure 30 shows the evolution of the relative \( L_2\) errors for the overall solution \( u\) and the top-oil temperature \( T_o\) . Figure 31 shows the error after each epoch for the temperature distribution over the whole domain, which reaches a value of \( 1.278\cdot10^{-1}\) at the end of the training. Figure 32 represents the error for the top-oil temperature, achieving a value of \( 7.577\cdot10^{-3}\) .

Figure 29. Loss functions for the 2D PINN model.
Figure 31. \( L_2\) error of the overall temperature solution.
Figure 32. \( L_2\) error of the \( T_o\) solution.
Figure 30. \( L_2\) error for the \( 2D\) problem between the reference solution given by Comsol and the PINN solution.

Given the larger domain, the complexity of finding the optimal placement of sensors in the 2D problem increases compared to the 1D one. Figure 33 shows the results obtained by setting the minimum and the maximum number of sensors as \( n_{min}=5\) and \( n_{max}=10\) . The left plots display the time-averaged temperature results and, in addition, the right plots show the sum of the time-averaged first-order spatial derivatives of the temperature, \( \Big|\frac{\partial u}{\partial x} + \frac{\partial u}{\partial y}\Big|(x,y)\) , to represent the results more precisely. We investigate similar cases for the optimized models as for the 1D problem. Figure 34 shows the sensor placement with Model 1, while Figures 35 and 36 display Model 2 and Model 3 results, respectively. As for the 1D problem, we take \( d=0.05\) for both optimized models, adding the distance parameter value \( d_1=0.2\) for Model 3. As we introduce penalties for distance and temperature gradient, the dispersion of the sensors can be better controlled while ensuring that the significant temperatures, i.e., the stable points, are considered.

Figure 34. Model 1.
Figure 35. Model 2 with \( d=0.05\) .
Figure 36. Model 3 with \( d=0.05, d_1=0.2\) .
Figure 33. The optimal sensor placement for the 2D problem, with \( n_{min} = 5\) and \( n_{max}=10\) .

4 Discussion and Conclusions

Physics-Informed Neural Networks show many benefits for predicting internal temperatures of power components such as power transformers. However, with higher spatial dimensions, the complexity of the problem increases and so does the training time. To address this issue, we consider installing monitoring systems that would serve as reference points for the model and speed up the training process. To find the optimal sensor placement, we use PINNs and mixed integer optimization.

This work explores different strategies for finding the optimal number and position of sensors for 1D and 2D spatial problems. The final model not only allows the user to control the spread of the sensors, but it also leverages the distance and transformer temperature to reduce the loss of temperature information that can result from this spread. The proposed model is a general solution to find the optimal sensor placement, regardless of the shape and size of the transformer; therefore, it can be adapted for a variety of applications and potentially used to solve similar problems for other types of electric components that require monitoring.

Acknowledgment

This work is supported by the Vinnova Program for Advanced and Innovative Digitalisation (Ref. Num. 2023-00241) and Vinnova Program for Circular and Biobased Economy (Ref. Num. 2021-03748) and partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

References

[1] M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707. 10.1016/j.jcp.2018.10.045.

[2] F. Bragone, Physics-informed machine learning in power transformer dynamic thermal modelling, Master's thesis, KTH, Mathematical Statistics (2021).

[3] F. Bragone, Physics-informed neural networks and machine learning algorithms for sustainability advancements in power systems components, qC 20231010 (2023).

[4] O. W. Odeback, F. Bragone, T. Laneryd, M. Luvisotto, K. Morozovska, Physics-informed neural networks for prediction of transformer’s temperature distribution, in: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 2022, pp. 1579–1586. 10.1109/ICMLA55696.2022.00215.

[5] G. S. Misyris, J. Stiasny, S. Chatzivasileiadis, Capturing power system dynamics by physics-informed neural networks and optimization, in: 2021 60th IEEE Conference on Decision and Control (CDC), 2021, pp. 4418–4423. 10.1109/CDC45484.2021.9682779.

[6] J. Stiasny, B. Zhang, S. Chatzivasileiadis, Pinnsim: A simulator for power system dynamics based on physics-informed neural networks, Electric Power Systems Research 235 (2024) 110796. 10.1016/j.epsr.2024.110796.

[7] R. Nellikkath, S. Chatzivasileiadis, Physics-informed neural networks for ac optimal power flow, Electric Power Systems Research 212 (2022) 108412. 10.1016/j.epsr.2022.108412.

[8] R. Nellikkath, I. Murzakhanov, S. Chatzivasileiadis, A. Venzke, M. K. Bakhshizadeh, Physics-informed neural networks for phase locked loop transient stability assessment, Electric Power Systems Research 236 (2024) 110790. 10.1016/j.epsr.2024.110790.

[9] I. Ramirez, J. I. Aizpurua, I. Lasa, L. del Rio, Probabilistic feature selection for improved asset lifetime estimation in renewables. application to transformers in photovoltaic power plants, Engineering Applications of Artificial Intelligence 131 (2024) 107841. 10.1016/j.engappai.2023.107841.

[10] I. Ramirez, J. Pino, D. Pardo, M. Sanz, L. del Rio, A. Ortiz, K. Morozovska, J. I. Aizpurua, Residual-based attention physics-informed neural networks for efficient spatio-temporal lifetime assessment of transformers operated in renewable power plants, arXiv preprint arXiv:2405.06443 (2024).

[11] F. Bragone, K. Oueslati, T. Laneryd, M. Luvisotto, K. Morozovska, Physics-informed neural networks for modeling cellulose degradation in power transformers, in: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 2022, pp. 1365–1372. 10.1109/ICMLA55696.2022.00216.

[12] S. Stock, J. Stiasny, D. Babazadeh, C. Becker, S. Chatzivasileiadis, Bayesian physics-informed neural networks for robust system identification of power systems, in: 2023 IEEE Belgrade PowerTech, 2023, pp. 1–6. 10.1109/PowerTech55446.2023.10202692.

[13] IEC, Power transformers – part 7: Loading guide for oil-immersed power transformers, IEC 60076-7:2018 (2018).

[14] IEEE Guide for Loading Mineral-Oil-Immersed Transformers and Step-Voltage Regulators, IEEE Std C57.91-2011 (Revision of IEEE Std C57.91-1995) (2012) 1–123. 10.1109/IEEESTD.2012.6166928.

[15] D. Susa, M. Lehtonen, H. Nordman, Dynamic thermal modelling of power transformers, IEEE transactions on Power Delivery 20 (1) (2005) 197–204.

[16] D. Susa, M. Lehtonen, Dynamic thermal modeling of power transformers: further development-part i, IEEE transactions on power delivery 21 (4) (2006) 1961–1970.

[17] D. Susa, M. Lehtonen, Dynamic thermal modeling of power transformers: further development-part ii, IEEE transactions on power delivery 21 (4) (2006) 1971–1980.

[18] T. Laneryd, F. Bragone, K. Morozovska, M. Luvisotto, Physics informed neural networks for power transformer dynamic thermal modelling, IFAC-PapersOnLine 55 (20) (2022) 49–54.

[19] F. Bragone, K. Morozovska, P. Hilber, T. Laneryd, M. Luvisotto, Physics-informed neural networks for modelling power transformer’s dynamic thermal behaviour, Electric power systems research 211 (2022) 108447.

[20] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: a survey, Journal of machine learning research 18 (153) (2018) 1–43.

[21] M. Shanker, M. Y. Hu, M. S. Hung, Effect of data standardization on neural network training, Omega 24 (4) (1996) 385–397.

[22] E. Bisong, E. Bisong, Google colaboratory, Building machine learning and deep learning models on google cloud platform: a comprehensive guide for beginners (2019) 59–64.
