



# Semester Thesis

# Design of an Automated Aging Analysis for the Motherboards in the CMS ECAL

Spring Term 2018

# Contents

| Section | Title |
|---------|-------|
| 1 | Introduction |
| 1.1 | Motherboards |
| 1.2 | Reliability Mathematics |
| 1.2.1 | Bathtub curve |
| 1.2.2 | Useful operating period |
| 1.2.3 | Time truncated test |
| 2 | Test Setups |
| 2.1 | General Structure |
| 2.2 | Interconnection Test |
| 2.2.1 | Test Design |
| 2.2.2 | User Interface |
| 2.3 | Leakage Current Test |
| 2.3.1 | Test Design |
| 2.3.2 | User Interface |
| 2.3.3 | Measurements |
| 2.4 | Capacitor Test |
| 2.4.1 | Test Design |
| 2.4.2 | User Interface |
| 2.4.3 | Measurements |
| 3 | Preliminary Test Data |
| 3.1 | Interconnection Test |
| 3.2 | Leakage Current Test |
| 3.3 | Capacitor Test |
| 4 | Conclusions |
| | Acknowledgements |
| | Bibliography |

# Abstract

Currently, the CMS experiment at CERN's LHC is preparing for the high-luminosity run that will start in 2026. While most of the electrical components inside the detector will be replaced, some are envisaged to remain. Intensive tests need to be performed in order to guarantee the reliability of these legacy components. In this study, automated setups to analyze the reliability of the motherboards that will remain in the CMS ECAL are presented. The setups test the conductance of the interconnections, check for the possible presence of short circuits and verify the functioning of the capacitors on the motherboards. Preliminary data from these tests are presented, and their significance for the reliability of the motherboards is discussed in detail.

### Chapter 1

# Introduction

The "Phase-2" upgrade program for CERN's Large Hadron Collider, which leads to the High-Luminosity LHC (HL-LHC), will increase the luminosity to $5 \times 10^{34}\,\mathrm{cm}^{-2}\mathrm{s}^{-1}$, placing high demands on the data processing speed and radiation hardness of the experiments involved. Over the planned 10 years of operation in this regime, the total integrated radiation dose is expected to increase by a factor of about ten with respect to the initial LHC design value. [1] To maintain the particle detection performance of the CMS experiment under these conditions, its entire tracking system must therefore be replaced. A crucial part of the CMS experiment is the electromagnetic calorimeter (ECAL), which measures the energies of photons and electrons. [2, p. 7] It played a key role in the discovery of the Higgs boson. Most of ECAL's components will be replaced for the upgrade; some components, however, including the so-called motherboards, will remain. [2, p. 7] These passive circuit boards carry the signals and deliver the bias voltage to the photodiodes. Although the circuits of the motherboards are very simple and involve only passive components, their reliability is crucial to the operation of the ECAL. Their aging properties must therefore be studied thoroughly, as stated in the Technical Design Report for the Phase-2 upgrade of the CMS barrel calorimeters: "[the motherboards] must be qualified for aging and radiation hardness to ensure that they will perform well until the end of the HL-LHC program." [2, p. 29] In this thesis, different automated test setups that can be used to analyze the aging properties of the motherboards are described. Furthermore, preliminary results acquired with these setups are discussed, together with the amount of additional test data needed to claim a given reliability for the different components.

The following sections of this chapter provide the background needed to understand the test setups and the preliminary test data described in Chapters 2 and 3. In Section 1.1, the properties of the motherboards and the desired reliabilities of the different components are discussed. Section 1.2 is devoted to an aging model that allows statistical claims about the reliability of electrical components.

#### 1.1 Motherboards

In this section, the properties of the motherboards as well as the desired reliabilities for the different components are discussed.

The main functions of the motherboards are to provide the bias voltage to the avalanche photodiodes (APDs) and to bring the electrical pulse generated by a photon passing through one of the two photodiodes to the analog-to-digital converter (ADC). In addition, the motherboards contain a low-pass filter for the bias voltage, which increases the signal-to-noise ratio. A picture of a motherboard is given in Figure 1.1. The five plug cards, connected to the bottom side of the motherboard with brown ribbons, are referred to as "kaptons". The kaptons are connected to the five plug bars on the upper side of the motherboard, each of which contains 5 channels, giving 25 channels per motherboard in total. Each channel in turn contains two resistors and one capacitor that form the low-pass filter.



Figure 1.1: Picture of a motherboard. The reliability of these passive circuit boards is crucial to the ECAL, since the motherboards serve simple but extremely important purposes.

The part of the detector's circuit that involves the motherboards is shown schematically in Figure 1.2. Note that only the circuit for one of the 25 channels is displayed.



Figure 1.2: Part of the electrical circuit that involves the motherboards. Only one of the total 25 channels per motherboard is displayed. The motherboards deliver the bias voltage to the avalanche photodiodes (APD), bring the signal generated in the APDs to the pre-amplifier, followed by an analog-to-digital converter (ADC) and increase the signal-to-noise ratio with a low-pass filter.

To ensure that the motherboards function properly, three tests have to be performed. First, the conductance of all interconnections (including the resistors) has to be tested. If an interconnection breaks, the corresponding channel, and with it two photodiodes, is lost. The reliability requirement for the interconnections is that at the end of HL-LHC, when the motherboards will be 31 years old, no more than 1% of all interconnections may be broken.

The second test concerns the so-called leakage current of the motherboard. The leakage current is the current that flows when a high DC voltage is applied while the electrical circuit of the motherboard is open. In an ideal open circuit the current would be zero, but in real objects a small non-vanishing current always flows, because the resistance of the open circuit is not infinitely large. By measuring the leakage current, one can determine whether the motherboard is at risk of developing a short circuit or whether a short circuit is already present on the board. It is of crucial importance that the motherboards do not have short circuits: since two motherboards inside the detector always share the same bias voltage, a single short circuit causes 50 channels, corresponding to 100 photodiodes, to be lost at once. Since this scenario is highly undesirable, no more than 3 out of the total 2448 motherboards inside the CMS detector should have failed entirely at the end of HL-LHC.

The last test concerns the capacitors of the motherboards. If a capacitor breaks, the low-pass filter of the channel no longer works, and the noise amplitude increases by a factor of roughly 2. The reliability requirement for the capacitors is similar to the one for the interconnections: no more than 1% may be broken at the end of HL-LHC.

#### 1.2 Reliability Mathematics

In this section, a model for the reliability of electrical systems and the statistical tools needed to claim reliabilities at given confidence levels are discussed. First, the so-called "bathtub" curve model is introduced and described quantitatively via the mean time to fail (MTTF). Then the so-called "time truncated test", which allows a certain MTTF to be claimed on the basis of test samples, is presented.

#### 1.2.1 Bathtub curve

The failure curve of a large population of statistically identical items is usually described by the so-called "bathtub" failure curve shown in Figure 1.3. [3, p. 27]



Figure 1.3: The "bathtub" failure curve (inspired by [3, p. 26]). The instantaneous failure rate of items having survived to time t (hazard rate) is displayed for the different periods in the life of electrical components.

The hazard rate $z(t)$ is the instantaneous failure rate of items having survived to time $t$. [3, p. 10] In the beginning, the hazard rate is high due to errors that occurred during the production of the items (infant mortality). As time progresses, the hazard rate decreases, since all items with production errors "have already died out". In the "useful operating period" that follows, the hazard rate remains constant. This means that the probability that a surviving item fails between $t$ and $t + dt$ no longer depends on its age $t$. The failures that occur in the useful operating period can be thought of as "accidents" that happen randomly and independently of the age of an item. Finally, in the wear-out period, the hazard rate increases again due to the wear or aging mechanisms involved. [3, p. 27] In order to predict the reliability of electrical components that have survived the infant mortality period, both the constant hazard rate during the useful operating period and the time at which wear-out effects set in have to be known.

#### 1.2.2 Useful operating period

The hazard rate, defined above as the instantaneous failure rate of items having survived to time $t$, can also be written as $f(t)/R(t)$, where $f(t)$ is the failure rate density ($f(t)\,dt$ is the probability that an item fails between $t$ and $t + dt$) and $R(t)$ is the probability of success (the probability that an item has not failed by time $t$). [3, p. 11] Using the probability of failure $F(t) = 1 - R(t)$ (the probability that an item has failed by time $t$), the hazard rate can be rewritten as

$$z(t) = \frac{f(t)}{R(t)} = \frac{\frac{dF}{dt}}{1 - F(t)} \equiv \lambda,$$

where  $\lambda$  is the constant value of the hazard rate during the useful operating period. The last equality in the equation above yields an ordinary differential equation of first order in F(t) that is solved by

$$F(t) = 1 - e^{-\lambda t}$$

for the initial condition F(0) = 0. For the failure rate density it follows that

$$f(t) = \frac{dF}{dt} = \lambda e^{-\lambda t}.$$

Instead of the failure rate  $\lambda$ , the mean time to fail MTTF is usually used to parameterize the exponential decay. It holds that

$$\text{MTTF} = \int_0^\infty t f(t) dt = \frac{1}{\lambda}.$$
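The relations above can be inverted to translate the reliability requirements of Section 1.1 into a minimum MTTF per component. The following sketch assumes that the components stay in the useful operating period (constant hazard rate) for their whole life of 31 years; the function names are hypothetical and not taken from the thesis scripts.

```python
import math


def failure_probability(t: float, mttf: float) -> float:
    """F(t) = 1 - exp(-t / MTTF) for a constant hazard rate lambda = 1 / MTTF."""
    return -math.expm1(-t / mttf)


def required_mttf(t: float, max_failed_fraction: float) -> float:
    """Smallest MTTF for which F(t) does not exceed max_failed_fraction."""
    if not 0.0 < max_failed_fraction < 1.0:
        raise ValueError("max_failed_fraction must lie strictly between 0 and 1")
    # Solve 1 - exp(-t / MTTF) = p for MTTF:  MTTF = t / (-ln(1 - p)).
    return t / -math.log1p(-max_failed_fraction)


# Requirements from Section 1.1, evaluated at an age of 31 years (assumption:
# no wear-out before that age).
interconnection_mttf = required_mttf(31.0, 0.01)   # at most 1 % broken (also capacitors)
motherboard_mttf = required_mttf(31.0, 3 / 2448)   # at most 3 of 2448 boards failed
```

With these inputs, an interconnection or a capacitor needs an MTTF of roughly $3.1 \times 10^3$ years, and a motherboard (with respect to total failure) roughly $2.5 \times 10^4$ years. These figures follow only from the stated requirements and the exponential model, not from any measurement.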

#### 1.2.3 Time truncated test

The lower and upper bounds for the MTTF can be determined experimentally in a time truncated test. The expression "time truncated" means that a certain number of sample items are tested for a fixed time interval and the number of failures that occur during this time is counted. [4] The total testing time, i.e. the number of tested sample items multiplied by the time each sample was tested for, is usually referred to as the total accumulated test time. Under the exponential model of Section 1.2.2, the number of failures that occur in a given accumulated test time at a certain MTTF follows a Poisson distribution, and the resulting confidence bounds on the MTTF can be expressed through the $\chi^2$ distribution. [5] Given a time truncated test with total accumulated test time $T$ and $r$ failures during this time, the lower and upper bounds for the MTTF are

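$$\text{MTTF}_{\text{low}} = \frac{2T}{\chi^2_{1-\text{CL},\,2(r+1)}},$$
$$\text{MTTF}_{\text{upp}} = \frac{2T}{\chi^2_{\text{CL},\,2r}}.$$

CL is the confidence level with which it can be claimed that the MTTF is not lower than $\text{MTTF}_{\text{low}}$, or, separately, not higher than $\text{MTTF}_{\text{upp}}$. The expression $\chi^2_{a, b}$ stands for the value of the $\chi^2$ variable beyond which the area under the $\chi^2$ PDF equals *a* (the inverse of one minus the $\chi^2$ CDF), where *b* is the number of degrees of freedom of the $\chi^2$ distribution (see Figure 1.4). For $r = 0$, no upper bound exists, since a $\chi^2$ distribution with zero degrees of freedom is not defined; a test without failures therefore only yields a lower bound.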

The lower and upper bounds for the MTTF, the accumulated test time needed to claim a certain MTTF, and the maximum number of failures expected at a given MTTF and accumulated test time can be calculated with the script reliability/chi\_squared/Chi Squared Estimation.ipynb.
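The bounds above can be evaluated without special libraries, because both degrees of freedom, $2(r+1)$ and $2r$, are even: for $b = 2k$, the area under the $\chi^2$ PDF beyond $x$ equals $e^{-x/2}\sum_{i=0}^{k-1} (x/2)^i/i!$, the Poisson probability of at most $k-1$ events with mean $x/2$. The following sketch is not the notebook's implementation; it interprets *a* in $\chi^2_{a,b}$ as this upper-tail area (the reading under which the formulas give a lower and an upper bound) and finds $\chi^2_{a,b}$ by bisection. All function names are hypothetical.

```python
import math


def chi2_upper_tail(x: float, dof: int) -> float:
    """P(X > x) for a chi-squared variable with an even number of degrees of freedom."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    mean = x / 2.0
    term = math.exp(-mean)          # Poisson probability of 0 events
    total = term
    for i in range(1, dof // 2):
        term *= mean / i            # Poisson probability of i events
        total += term
    return total


def chi2_value(a: float, dof: int) -> float:
    """chi^2_{a, dof}: the x whose upper-tail area equals a (bisection)."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, dof) > a:      # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):                     # tail area decreases monotonically in x
        mid = 0.5 * (lo + hi)
        if chi2_upper_tail(mid, dof) > a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mttf_bounds(total_time: float, failures: int, cl: float):
    """One-sided MTTF bounds of a time truncated test at confidence level cl."""
    low = 2.0 * total_time / chi2_value(1.0 - cl, 2 * (failures + 1))
    upp = (2.0 * total_time / chi2_value(cl, 2 * failures)
           if failures > 0 else math.inf)    # no finite upper bound without failures
    return low, upp


def required_test_time(mttf: float, failures: int, cl: float) -> float:
    """Accumulated test time for which MTTF_low reaches mttf with `failures` failures."""
    return mttf * chi2_value(1.0 - cl, 2 * (failures + 1)) / 2.0
```

For example, `required_test_time(3.1e3, 0, 0.9)` gives about $7.1 \times 10^3$ component-years: the accumulated test time a failure-free test would need to demonstrate an MTTF of $3.1 \times 10^3$ years at 90% confidence. Where SciPy is available, `scipy.stats.chi2.isf(a, b)` returns the same upper-tail value directly; the stdlib version is shown only to make the Poisson connection explicit.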



Figure 1.4:  $\chi^2$  CDF (unedited image from https://www.geogebra.org/m/p5GKdHnA). Both areas shaded in gray amount to *a*. The parameter *b* states the degrees of freedom of the  $\chi^2$  function.

### Chapter 2

# Test Setups

In this chapter, the test setups that automatically measure the properties introduced above are described: the conductance of the interconnections, the leakage current and the capacitances of each motherboard.

#### 2.1 General Structure

In this section I will briefly discuss the background and general structure of the three test setups.

In order to increase the total accumulated test time and to find the time at which the wear-out period of the motherboards sets in (or at least a lower bound for this time), a sample group of 16 motherboards was aged artificially in thermal cycles. One such cycle lasted 22 minutes and involved the following steps: during 6 minutes, the temperature was linearly increased from 0 to 30 degrees Celsius. The temperature was then held constant at 30 degrees Celsius for 10 minutes and afterwards linearly decreased back to 0 degrees Celsius during the remaining 6 minutes. 10 of these thermal cycles correspond to the aging the motherboards undergo during 1 year inside the detector. The cycles were designed to model the heating up and cooling down of the electrical components during the phases when the detector is started or shut down. In addition to the thermal cycles, 8 of the 16 motherboards had been artificially aged at a constant temperature beforehand.
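The cycle profile and its conversion into equivalent detector years, which is what enters the accumulated test time of Section 1.2.3, can be written down compactly. This is an illustrative sketch with hypothetical names; it assumes, as stated above, that 10 cycles correspond to one year, and that every component on every board accumulates test time at the same rate.

```python
CYCLE_MIN = 22.0          # duration of one thermal cycle in minutes
RAMP_MIN = 6.0            # linear ramp 0 -> 30 deg C (and back down)
HOLD_MIN = 10.0           # plateau at 30 deg C
T_LOW_C, T_HIGH_C = 0.0, 30.0
CYCLES_PER_YEAR = 10.0    # 10 cycles ~ 1 year inside the detector


def setpoint_celsius(t_min: float) -> float:
    """Temperature set-point t_min minutes after the start of the cycling."""
    t = t_min % CYCLE_MIN
    if t < RAMP_MIN:                                    # heating ramp
        return T_LOW_C + (T_HIGH_C - T_LOW_C) * t / RAMP_MIN
    if t < RAMP_MIN + HOLD_MIN:                         # plateau
        return T_HIGH_C
    # cooling ramp during the last 6 minutes of the cycle
    return T_HIGH_C - (T_HIGH_C - T_LOW_C) * (t - RAMP_MIN - HOLD_MIN) / RAMP_MIN


def accumulated_test_time_years(n_components: int, n_cycles: int) -> float:
    """Total accumulated test time in component-years after n_cycles cycles."""
    return n_components * n_cycles / CYCLES_PER_YEAR
```

For instance, the 16 cycled motherboards with 25 capacitors each give `accumulated_test_time_years(16 * 25, 100)`, i.e. 4000 capacitor-years, after 100 cycles. The same function applies to the interconnections once their number per board is known.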

The software needed to perform the measurements for all three tests is based on Jupyter notebooks written in Python with interactive input widgets. All scripts can be found in the directory tests/scripts. The user interfaces of the scripts for all three tests share the input menu "Metadata" shown in Figure 2.1.

*Screenshot: "Metadata" tab (next to the "Measurement" tab) with the input fields "Name of the tester", "Motherboard ID", "Motherboard Type", "Motherboard Aging Cycle" and "Comments", filled in with example values.*

Figure 2.1: Screenshot of the "Metadata" menu that allows the user to store information about the motherboard that was measured.

In the first input field, the user can enter his or her name. Next, the ID of the tested motherboard can be entered. This may be done using a barcode reader, as the ID is encoded in a barcode on every motherboard. The motherboard ID is needed to re-identify the tested motherboard later. In the following two fields, the motherboard type (1 to 4) and the number of aging cycles the motherboard has gone through can be entered. The last input field provides space for comments about the measurements. All information entered in the "Metadata" menu is later stored in an Excel file together with the measurement values. Using Excel files to store the data instead of the more common .csv or .txt formats has the advantage that the data can also be easily inspected without the analysis software. In addition, the possibility to create multiple sheets in one Excel file makes it convenient to collect different kinds of information in one place.
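The fields of the "Metadata" menu map naturally onto a small record type. The sketch below is a hypothetical illustration (field and method names are not taken from the scripts) of how the entries could be validated before being written next to the measurements; the type check reflects the four motherboard types mentioned above.

```python
from dataclasses import asdict, dataclass


@dataclass
class Metadata:
    tester: str
    motherboard_id: str      # read from the barcode; kept as text to preserve leading zeros
    motherboard_type: int    # 1 to 4
    aging_cycles: int        # thermal cycles completed so far
    comments: str = ""

    def __post_init__(self) -> None:
        if not self.motherboard_id.strip():
            raise ValueError("a motherboard ID is needed to re-identify the board")
        if self.motherboard_type not in (1, 2, 3, 4):
            raise ValueError("motherboard type must be 1, 2, 3 or 4")
        if self.aging_cycles < 0:
            raise ValueError("number of aging cycles cannot be negative")

    def as_row(self) -> dict:
        """Flat dictionary, e.g. for one row of a 'Metadata' sheet."""
        return asdict(self)
```

With pandas and an Excel engine such as openpyxl installed, such rows could be written to their own sheet via `pandas.DataFrame([m.as_row()]).to_excel(writer, sheet_name="Metadata")` inside a `pandas.ExcelWriter` context; the actual scripts may organize the file differently.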

#### 2.2 Interconnection Test

This section is dedicated to the discussion of the setup used to test the interconnections of the motherboards. In particular, the test design and the user interface of the measurement software are presented and discussed.

#### 2.2.1 Test Design

The interconnection test setup was built by the particle physics group of INFN Torino and reused in this study. The tested motherboard is placed on the designated holder. Then the connector cards are plugged in and the motherboard's kaptons are placed in the connector bars below the motherboard holder. For motherboards of type 1, connector card A (the one with the shortest connection ribbon) is replaced with a different card (labeled "Type 1"). As a last step, the cable coming from the power supply is plugged into the motherboard. The complete test setup and the user interface, discussed in the next subsection, are displayed in Figure 2.2.



Figure 2.2: Setup for the interconnection test. On the left, the holder for the motherboard as well as the different ribbons and cables that need to be plugged into the motherboard are displayed. The right picture shows the user interface based on the program LabView.

#### 2.2.2 User Interface

The user interface of the test setup built by the particle physics group of INFN Torino is displayed on the right in Figure 2.2. It is based on the software LabView. In order to run the interconnection measurements, the designated LabView program (there is a shortcut on the computer's desktop) first needs to be opened. The measurements can then be started by clicking the "Run" button. The software automatically displays the outcome of the measurements using the red and green color code displayed in Figure 2.2. Since the software is old and some of the packages necessary for all functions to run properly are no longer available, the measurement data can only be displayed, but not saved. In addition, the red and green color code might be confusing for an untrained user, since the color red is sometimes used to indicate a successful measurement. Both issues were resolved by creating the script tests/scripts/interconnection\_test/Interconnection Test Measurement.ipynb, displayed in Figure 2.3.




Figure 2.3: Interface of the script that can be used to save the outcome of the interconnection test. The colors of the input arrays must simply be matched to the ones in the LabView program (Figure 2.2). In this way, an understanding of the color code used in Figure 2.2 is not necessary to perform the interconnection test.

Besides the "Metadata" menu already discussed above (see Section 2.1), this interface consists of five menus in which the outcome of the interconnection test can be entered with the same red and green code used in the LabView program (Figure 2.2). The color of each channel can be changed by simply clicking on it. Once the colors in Interconnection Test Measurement.ipynb match those in the LabView program, the button "Save Measurement" can be clicked, and all measurements are stored in an Excel file together with the information from the "Metadata" menu.

#### 2.3 Leakage Current Test

In this section, the test setup used to measure the leakage currents is discussed. Again, the general test design, the user interface and the way the measurements are taken are addressed.

#### 2.3.1 Test Design

At the core of the leakage current test setup lies Keithley's SourceMeter 2410. It can simultaneously apply a DC output voltage and measure the resulting current. The power supply port of the tested motherboard has to be connected to the "Input/Output" port of the sourcemeter. The sourcemeter is connected to the computer that controls the measurements via National Instruments' "GPIB-USB-HS" cable. For the computer to be able to communicate with the sourcemeter, National Instruments' driver software "NI-488.2" has to be installed. Unfortunately, only Windows and some versions of Linux are supported. The test setup is shown in Figure 2.4.



Figure 2.4: Setup for the leakage current test. The measurements are performed by Keithley's 2410 SourceMeter, which simultaneously provides a DC output voltage and measures the resulting leakage current in the motherboard. The measurements are coordinated by the script Leakage Current Test Measurement.ipynb.

#### 2.3.2 User Interface

Once the motherboard is connected to the sourcemeter and the sourcemeter to the computer, the script Leakage Current Test Measurement.ipynb (in the directory tests/scripts/leakage\_current\_test) can be used to perform the measurements automatically.


Figure 2.5: Interface of the script that automatically performs the leakage current test. The total number of measurements and the time between two consecutive measurements can be chosen. The advance of the measurements is indicated with the progress bar.

The "Measurement" menu in the user interface of the script Leakage Current Test Measurements.ipynb is shown in Figure 2.5. The first dropdown widget lets the user choose the instrument to be used for the measurements. The name of the sourcemeter starts with "GPIB", since it is connected to the computer via the "GPIB-USB-HS" cable. In the second dropdown widget, the subfolder of the tests/data/leakage\_current\_test\_data directory in which the measurements should be stored can be chosen. It is also possible to manually add a new folder to this directory. The next input field sets the total number of leakage current measurements to be performed. The slider that follows sets the time between the start of the script and the first leakage current measurement. This time cannot be shorter than 30 seconds, because the capacitors on the motherboard need a while to charge fully, and the charging current that flows during this time is not the leakage current this test is about. The next input field sets the time interval between two consecutive measurements after the first one has been completed. The two check boxes let the user choose whether each measurement, or the end of all measurements, should be indicated with a beep. This can be convenient for keeping track of the status of the measurements, but might become annoying if measurements are taken frequently. The script is started by clicking the "Run Measurement" button.

#### 2.3.3 Measurements

The script Leakage Current Test Measurements.ipynb automatically puts the sourcemeter into the settings necessary to perform the measurements (this is done by the function setup). The sourcemeter should output a DC voltage of 450 V and measure the resulting current in the motherboard with an upper limit of $100\,\mu\mathrm{A}$, using a filter that takes the moving average over 30 measurements (this reduces the fluctuations in the measurements). In addition, the measurement speed of the sourcemeter has to be set to "HI-accuracy"; otherwise the test does not yield sensible data. The current measurement range has to be set to $10\,\mu\mathrm{A}$, since the output voltage of 450 V was found to be incompatible with current ranges smaller than $10\,\mu\mathrm{A}$. Next, the script performs the actual measurements (this is done by the function run). After each measurement, the script pauses for the time chosen in the user interface using Python's time.sleep function, and then the next measurement is taken. Finally, all measurements are stored in an Excel file together with the metadata.
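The setup and run steps described above can be sketched as follows. This is a minimal illustration, not the code of the measurement notebook: the function names are hypothetical, and the SCPI commands are assumed from the Keithley 2400-series command set, so they should be checked against the 2410 manual (in particular the assumption that "HI-accuracy" corresponds to an integration time of 10 power-line cycles). Communication is passed in as `write`/`read_current` callables, so the logic can be exercised without hardware.

```python
import time
from typing import Callable, List


def setup_commands(voltage_v: float = 450.0,
                   compliance_a: float = 100e-6,
                   range_a: float = 10e-6,
                   n_average: int = 30) -> List[str]:
    """SCPI commands (assumed 2400-series syntax) for a leakage current run."""
    return [
        "*RST",                                   # start from a known state
        ":SOUR:FUNC VOLT",                        # source a DC voltage
        ":SOUR:VOLT:MODE FIXED",
        f":SOUR:VOLT:LEV {voltage_v:g}",          # 450 V bias
        ':SENS:FUNC "CURR"',                      # measure current
        f":SENS:CURR:PROT {compliance_a:E}",      # 100 uA upper limit (compliance)
        f":SENS:CURR:RANG {range_a:E}",           # fixed 10 uA measurement range
        ":SENS:CURR:NPLC 10",                     # slow, high-accuracy integration
        ":SENS:AVER:TCON MOV",                    # moving-average filter ...
        f":SENS:AVER:COUN {n_average:d}",         # ... over 30 readings
        ":SENS:AVER ON",
        ":FORM:ELEM CURR",                        # return only the current value
    ]


def run_leakage_test(write: Callable[[str], None],
                     read_current: Callable[[], float],
                     n_measurements: int,
                     first_delay_s: float = 30.0,
                     interval_s: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Configure the instrument, wait for the capacitors to charge, then sample."""
    if first_delay_s < 30.0:
        raise ValueError("wait at least 30 s so that the charging current has decayed")
    for command in setup_commands():
        write(command)
    write(":OUTP ON")
    readings: List[float] = []
    try:
        sleep(first_delay_s)
        for i in range(n_measurements):
            if i > 0:
                sleep(interval_s)                 # pause between consecutive readings
            readings.append(read_current())
    finally:
        write(":OUTP OFF")                        # do not leave 450 V applied after an error
    return readings
```

With PyVISA, a real instrument could be driven with something like `sm = pyvisa.ResourceManager().open_resource("GPIB0::24::INSTR")` and `run_leakage_test(sm.write, lambda: float(sm.query(":READ?")), n_measurements=10)`; the GPIB address 24 is only an assumed factory default. Switching the output off in `finally` is a design choice of this sketch, not something the text prescribes.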

#### 2.4 Capacitor Test

#### 2.4.1 Test Design

The idea of the capacitor test setup is to use the low-pass filtering properties of the capacitors to test whether they still work properly. If an AC source is used instead of the DC bias voltage and the APDs are replaced by a test resistor $R_x$, the ratio of the output voltage $V_{\text{out}}$ across $R_x$ to the input AC voltage $V_{\text{in}}$ contains information about the condition of the channel's low-pass filter (and thus of the channel's capacitor), provided that the frequency of the AC voltage is high enough to be measurably suppressed by the low-pass filter. A graphical representation of the test circuit is given in Figure 2.6.

When designing the setup for the capacitor test, a suitable resistor $R_x$ has to be chosen. An impedance calculation (the script that performs these calculations can be found under tests/scripts/capacitor\_test/impedance\_calculation/Impedance Calculation.ipynb) shows that the higher the resistance $R_x$, the higher the resulting output voltage $V_{\text{out}}$. This is not surprising, since without the capacitor ($C = 0$) the output voltage would simply be given by the voltage divider $V_{\text{out}} = V_{\text{in}}\,R_x/(2R + R_x)$, where $R$ is the resistance of each of the two filter resistors, which also increases with increasing $R_x$. To achieve a reasonable difference in $V_{\text{out}}$ (e.g. on the order of 1 V) between a working and a non-working low-pass filter, a resistance $R_x = 82\,\mathrm{k}\Omega$ is sufficient. Next, the frequency of the input voltage $V_{\text{in}}$ needs to be chosen such that the output voltage is measurably but not completely suppressed by the low-pass filter. The so-called Bode plot (see Figure 2.7) contains all the information needed to make this decision.



Figure 2.6: Circuit of the capacitor test. The APDs are replaced by a test resistor $R_x$. The state of the capacitor can be derived from the state of the low-pass filter, which can be tested by analyzing its suppression of high frequencies.
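The voltage-divider argument above can be checked with a short impedance calculation. The sketch below is not the thesis's Impedance Calculation.ipynb: it assumes a ladder topology (series resistor $R$, capacitor $C$ from the middle node to ground, second series resistor $R$, then $R_x$ to ground), chosen because it reproduces the $C = 0$ limit $R_x/(2R + R_x)$ quoted above. The value of $R$ is not given in this section, so any number passed for it is a placeholder.

```python
import math


def transfer_ratio(f_hz: float, r_ohm: float, c_farad: float, rx_ohm: float) -> complex:
    """Complex V_out / V_in for the assumed R - (C to ground) - R - Rx ladder."""
    z_load = r_ohm + rx_ohm                      # second resistor in series with Rx
    if c_farad == 0:
        z_node = complex(z_load)                 # broken capacitor: no shunt path
    else:
        z_c = 1 / (1j * 2 * math.pi * f_hz * c_farad)
        z_node = z_c * z_load / (z_c + z_load)   # capacitor in parallel with the load
    v_node = z_node / (r_ohm + z_node)           # first voltage divider
    return v_node * rx_ohm / z_load              # second divider across Rx


def output_rms(v0_amplitude: float, f_hz: float, r_ohm: float,
               c_farad: float, rx_ohm: float) -> float:
    """RMS of V_out for a sinusoidal input with peak amplitude v0_amplitude."""
    h = transfer_ratio(f_hz, r_ohm, c_farad, rx_ohm)
    return abs(h) * v0_amplitude / math.sqrt(2)
```

Sweeping `f_hz` over a logarithmic grid with such a function yields a Bode plot like Figure 2.7, and sweeping `rx_ohm` shows the monotonic increase of $V_{\text{out}}$ with $R_x$. With placeholder values for $R$, the absolute voltages are not expected to reproduce the numbers quoted in the text.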



Figure 2.7: Bode plot for the low-pass filter on the motherboards. Note that a logarithmic scale was chosen for both axes. While low frequencies are transmitted with almost no losses, higher frequencies are suppressed or completely extinguished.

The -3 dB point (the cut-off frequency, at which the transmitted power is halved, i.e. the amplitude is reduced by a factor of $\sqrt{2}$) was chosen, which corresponds to a frequency of roughly 600 Hz. Using $R_x = 82\,\mathrm{k}\Omega$ and $V_{\text{in}} = V_0\,e^{i2\pi f t}$ with $f = 600$ Hz and $V_0 = 10$ V, the resulting output voltage $V_{\text{out}}$ has a root mean square (RMS) of $1.3 \pm 0.2$ V when the low-pass filter works properly ($C = 10 \pm 2\,\mathrm{nF}$) and an RMS of 2.7 V when the capacitor is broken ($C = 0$). This difference can easily be detected with a voltmeter. In addition, this test setup allows these two cases to be distinguished from an error in the setup, e.g. a short circuit, in which case the output voltage is simply 0 V.
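The three cases above (working filter, broken capacitor, setup error) suggest a simple per-channel classification of the measured RMS voltages. The sketch below is hypothetical: the thresholds of 0.5 V and 2.0 V are illustrative values placed between the expected levels of roughly 0 V, 1.3 V and 2.7 V, not values taken from the measurement script.

```python
from typing import Dict

# Hypothetical thresholds (V RMS) between the expected levels 0 V, ~1.3 V and ~2.7 V.
SETUP_ERROR_BELOW_V = 0.5
CAPACITOR_BROKEN_ABOVE_V = 2.0


def classify_channel(v_rms: float) -> str:
    """Map one measured RMS output voltage to a channel status."""
    if v_rms < SETUP_ERROR_BELOW_V:
        return "setup error"        # e.g. short circuit: output close to 0 V
    if v_rms > CAPACITOR_BROKEN_ABOVE_V:
        return "capacitor broken"   # filter no longer attenuates the 600 Hz signal
    return "ok"                     # filter attenuates the signal as expected


def summarize(readings: Dict[str, float]) -> Dict[str, int]:
    """Count how many channels fall into each status."""
    counts = {"ok": 0, "capacitor broken": 0, "setup error": 0}
    for v_rms in readings.values():
        counts[classify_channel(v_rms)] += 1
    return counts
```

Keeping the thresholds as named constants makes it easy to tighten them once the spread of the measured values is known.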

The setup for the capacitor test is shown in Figure 2.8. The input AC voltage is delivered by the waveform generator. The kaptons of the tested motherboard are plugged into the five connector cards displayed in Figure 2.8. For the channels to be ordered correctly, it is crucial that the back sides of the kaptons (with the solder joints) are connected to the back sides of the connector cards (also with solder joints). The connector cards in turn are connected to the 40-channel multiplexer module 7702, which is inserted in the Keithley DAQ6510 multimeter. The script tests/scripts/capacitor\_test/Capacitor Test Measurement.ipynb should be running on the computer that is connected to the Keithley device via a USB cable.



Figure 2.8: Setup for the capacitor test. The AC input voltage is provided by the waveform generator. The output voltage over each channel is then measured by the Keithley DAQ6510 multimeter via the 40-channel multiplexer module.

#### 2.4.2 User Interface

The interface of the script tests/scripts/capacitor\_test/Capacitor Test Measurement.ipynb, displayed in Figure 2.9, allows the user to enter the metadata and to choose the parameters for the measurements. The name of the Keithley device with which the measurements should be performed can be chosen in the first input field. The name starts with "USB", since the Keithley device is connected to the computer via USB. The next input field allows the user to choose the directory in tests/data/capacitor\_test in which the measurement data should be stored. The script is started by clicking the "Run Measurement" button.

#### 2.4.3 Measurements

The script Capacitor Test Measurement.ipynb performs all measurements automatically and stores them in an Excel file in the chosen subfolder of the tests/data/capacitor\_test\_data directory. In order for the Keithley DAQ6510 device to be able to measure the voltage over the different channels, it has to be in the "Rear" measurement mode. Since this setting cannot be changed via remote control, the script simply prints an error message if the device is not in the "Rear" mode. All other settings are changed automatically by the function setup of the script. After this function has terminated successfully, the function run is called, which instructs the device to perform an AC voltage scan over the first 25 channels. The device then measures the voltages over these channels one by one and returns the results in an array. The wiring between the channels of the 40-channel multiplexer module 7702 and the connector cards is such that, for motherboard types 2 and 3, the channels 1, 2, 3, 4, 5, 6, 7, ... of the multiplexer module correspond to the channels 1A, 1B, 1C, 1D, 1E, 2E, 2D, ... on the motherboard (see Figure 3.7 for more information about the channel naming). For motherboard types 1 and 4, the channels 1, 2, 3, 4, 5, 6, 7, ... correspond to 5E, 5D, 5C, 5B, 5A, 4A, 4B, ... . The script automatically corrects this permutation if the motherboard type was entered correctly in the "Metadata" menu and the kaptons were plugged into the connector cards correctly (side with the solder joints to the side with the solder joints). The function save stores all measurement values together with the information entered in the "Metadata" menu in an Excel file.

*Screenshot: "Measurement" tab with "Choose instrument: USB0::0x05E6::0x6510::04383398::INSTR", "Choose folder: Sandbox" and the "Run Measurement" button.*

Figure 2.9: Interface of the script that automatically performs the capacitor test. The name of the Keithley device and the subfolder in the directory tests/data/capacitor\_test where the measurements should be stored can be chosen.

# Chapter 3 Preliminary Test Data

This chapter discusses the data available so far (after 100 of the 400 thermal aging cycles), acquired with the test setups described in the previous chapter. The discussion uses the analysis software created to conveniently access the measurement data stored in Excel files. In addition, the statistical claims about the reliability of the motherboards that follow from this preliminary data are presented and discussed.

#### 3.1 Interconnection Test

This section is dedicated to the preliminary results from the interconnection test and the conclusions that can be drawn from them.

Figure 3.1 shows a screenshot of the analysis software for the interconnection test, Interconnection Test Analysis.ipynb (located in tests/scripts/interconnection\_test). The chosen folder contains the measurement data for the motherboards after 100 thermal aging cycles. The option "Only print fails" was chosen, so only the interconnection measurements that yielded a fail are output.

| Analysis          |            |
|-------------------|------------|
| Choose subfolder: | 100_cycles |
| Only print fails  | (selected) |
| Start Analysis    | (button)   |

Output: "It seems that there have been no fails in the test."

Figure 3.1: Analysis software for the interconnection test. The data in the chosen subfolder in the tests/data/interconnection\_test\_data directory is printed. It can be chosen whether only the fails or all test results should be displayed.

After 100 thermal aging cycles, none of the channels failed the interconnection test (as shown in Figure 3.1). Using the formula for the lower bound of the MTTF introduced in Section 1.2, an MTTF of at least 1335 years can be claimed at 95% confidence level (this calculation was performed with the script reliability/chi\_squared/Chi Squared Estimation.ipynb). Although this might sound like a very strong result, it is not. Given this MTTF, only a maximum channel failure rate of 2.5% after 31 years can be claimed at 95% confidence level, which lies above the desired 1% mark. Bringing the upper bound down to 1% requires an additional accumulated test time of roughly 6000 years (at 0 fails). This corresponds to an additional 150 aging cycles for a test sample of 16 motherboards. This number lies within the total of 400 aging cycles planned for the motherboards. The desired 1% maximum failure rate for the interconnections can therefore be claimed as soon as all data is available, provided that no further fails occur.
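The MTTF bounds quoted here can be reproduced with the standard chi-squared bounds for a time-truncated test with an exponential failure model. The lower bound is $2T/\chi^2_{\mathrm{CL};\,2r+2}$ and the upper bound is $2T/\chi^2_{1-\mathrm{CL};\,2r}$, where $T$ is the accumulated test time and $r$ the number of fails. The same formulas reproduce the capacitor bounds in Section 3.3. The sketch below assumes $T = 400 \times 10 = 4000$ channel-years. The 400 channels are inferred from the 0/400 and 10/400 entries in the Chapter 4 summary, and the 10 years from the stated equivalence of 100 aging cycles to 10 years. This is not the code of Chi Squared Estimation.ipynb. To stay dependency-free, it computes the chi-squared quantile by bisection on the closed-form CDF, which is valid for an even number of degrees of freedom. `scipy.stats.chi2.ppf` would serve the same purpose.

```python
import math


def chi2_cdf_even(x, dof):
    """Chi-squared CDF for an even number of degrees of freedom (closed form)."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(dof // 2):
        if i > 0:
            term *= half / i  # term = half**i / i!
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_ppf_even(p, dof, tol=1e-10):
    """Quantile (inverse CDF) of the chi-squared distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mttf_bounds(test_time, failures, confidence=0.95):
    """One-sided chi-squared bounds on the MTTF for a time-truncated test."""
    lower = 2.0 * test_time / chi2_ppf_even(confidence, 2 * failures + 2)
    if failures == 0:
        upper = math.inf  # no finite upper bound without any observed fail
    else:
        upper = 2.0 * test_time / chi2_ppf_even(1.0 - confidence, 2 * failures)
    return lower, upper


def failed_fraction(t, mttf):
    """Expected fraction of failed channels after time t for an exponential model."""
    return 1.0 - math.exp(-t / mttf)


# Assumed accumulated test time: 400 channels x 10 years (100 aging cycles).
TEST_TIME_CHANNEL_YEARS = 400 * 10
```

With these assumptions, `mttf_bounds(4000, 0)` gives a lower bound of about 1335 years. `mttf_bounds(4000, 10)` gives about 235 and 737 years, and `failed_fraction(31, 235)` gives about 12.4%, matching the capacitor test figures in Section 3.3.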

#### 3.2 Leakage Current Test

In this section, the results obtained in the leakage current test are discussed as well as the statistical statements that follow from these results.

The user interface of the leakage current analysis software tests/scripts/leakage\_current\_test/Leakage Current Test Analysis.py is shown in Figure 3.2. The folder "100\_cycles" was chosen to show the data after 100 thermal aging cycles. If the option "Plot information about the means" is selected, the software displays a diagram of the means for the different groups of motherboards found in the selected subfolder, together with a so-called ANOVA analysis. The last two input fields concern the histogram plots of the leakage current measurements. The first selects the measurement time for the histograms, and the second selects the range of the histogram bins. The histograms shown below (Figure 3.5) were generated for a histogram time of 10 min and a range from 0 to 40 nA.

| Analysis                         |            |
|----------------------------------|------------|
| Choose subfolder:                | 100_cycles |
| Plot information about the means | (selected) |
| Histogram time [min]             | 10         |
| Histogram range [nA]             | 0 - 40     |
| Start Analysis                   | (button)   |

Figure 3.2: Analysis software for the leakage current test. It can be chosen whether a plot of the leakage current means should be generated. In addition, the parameters for the histogram plots can be selected.
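One possible reading of the two histogram parameters is as follows. For each motherboard, the software takes the leakage current measured closest to the chosen histogram time and bins these values over the chosen range. The sketch below illustrates this reading. The function names and the bin count are hypothetical, and the sketch is an assumption about how the analysis script works, not its actual code. Like `numpy.histogram`, it treats the last bin as closed, so a value equal to the upper edge is counted.

```python
def value_at_time(times_min, currents_nA, t_min):
    """Return the current of the sample whose timestamp is closest to t_min."""
    if not times_min:
        raise ValueError("empty time series")
    index = min(range(len(times_min)), key=lambda i: abs(times_min[i] - t_min))
    return currents_nA[index]


def histogram(values, lower, upper, n_bins):
    """Count values in n_bins equal-width bins over [lower, upper]."""
    if n_bins <= 0 or upper <= lower:
        raise ValueError("need n_bins > 0 and upper > lower")
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for value in values:
        if lower <= value < upper:
            # min() guards against floating-point rounding just below the upper edge
            counts[min(int((value - lower) / width), n_bins - 1)] += 1
        elif value == upper:
            counts[-1] += 1  # last bin is closed on the right
    return counts


def leakage_histogram(series_by_board, t_min, lower_nA, upper_nA, n_bins=8):
    """series_by_board maps a board ID to (times in min, currents in nA)."""
    values = [value_at_time(times, currents, t_min)
              for times, currents in series_by_board.values()]
    return histogram(values, lower_nA, upper_nA, n_bins)
```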

The first plot generated by the script Leakage Current Test Analysis.ipynb is shown in Figure 3.3. It gives an overview of all measurements taken: the x-axis shows the time of the measurements and the y-axis the measured leakage currents.

The name "100a" denotes the group of motherboards that were aged at constant temperature before the 100 thermal aging cycles, while the group "100" has only gone through the 100 cycles. The decrease of the leakage current over time can be explained by charges on the surface of the motherboard being pushed out over time. This causes a small current that decreases over time as fewer and fewer mobile charge carriers remain. All measurements show a "kink" after roughly 15 minutes. Such a point of discontinuity suggests that several processes influence the leakage current, one of which ends after about 15 minutes. In a leakage current measurement taken over a longer time span (12 hours), the leakage current of the motherboards was still found to decrease after the "kink", although at a much slower rate.



**Figure 3.3:** Plot giving an overview of all leakage current measurements. While both groups "100" and "100a" were aged for 100 thermal cycles, group "100a" was additionally aged at a constant temperature beforehand.

The exact mechanisms governing the shape of the leakage current curves are unclear and may be worth studying in further investigations.

The plot with the means and the ANOVA analysis is displayed in Figure 3.4. Again, the two groups "100a" and "100" are shown. The upper plot shows the means and standard deviations of these groups. The ANOVA plot below indicates the statistical significance of the difference between the means. The null hypothesis of the ANOVA analysis is that all measurements are statistically identical, i.e., that there is no difference between the groups "100a" and "100". Under this assumption, the analysis calculates the probability of obtaining, purely by chance, a difference between the group means at least as large as the one in the upper plot. This probability stays around 40% throughout the ANOVA plot, so the difference between the means in the upper plot is not statistically significant at all.
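The one-way ANOVA compares the spread of the group means with the spread within the groups. It does so via the statistic $F = \frac{SS_\text{between}/(k-1)}{SS_\text{within}/(N-k)}$, where $k$ is the number of groups and $N$ the total number of measurements. The probability in Figure 3.4 corresponds to the p-value, i.e. the probability that an $F$-distributed variable with $(k-1, N-k)$ degrees of freedom exceeds the observed $F$. The dependency-free sketch below computes only the statistic and its degrees of freedom. The p-value can then be obtained with, for instance, `scipy.stats.f.sf(f, df_between, df_within)`, or both values at once with `scipy.stats.f_oneway`. The function name is hypothetical. Evaluating the test separately at each measurement time is an assumption about how the curve in Figure 3.4 was produced.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual measurements around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

A large $F$ (and hence a small p-value) signals that the group means differ by more than the scatter within the groups would explain. A p-value around 40% means no such signal.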



Figure 3.4: The upper diagram shows the means and the standard deviations for the two groups "100" and "100a". In the lower diagram, the ANOVA probability is displayed, which gives a measure of the significance of the differences between the means in the upper diagram.



The last plot generated by the script, shown in Figure 3.5, contains the histograms for the parameters chosen in the Jupyter widget.

Figure 3.5: Histogram plot for the leakage current measurements.

While the leakage currents of group "100a" are narrowly distributed, the histogram of group "100" shows two outliers, one on either side.

In summary, the leakage currents of all motherboards were very low (on the order of 10 nA) and decreased over time. No short circuits were encountered. In addition, no statistically significant difference in leakage current was found between the groups "100" and "100a".

#### 3.3 Capacitor Test

This section is dedicated to the results that were obtained in the capacitor test and the resulting statistical statements that can be backed by them.

Figure 3.6 shows the results of the capacitor test as presented by the script tests/scripts/capacitor\_test/Capacitor Test Analysis.ipynb. Again, the data after 100 thermal aging cycles is displayed (the corresponding folder was selected in the dropdown menu). The option "Only print fails" was chosen, since only the fails are relevant for the following discussion. The checkbox labeled "Capacities were measured automatically (via AC voltage)." determines how the measurement values are interpreted. They are either capacitances in nanofarads from the manual measurement, or voltages in volts from the automatic test setup described in the previous chapter.
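The two options can be illustrated with a short sketch that reproduces the output format visible in Figure 3.6. The record type, its field names and the "PASS" wording for non-failing capacitors are hypothetical. The criterion by which the script flags a capacitor as failed is not described here, so the sketch takes the verdict as an input.

```python
from typing import Iterable, List, NamedTuple


class CapacitorResult(NamedTuple):  # hypothetical record layout
    motherboard: str
    aging_cycle: str  # e.g. "100" or "100a"
    board_type: int
    capacitor: str  # e.g. "5E"
    value: float  # volts (automatic setup) or nanofarads (manual)
    failed: bool


def format_results(results: Iterable[CapacitorResult],
                   only_fails: bool = True,
                   measured_automatically: bool = True) -> List[str]:
    """Render one line per result, optionally restricted to the fails."""
    unit = "V" if measured_automatically else "nF"
    lines = []
    for r in results:
        if only_fails and not r.failed:
            continue
        verdict = "FAIL" if r.failed else "PASS"
        lines.append(
            f"Motherboard {r.motherboard}, Aging Cycle {r.aging_cycle}, "
            f"Type {r.board_type}, Capacitor {r.capacitor} was a {verdict} "
            f"({r.value} {unit})."
        )
    return lines
```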

The capacitor test yielded 10 fails after 100 aging cycles. Given this result, the lower and upper limits of the MTTF at 95% confidence level are 235 and 737 years, respectively. For an MTTF of 235 years, the percentage of fails expected after 31 years is 12.4%. The upper bound for the fails after 31 years is 13.5%, also at 95% confidence level. These results are bad news for the reliability of the capacitors on the motherboards. However, one detail calls for further investigation: all fails listed in Figure 3.6 occurred in either row A or row E (the row is indicated in the name of the capacitor). The naming of the capacitors on the motherboards is shown in Figure 3.7.

| Analysis                                                 |            |
|----------------------------------------------------------|------------|
| Choose subfolder:                                        | 100_cycles |
| Only print fails                                         | (selected) |
| Capacities were measured automatically (via AC voltage). | (selected) |
| Start Analysis                                           | (button)   |

| Motherboard    | Aging cycle | Type | Capacitor | Measured value |
|----------------|-------------|------|-----------|----------------|
| 33151040100547 | 100         | 4    | 3A        | 2.467336 V     |
| 33151040100547 | 100         | 4    | 3E        | 2.465456 V     |
| 33151030100529 | 100         | 3    | 5E        | 1.675296 V     |
| 0701912323     | 100         | 3    | 5A        | 1.527583 V     |
| 33151030100627 | 100a        | 3    | 2E        | 2.466646 V     |
| 33151030100627 | 100a        | 3    | 5A        | 2.438051 V     |
| 33151030100617 | 100         | 3    | 5E        | 1.841706 V     |
| 33151020100598 | 100a        | 2    | 4A        | 2.46108 V      |
| 33151010100601 | 100a        | 1    | 4A        | 2.230835 V     |
| 33151040100475 | 100a        | 4    | 4A        | 2.463731 V     |

Figure 3.6: Analysis software for the capacitor test. Again, it can be chosen whether only the fails or all measurements should be printed. In addition, the option "Capacities were measured automatically" distinguishes between the data measured manually (in nanofarads) and the data acquired with the automatic test setup (in volts).



Figure 3.7: Naming of the channels on the motherboard. The ordering of the names winds down the motherboard's plug bars like a snake.

Figure 3.7 clearly shows that all fails listed in Figure 3.6 occurred on the periphery of the motherboard and none in the center. This is very unlikely to be coincidental: assuming a uniform distribution of the fails, the probability for all of them to occur in rows A or E is only 0.01%. Thus, there must be an underlying mechanism that causes the capacitors on the periphery to fail earlier than those in the center of the board. One possible explanation involves the force that had to be exerted on the plug bars of the motherboards during the interconnection test. This force could have broken the capacitors on the periphery through mechanical stress. This explanation was examined in a "plug/unplug" test. All connector cards of the interconnection test setup were plugged into the connection bars of a new motherboard with working capacitors and then unplugged again. This procedure was repeated 40 times, after which the capacitors were tested again. None of the 25 capacitors on the motherboard failed. This is enough data to claim, at 95% confidence level, that the plugging should not have caused more than 6 fails in the capacitor test. The fact that 10 fails occurred shows that the mechanical force during the interconnection test cannot be the (only) mechanism that caused the capacitors on the periphery to fail earlier.

Another possible explanation for the non-uniform distribution of the fails is that the thermal expansion during the thermal cycles was more pronounced in one direction than in the other. This would explain why the capacitors on the sides, which are oriented differently from most of the other capacitors on the board, were more likely to break. However, some capacitors in column 1 and rows B, C and D are aligned in the same direction as those in rows A and E, and none of them failed the capacitor test. This explanation therefore does not seem very likely either. Further investigation of the temperature distribution on the motherboards during the thermal aging cycles might help to find the mechanism that causes the non-uniform failure distribution in the capacitor test.
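The 0.01% figure can be checked directly. Suppose each of the 10 failed capacitors is equally likely to sit in any of the five rows. The probability that all of them lie in rows A or E is then $(2/5)^{10} \approx 1.05 \times 10^{-4}$. A more concrete model treats the test sample as 400 capacitors (16 motherboards with 25 capacitors each), 160 of which sit in rows A and E. Drawing the 10 fails without replacement gives a hypergeometric probability of about $0.9 \times 10^{-4}$. Both results are of order 0.01%. The function names below are hypothetical.

```python
from math import comb


def prob_all_in_rows_binomial(n_fails, rows_hit=2, rows_total=5):
    """Each fail independently lands in a uniformly random row."""
    return (rows_hit / rows_total) ** n_fails


def prob_all_in_rows_hypergeometric(n_fails, n_capacitors=400, n_in_rows=160):
    """n_fails distinct capacitors drawn uniformly without replacement from the sample."""
    return comb(n_in_rows, n_fails) / comb(n_capacitors, n_fails)
```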

# Chapter 4 Conclusions

In this chapter, the key findings and achievements of this study are summarized. The implications of the preliminary test data are outlined and ideas about possible further research are given.

Automating the test setups reduced the total time needed to measure a sample group of 16 motherboards. Furthermore, the quality of the acquired test data was improved. The time savings for the different tests are summarized below.

| Test                 | Before  | After      | Remarks |
|----------------------|---------|------------|---------|
| Interconnection Test | -       | 3 hours    | This setup was already automated by the particle physics group from the INFN Torino. |
| Leakage Current Test | 2 days  | 1 day      | Although the overall measurement time was only decreased by a factor of 2, the automated measurements take much less effort, since each motherboard only needs to be plugged in and all measurements are then performed completely autonomously. |
| Capacitor Test       | 3 hours | 30 minutes | When measured manually, broken capacitors sometimes still seemed to work (due to the force applied when measuring). The automated test setup completely resolves this issue. |

In addition, since the test setups are now automated, a single user can perform several tests simultaneously. In this way, all three tests can be performed on 16 motherboards in a single day. The preliminary test results obtained after 100 of the planned 400 aging cycles are summarized in the table below.

| Test                 | Fails  | Upper bound for the fails after 31 years (at 95% CL) | Remarks |
|----------------------|--------|------------------------------------------------------|---------|
| Interconnection Test | 0/400  | 2.5%                                                 | Slightly more test data (an additional 150 aging cycles) at 0 fails is needed to claim the desired upper failure bound of 1% after 31 years. |
| Leakage Current Test | 0/16   | 59%                                                  | Much more test data (an additional 175'000 aging cycles) at 0 fails is needed to claim the desired maximum failure rate of 3/2448 after 31 years. |
| Capacitor Test       | 10/400 | 13.5%                                                | All capacitors that failed were in either row A or row E (on the outside). |

Once the data after 400 aging cycles is available, the upper failure bound of 1% after 31 years can be claimed for the interconnections, provided that no further fails occur. For the leakage current, it is not feasible to acquire enough data from thermal aging cycles to claim the desired maximum failure rate of 3/2448 (this would take roughly 7 years of continuous cycling). However, roughly 30'000 years of accumulated test data is available from the motherboards inside the detector. If this data can be accessed, the range of fails to be expected after 31 years can be narrowed down significantly. For the capacitors, the failure rate after 100 aging cycles (corresponding to 10 years) was already 2.5%. This is higher than the desired maximum failure rate of 1% after 31 years, which is bad news. However, all fails occurred on the sides of the motherboards, and this effect is statistically significant (the probability for this is only 0.01% assuming a uniform distribution of the failures). There must therefore be an underlying mechanism that causes the capacitors on the sides to fail earlier. This study showed that the mechanical force exerted on the motherboards when plugging in the connector cards for the interconnection test cannot be the (only) reason for the non-uniform distribution of the failures.

In conclusion, the preliminary test results suggest that the motherboards are reliable in their function of conducting the bias voltage and the signal current. However, more data is needed to support this statement with statistical certainty. The reliability of the low-pass filter on the motherboards poses a problem. During the preliminary tests, the failure rate of the capacitors was far too high for the maximum failure rate after 31 years to stay within 1%. However, all of the failed capacitors were on the sides of the motherboard, so there must be an underlying wear-out mechanism that causes a non-uniform distribution of the fails. Further research might identify this mechanism and determine whether it also takes place inside the detector. In addition, further investigation could focus on the effect that malfunctioning low-pass filters have on the ECAL's resolution. Once these two open questions are resolved, a final decision on the reliability of the motherboards during the HL-LHC run can be taken.

# Acknowledgements

First of all, I would like to thank my supervisor Dr. Werner Lustermann for all the detailed explanations, the continuing support and the interesting talks. His door was always open to discuss issues that occurred or ideas for further improvements. I would also like to express my gratitude to Thomas Reitenspiess for all his help (particularly for pointing out missing factors of $2\pi$ in my calculations) and to Mikiko Ito for her interesting talk about silicon photomultipliers. Furthermore, I would like to thank Christian Haller and Michael Dröge for all their practical support, without which I could not have completed this study to this extent. Special thanks also go to Gabriele Kogler for organizing my stay at CERN, including two fascinating tours of the ATLAS detector and the GBAR experiment. I would also like to thank Pier Paolo Trapani for providing and explaining the automated setup for the interconnection test. Last, but certainly not least, I would like to thank Prof. Günther Dissertori for giving me the opportunity to carry out this semester project in one of his groups at CERN and for taking the time to attend all final presentations.

Besides the above-mentioned personal acknowledgements, I would also like to express my gratitude to the Institute for Particle Physics and Astrophysics at ETH Zürich for bearing the costs of my stay in Geneva, and to the CMS collaboration at CERN for being a wonderful host.

# Bibliography

- [1] S. Mersi, "Phase-2 upgrade of the CMS tracker," Nuclear and Particle Physics Proceedings, vol. 273-275, pp. 1034-1041, 2016, 37th International Conference on High Energy Physics (ICHEP).
- [2] CMS, "The Phase-2 Upgrade of the CMS Barrel Calorimeters Technical Design Report," CERN-LHCC-2017-011, CMS-TDR-015, Sep. 2017.
- [3] T.-M. I. Băjenescu and M. I. Bâzu, "Component Reliability for Electronic Systems," 2010.
- [4] S. Morris, "Component reliability for electronic systems," reliabilityanalyticstoolkit.appspot.com/confidence\_limits\_exponential\_distribution, accessed: 26.6.2018.
- [5] Weibull, "Chi-squared distribution and reliability demonstration test design," weibull.com/hotwire/issue116/relbasics116.htm, accessed: 26.6.2018.

#### **Declaration of Originality**

I hereby declare that the written work I have submitted entitled

# Design of an Automated Aging Analysis for the Motherboards in the CMS ECAL

is original work which I alone have authored and which is written in my own words.<sup>1</sup>

#### Author(s)

Luc Schnell

#### Supervising lecturer

Werner Lustermann

With the signature I declare that I have been informed regarding normal academic citation rules and that I have read and understood the information on 'Citation etiquette' (https://www.ethz.ch/content/dam/ethz/main/education/rechtliches-abschluesse/leistungskontrollen/plagiarism-citationetiquette.pdf). The citation conventions usual to the discipline in question here have been respected.

The above written work may be tested electronically for plagiarism.

Place and date: Zürich, 14.7.2018

Signature

<sup>1</sup> Co-authored work: The signatures of all authors are required. Each signature attests to the originality of the entire piece of written work in its final form.