INSIGHT Vol. 37 No 1 36-42 January 1995

Can we train a computer to be a skilled inspector?

Colin G Windsor

 

The experience and knowledge of its staff are the backbone of any NDT company. New procedures, particularly those based on neural networks, seek to encapsulate this experience and allow fully automated inspection to be considered. The change from manual inspection at least removes the factors of fatigue, boredom, inattention, variability and subjectivity from the inspections. At best it provides a system with defined, reproducible, Quality Assured standards. This review asks to what extent the new methods might achieve these aims.

1. The Nature of Experience

It is often said that the most important asset of a business is its staff. A nondestructive testing and inspection business is an excellent example of this. Everyone in such an organisation knows those white-haired, white-coated individuals who have "seen it all". Asked to examine a weld with an ultrasonic probe, they pass the probe over the metal and, where most of us see an ever-changing pattern of peaks, they announce, "It's OK, it's only a small slag inclusion." What is the nature of the procedure being followed to make such a judgement? Can its component parts be formalised and reproduced in an automatic system?

The Apprentice System

We have had for many years a quite formal system for passing on knowledge of this kind. A well trained apprentice had the function of reproducing the actions, skills and judgements of his master. He watched the master at work with his ultrasonic probe. He watched how he moved it, and listened to his master's interpretation of what he saw. It was a dynamic iterative process.

i) The master performs the series of scans making up an inspection.

ii) The apprentice tries a similar series of scans for himself.

iii) The master looks at his actions and his final conclusion, and points out the nature of any discrepancy between their actions and conclusions.

iv) Go back to (i).

Rome was not built in a day! The old apprentices would be indentured for seven years. We shall see later that neural network training works by example in a similarly iterative way, and is similarly time consuming, although not in the seven-year category! However, after the long training period the apprentice was as quick as his master, and capable of the same rapid decisions. Trained networks are similarly ready for instant action. They remember the answer rather than work it out.

The Expert Approach

There is an alternative to training by example. It is the expert system approach of information technology. Forget the apprentice; hire instead a PhD complete with six or seven years of general purpose training. His technique is quite different. First he studies the problem until he understands the relevant facets of the physics which underlie the process. In the case of the ultrasonic scan, he would need to understand the performance of his transducer, the nature of the ultrasonic waves it produces, and the nature of the reflection process between the ultrasound and the workpiece with its included defects. He then uses his algebra, his computer, or even his supercomputer, to evaluate the signals he might see from various defects of interest. When he looks at the same ultrasonic scan over the weld with the same "random-looking structure", he sees it with different eyes. He understands what he sees. He refers to his notebooks and is able to deduce rules which can then be used to make a judgement of the nature and importance of the defect. It can be very cost effective, for only one PhD may be needed! Given that appropriate rules exist, and that they have been successfully incorporated into a logical decision tree which extends from the data to the required decision, then anyone, even a computer, can apply them. In the case of defects in welds, some such rules exist, but they are rarely definite, being instead probabilistic or "fuzzy". This is not in itself a hindrance to their implementation, though of course it reduces the ease and reliability of their application. For example, cracks lying parallel to the interface between the weld and the workpiece are generally lack-of-fusion, or smooth, cracks; cracks near the root of the weld, or in the heat affected zone, are generally rough cracks. Signals from the centre of the weld are often from porosity.
It will often be possible to define a set of probabilities for each defect type, according to its position and orientation with respect to the weld geometry.
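Such a set of position-dependent class probabilities can be written down directly. The sketch below is illustrative only: the regions, the `defect_class_probabilities` function and every probability value are invented for the purpose of the example, not taken from any real procedure.

```python
# Illustrative only: a hypothetical rule table mapping a reflector's
# position and orientation in the weld to class probabilities, in the
# spirit of the "fuzzy" expert rules described in the text.
# All numbers are invented.

def defect_class_probabilities(region, parallel_to_fusion_face):
    """Return {class: probability} for a reflector, given its region
    ('root', 'haz' or 'centre') and whether it lies parallel to the
    weld/workpiece interface."""
    if parallel_to_fusion_face:
        # parallel to the fusion face: usually lack of fusion (smooth crack)
        return {"smooth crack": 0.7, "rough crack": 0.1,
                "slag": 0.1, "porosity": 0.1}
    if region in ("root", "haz"):
        # near the root or in the heat affected zone: usually rough cracks
        return {"rough crack": 0.6, "smooth crack": 0.2,
                "slag": 0.1, "porosity": 0.1}
    # signals from the centre of the weld are often porosity
    return {"porosity": 0.5, "slag": 0.3,
            "rough crack": 0.1, "smooth crack": 0.1}

probs = defect_class_probabilities("centre", False)
```

The probabilities for each reflector sum to one, so the table behaves as a proper (if crude) probabilistic classifier.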

So what are the key features?

If only it were so easy! Sadly, in a case such as the classification of defects in welds, there is a serious problem with the expert model approach. It arises from that essentially human skill of being able to generalise from relatively few examples. Every toddler has it. We give him a daisy and say "flower", a buttercup and say "flower", and he can recognise a bluebell as a flower! Those problems of translational, rotational and size invariance which occupy computers are assimilated with ease. Defects in welds are just as hard to pin down, for every example we see may be quite different in its shape and form. We cannot model every possible shape of defect!

An expert approach may still be applied but it must be a different approach through the extraction of "features". Some features of flowers are that they have a flexible stalk, some green leaves and some bright coloured petals. Daisies, buttercups and bluebells have these three features, and if we need to exclude runner beans then all we need is another feature such as length. In the case of defects in welds much expert thought has gone into the selection of suitable features which will distinguish the important types of defect. In essence this approach is straightforward:

i) Use expert human skills to decide the features which best distinguish the defects we need to classify.

ii) Provide a training set of ultrasonic images of defects of each class. These can be images of real defects or of artificial defects simulated on a computer.

iii) Evaluate the features for each training defect image. For a problem with appropriate distinguishing features the points from each class will lie in well defined clusters within a "feature space" whose axes are each of the features.

iv) Record a test image from a new defect of unknown class and see in which cluster the new point best lies.
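Steps (iii) and (iv) above can be sketched in a few lines: compute each class's centre in feature space (the large crosses of figure 1) and assign a new point to the nearest centre. The two features and all the coordinate values below are invented for illustration; they stand in for quantities like the amplitude ratio and kurtosis of figure 1.

```python
# A minimal sketch of feature-space classification by nearest class
# centre. Training feature values are invented placeholders.

def class_centres(training):
    """training: {class: [(f1, f2), ...]} -> {class: (c1, c2)}"""
    centres = {}
    for cls, pts in training.items():
        n = len(pts)
        centres[cls] = tuple(sum(p[i] for p in pts) / n for i in range(2))
    return centres

def classify(point, centres):
    """Assign the point to the class whose centre is closest."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centres, key=lambda c: dist2(point, centres[c]))

training = {
    "porosity":     [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15)],
    "slag":         [(0.8, 0.2), (0.9, 0.3), (0.85, 0.25)],
    "rough crack":  [(0.2, 0.8), (0.3, 0.9), (0.25, 0.85)],
    "smooth crack": [(0.8, 0.8), (0.9, 0.9), (0.85, 0.85)],
}
centres = class_centres(training)
result = classify((0.82, 0.27), centres)   # a "triangle" test example
```

With these invented clusters the test point falls nearest the slag centre, so `result` is `"slag"`.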

 

 

Figure 1. A so-called "scatter plot" of examples of weld defects plotted in two dimensions according to the values of two feature parameters. The y-axis (AM) represents the amplitude ratio between the signals at two scattering angles, the x-axis (KU) the kurtosis, or shape, of the reflected waveform. The different symbols show the class of each example; the large crosses show the centre of each class. The triangle illustrates a test example.

The problem with this method is the difficulty, and sometimes the impossibility, of finding suitable features which distinguish the classes. In the case of defects in welds, the rough cracks and porosity can appear extremely similar and certainly no one feature can distinguish them. It is then that a multi-variable classifier is needed which can, like a human, take many factors into account at the same time. Each example may then be considered as a point in a multidimensional space, where each axis corresponds to one feature variable. If the classifier is to work then the different classes of defect must appear as more or less discrete clusters in this space. Figure 1 shows such a plot for defects in welds (1) for just two feature parameters. It is seen that each of the four classes of defects considered has a tendency to cluster, but that there are areas of overlap, certainly in this two-dimensional plot. To use such a plot for inspection tasks, the features of a possible defect are evaluated, giving a new point in the feature space plot, for example at the position given by the triangle in figure 1. A classifier works by evaluating to which of the possible classes the new point is most likely to belong. Many statistical classifiers are now available which will give similar answers despite the complex shape of the clusters. A good one, the nearest neighbour method, simply chooses the class of the training example which lies closest to the test point. Feature-based methods of this sort work well, although the evaluation of the feature parameters limits their speed. However, with improvements in hardware, they may soon be available for on-line use.
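The nearest neighbour rule mentioned above, in its k-nearest form (the 3-nearest neighbour variant appears in table 2), can be sketched as follows. The two-feature training points are invented placeholders in the spirit of figure 1.

```python
# A sketch of the k-nearest-neighbour classifier: a test point takes
# the majority class of its k closest training examples.
# Feature values are invented for illustration.

from collections import Counter

def knn_classify(point, training, k=3):
    """training: list of ((f1, f2), class) pairs."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(training, key=lambda ex: dist2(point, ex[0]))[:k]
    votes = Counter(cls for _, cls in nearest)
    return votes.most_common(1)[0][0]

training = [
    ((0.2, 0.1), "porosity"), ((0.3, 0.2), "porosity"),
    ((0.8, 0.2), "slag"), ((0.9, 0.3), "slag"), ((0.85, 0.25), "slag"),
    ((0.2, 0.8), "rough crack"), ((0.3, 0.9), "rough crack"),
]
label = knn_classify((0.82, 0.27), training, k=3)
```

Note that, unlike the trained networks discussed later, this classifier must revisit every training example for each test point, which is the speed penalty mentioned in section 4.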

Neural networks

The alternative approach, more closely aligned with manual interpretation, is to take in the whole image, perhaps display it in the way best appreciated by the eye, and compare it with a set of training images collected in the past from defect examples of known class. The use of semi-automated data collection procedures has made the collection of such training datasets a practical proposition, although it is still a major task to collect perhaps one hundred good quality images. The number of images required for the training can be appreciated if one imagines oneself having to make the classification decision from these images alone. There must be enough examples of each class to show a spread of those aspects of the image, such as position, which are irrelevant to the class, and to show differences between aspects such as shape which distinguish between classes. The human brain is quite good at this task, and given time to browse through the training images, will soon be able to classify the easier test images correctly. More difficult cases will have the tester consulting the training examples more carefully and trying to assimilate all the information they contain. Neural networks attempt to reproduce this process. Given a set of training images, they go through a learning phase of trying to assimilate the examples, so that when presented with a test image they are able to give a classification.

 

Figure 2. Schematic diagrams of a single neuron and its connections with other neurons, as it is (left), and as it is represented mathematically (right). The dendrites of the i-th neuron accept signals a_j from the axons of other neurons j. The synapses modify the incoming signal in a way modelled by a weight w_ij. The neuron is sensitive to the modified signals from all its dendrites and, in the model, sends an output pulse down its axon with a probability depending on the sigmoidal function f(x) of the weighted sum less a threshold w_i0.

2. The Neural Network Approach

The human brain has a quite different structure from conventional computers and is programmed in a quite different way. Neural networks attempt to simulate this structure and programming method on conventional computers. Central to the approach is to use a network of independent units - the "neurons", which can switch their activity according to the result of summing inputs from other units as modified by adjustable weights representing the strengths of biological "synapses".

Figure 2 shows a single neuron and its mathematical model as explained in the caption. The truly biological feature of neural network models is that they involve appreciable numbers of neurons, and update their outputs in parallel according to the current values of their inputs and weights. However, present day neural network approaches leave biology far behind! In particular the brain's complex three-dimensional connection structure tends to be simplified to a few simple fully connected layers. The 10^10 neurons in the human brain tend to be replaced by a few hundred. The complicated time-dependent mechanisms by which synapse strengths are changed tend to be replaced by a few simple rules. However, there is one advantage of artificial networks. The human brain works by chemical transmitters and its basic "clock cycle", the time for a neuron to switch its state, is perhaps 25 milliseconds. In silicon, the basic cycle time can be 25 nanoseconds, implying a 10^6 speed increase. Usually this is not achieved, however, for the brain's neurons do their work in parallel while PCs generally have one processor and turn the network simulation into a serial computation. But this method of working is not intrinsic to artificial neural networks, and hardware networks where each neuron works independently are built and used. "Parallel Distributed Processing" is a commonly used name for the subject of neural networks.(2)

 

Figure 3. A simplified diagram of some of the connections between neurons in the cortex (left), compared with the connections assumed in a classic neural network model - the 3 layer perceptron used in the back propagation method (right). In this much used model the successive layers of neurons are fully connected together but there are no connections between neurons within the same layer or across layers. Inputs to the top layer undergo a non-linear mapping by the hidden units to produce output signals that reproduce the targets set during training.

 

Figure 3 illustrates the classic three layer network or Multi-Layer Perceptron (MLP) which dominates so much of neural network research today. Inputs from sensors, which can be of diverse kinds, are presented to the input layer. The hidden units in the second layer sample all the inputs, weight them according to the "synapse strengths", and switch on or off according to the value of the weighted sum, less the threshold. They in turn send signals to the output units. The back propagation method provides a formalism for changing all the weights and thresholds in the network so that the target values for each class are achieved for each output unit. For example, in defect classification to four classes, four targets may be set with the following values:

Class          Output 1   Output 2   Output 3   Output 4
Porosity           1          0          0          0
Slag               0          1          0          0
Rough Crack        0          0          1          0
Smooth Crack       0          0          0          1
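The forward pass of such a network, with the one-hot targets tabulated above, can be sketched directly from the model of figure 2: each unit applies a sigmoid to the weighted sum of its inputs less a threshold. The network sizes and all weight values below are arbitrary placeholders; in practice back propagation, not shown here, would set them during training.

```python
# A minimal forward pass for the three-layer perceptron of figure 3,
# with the one-hot class targets of the table above.
# Weights and sizes are invented placeholders.

import math

TARGETS = {
    "porosity":     [1, 0, 0, 0],
    "slag":         [0, 1, 0, 0],
    "rough crack":  [0, 0, 1, 0],
    "smooth crack": [0, 0, 0, 1],
}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, thresholds):
    """Each unit fires on the weighted sum of its inputs less a threshold."""
    return [sigmoid(sum(w * a for w, a in zip(ws, inputs)) - t)
            for ws, t in zip(weights, thresholds)]

def forward(inputs, w_hidden, t_hidden, w_out, t_out):
    return layer(layer(inputs, w_hidden, t_hidden), w_out, t_out)

# a toy 3-input, 2-hidden, 4-output network with placeholder weights
w_h = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
t_h = [0.1, -0.1]
w_o = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5], [-0.5, -0.5]]
t_o = [0.0, 0.0, 0.0, 0.0]
outputs = forward([0.2, 0.9, 0.4], w_h, t_h, w_o, t_o)
```

Training would adjust `w_h`, `t_h`, `w_o` and `t_o` until the four outputs approach the target row for each class of training example.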

Training by example applied to a neural network

Like the apprentice, the neural net has to operate in two modes. In the training mode the network is flexible - its weights are changed iteratively until optimal assimilation of the data has been achieved. Later in the operational mode the network weights are fixed. A trained network, like a fully trained employee, must be thought of as a precious resource. In the case of ultrasonics, the data which it has encapsulated may consist of hundreds of scans over defects taken laboriously over many months or years. The defect samples must be such that the class of the defect is known, either by introducing the defect artificially, or by destructive examination after completion of the test. The data itself is precious, but if the network has been well trained it is yet more precious. A well trained network is able to reproduce the classes of most of the training defects, but will also perform almost equally well in identifying the classes of test defects to which it has never been exposed. Indeed the test performance may be appropriately compared to the test performance of the apprentice when he takes his certification examination. The well trained network has an added value, comparable to the added value of an employee whose training has been certified.

An important part of the training process is the selection and presentation of the material to be presented for learning. Every training course supervisor knows the problems that occur. Too little material and the student will not be equipped to solve the day to day problems that occur. Too much material and the student is swamped with irrelevant detail that will not be needed, and even worse, confuses the student who has no way of knowing which of the many details presented to him is most important. In the same way the data presented to a neural network must be presented so that its information content is appropriate to the classification task being undertaken.

 

Figure 4. An illustration of the analogy between the training of a neural network and least squares fitting of experimental points to a polynomial function with a particular number of adjustable parameters depending on its order: (i) too small a number of parameters and it is unable to fit the points. (ii) the right number and it follows the trend but not the detail (iii) too many and it overfits, following the detail but missing the trend.

There is a close analogy here with least squares fitting of a set of n data points to a polynomial of m terms. The 20 points in each of the sections of figure 4 illustrate this point. Each case is a polynomial fit of m terms, y = a0 + a1x + a2x^2 + a3x^3 + a4x^4 + ... + a(m-1)x^(m-1). Case (i) shows a fit to a polynomial of four terms only, y = a0 + a1x + a2x^2 + a3x^3. The key information is in the error bars. It is seen that the complexity of this polynomial is insufficient to reproduce the functional variation in the data. The data is underfitted. Case (iii) represents a polynomial with 16 terms. It reproduces the detail in the data with true fidelity, but as the error bars reveal, this level of detail is not reproducible and would not be followed by extra data points. The data is overfitted. Case (ii) represents the middle way, with in this case a polynomial of 6 terms. The functional form of the data is appropriately described. There is no need to rely on human judgement to find the most appropriate number of adjustable parameters. The correct statistical methodology is to divide the data into portions to be used for training and testing. The training data are fitted to polynomials of varying degree, and the test data residual, the error-weighted least squares difference between the test data and the fitted polynomial, is evaluated in each case. The number of adjustable parameters giving the lowest test residual is the optimum.

Exactly the same procedure may be followed for MLP neural networks. The available data is divided into training and test datasets. The test performance, defined as the residual differences between the trained network targets and the outputs, summed over the examples in the test dataset, is then evaluated as a function of the network parameters. These will include particularly the number of hidden units in a three-layer network, which is the most straightforward way to adjust the number of variables in the network. However they may also include elements of the preprocessing of the data, for example the degree of averaging in the case of ultrasonic data, which, by changing the number of input variables to the network, also affects the total number of adjustable parameters.
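The count of adjustable parameters that this procedure optimises follows directly from the fully connected layout of figure 3: one weight per connection plus one threshold per hidden and output unit. A small sketch, using the 25-4-4 network size mentioned later in section 4 as an example:

```python
# Adjustable parameters of a fully connected three-layer MLP:
# weights on every input-to-hidden and hidden-to-output connection,
# plus one threshold per hidden and per output unit.

def mlp_parameter_count(n_inputs, n_hidden, n_outputs):
    weights = n_inputs * n_hidden + n_hidden * n_outputs
    thresholds = n_hidden + n_outputs
    return weights + thresholds

# e.g. ultrasonic data averaged down to 25 inputs, 4 hidden units,
# 4 output classes: 116 weights + 8 thresholds
n_params = mlp_parameter_count(25, 4, 4)
```

Changing either the number of hidden units or the degree of input averaging moves this count, which is why both enter the test-performance optimisation.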

 

Figure 5. Ultrasonic images of four example weld defects of different classes. Each image has been processed to show the plan (upper left), side elevation (upper right) and front elevation (lower) of the defect.

As an example, figure 5 shows a set of ultrasonic images, each of a defect of a different class, composed of many hundred thousand pixels. They show sets of B-scans along a weld, taken at two different ultrasound angles. The required information for classification is there, and is readily appreciated by the eye, which is skilled at averaging data taken in at a glance. However, when presented to a neural network, the wealth of detail in the pixels, most of which is irrelevant noise, swamps the network, which has to be large to cope with the detail, and then performs poorly in classifying test data because it "overlearns" the detail and fails to generalise. Figure 6 shows the same data averaged in groups to give just 686 pixels. Of course some resolution is lost, but the information content is more concentrated and the network is able to perform better. It is this more compressed information which the neural network is able to work on directly.
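The averaging step can be sketched as block averaging over non-overlapping tiles. The image and block sizes below are illustrative only and much smaller than the real scans.

```python
# A sketch of the averaging step: reduce a pixel image by taking the
# mean over non-overlapping blocks, trading resolution for a more
# concentrated information content. Sizes are illustrative.

import numpy as np

def block_average(image, block):
    """Average a 2-D array over (block x block) tiles; the image
    dimensions must be multiples of the block size."""
    h, w = image.shape
    return image.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

image = np.arange(64, dtype=float).reshape(8, 8)  # stand-in for a B-scan
small = block_average(image, 4)                   # 8x8 -> 2x2
```

The overall mean intensity is preserved, so the averaged image still carries the amplitude information on which the classification depends, while the pixel count, and hence the network size, drops by the square of the block size.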

 

Figure 6. The images of figure 5 averaged and transformed to represent the intensity at the two angles as a function of depth (y), distance across the weld (x) and distance along the weld (z). Although in principle much information has been lost, sufficient remains to assess the defect class, and what remains is indeed easier to interpret quantitatively.

Table 2 : Percentage success in defect classification from various classifiers

Basis                  Type            Method                     % success
Feature based          Conventional    3-nearest neighbour           94.4
                       Conventional    Weighted minimum distance     91.6
                       Neural Network  Multi-Layer Perceptron        94.0
Image Receptive field  Conventional    Adaptive receptive field      93.9
                       Neural Network  Multi-Layer Perceptron        93.9
                       Neural Network  Shared Weights                90.0
Direct image based     Conventional    Template matching             86.4
                       Neural Network  Multi-Layer Perceptron        81.8

 

Ultrasonic qualification centres rightly make a firm distinction between the defect samples and equipment used in the training sessions and those used for examinations. It would not do for examination candidates to find that the test defect was one which they had already come across. In the same way it is an essential part of neural network methodology to separate the available defect data into a training dataset and a test dataset. The neural net parameters, particularly its size, need to be adjusted so that the test performance is optimised. Indeed very often the test dataset is itself divided up into a validation dataset which is used for this purpose, so that the test dataset results need never have been presented to the network until the final testing phase at the end of the network development process.
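The three-way separation of the data just described can be sketched as a simple split. The 60/20/20 proportions below are an illustrative choice, not a prescription; in practice the examples would be shuffled or stratified by class first.

```python
# A sketch of the dataset discipline described above: training data
# for fitting the weights, a validation set for tuning the network
# size, and a test set held back until the very end.
# The split fractions are an illustrative assumption.

def split_dataset(examples, train_frac=0.6, val_frac=0.2):
    n = len(examples)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = examples[:n_train]
    validation = examples[n_train:n_train + n_val]
    test = examples[n_train + n_val:]
    return train, validation, test

examples = list(range(100))          # stand-ins for defect images
train, validation, test = split_dataset(examples)
```

The essential property is that the three sets are disjoint: no test example, like no examination defect, has been seen during training or tuning.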

3. The Search for a perfect classifier

Table 2 above shows some results for the degree of success in our chosen problem of classification of defects in welds.(3) The results are expressed as the percentage of the 67 or so defects whose class, as defined by the options in table 1, was correctly predicted. It is a statistical detail that these are evaluated using the "leave one out" method. Training is performed many times, with each example in turn being selected for testing while all the remainder are used for training. It is the method closest to the "in service" situation, where all the available examples could be used for training, and any test cases would necessarily be new examples. The table covers a series of classifiers, both conventional and neural network, based both on extracted features and on direct images. Many of these, even the conventional ones, use methods borrowed from human vision. For example the adaptive receptive field method sweeps a small adaptive template over the field of view, just as our eyes sweep over an image. Templates for each class are iteratively adjusted by finding a portion of the image where the fit is best and updating the template from that portion of the image.(3) The striking result from the table is that the best methods of each type give such similar results in terms of classification performance. For example the state of the art conventional method of feature extraction, as developed by Bealing and Burch(4), with a 3-nearest neighbour conventional classifier gives 94.4% accuracy. However a neural network MLP classifier using the same data gives 94.0%. Using image based methods, the adaptive receptive field gives 93.9%, while the receptive field MLP using the same images also gives 93.9%. Of course those quoted are the best results obtained after a research programme to optimise the methods and parameters for this problem.
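The "leave one out" scheme can be sketched in a few lines. A 1-nearest-neighbour rule stands in here for whichever classifier is being scored; the five two-feature examples are invented placeholders for the 67 or so real defects.

```python
# A sketch of "leave one out": each example in turn is held back for
# testing while the classifier trains on all the rest.
# The example data are invented.

def nearest_neighbour(point, training):
    """training: list of ((f1, f2), class) pairs; return nearest class."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda ex: dist2(point, ex[0]))[1]

def leave_one_out_accuracy(examples):
    correct = 0
    for i, (features, label) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]   # train on all but one
        if nearest_neighbour(features, rest) == label:
            correct += 1
    return correct / len(examples)

examples = [((0.1, 0.1), "porosity"), ((0.2, 0.15), "porosity"),
            ((0.9, 0.9), "crack"), ((0.85, 0.8), "crack"),
            ((0.5, 0.9), "crack")]
accuracy = leave_one_out_accuracy(examples)
```

Every example is used both for training (n-1 times) and for testing (once), which is why the method makes the most of a small, hard-won defect database.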

The conclusion is therefore that appropriately chosen classifiers, working on the same data, give similar results despite major differences in their method of operation. This conclusion was also found from an analysis of model datasets in which several neural network and conventional classification methods were compared as part of the ESPRIT programme ANNIE on the Applications of Neural Networks to Industry in Europe.(5) An analysis of the failures shows that the same classification errors turn up in each case. Although not included here as a classifier, the human eye familiar with the images in the database, also tended to make the same errors. Indeed the human, lacking the objectivity of the other classifiers, was often tempted to question the official classification!

It is in fact a heartening conclusion for the plant operator, for it means that there exists a choice of classifiers that will operate satisfactorily within the limits of the data they are working with. Computer classification, properly set up, can be relied upon by managers, just as they have long relied on computers to add up their profits. There is no need to fund any large research programme to find some better classifier. Instead the need is to assess the performance required, and whether the data presently being collected is up to the task expected of it.

Of course, a good way to increase the level of confidence that the plant operator has in the classification is to include the results of several different methods applied simultaneously to the same problem. Thus the feature-based method coupled with a good traditional classifier, and the neural net classifier applied directly to the data, represent two distinct approaches which together can contribute more than either alone. Yet another approach, which any human uses, is to consider the position and orientation where the defect has been found as a determining factor in giving the defect class. Thus smooth lack-of-fusion cracks usually lie parallel to the weld interface, rough cracks near the root of the weld or in the heat-affected zone, and slag and porosity within the bulk of the weld metal. Figure 7 shows an image of a weld defect with the weld geometry superimposed. The reflected ultrasound intensity is shown as a grey scale image within the parallelogram area of the scan. The bright areas, showing the largest signals, indicate a defect at the root of the weld. The other bright area can be identified as a spurious reflection from the edge of the head of the weld. The most robust classifier will take into account such information, together with the extracted features and direct image data.

 

Figure 7. An ultrasonic single B-scan image of a double-V weld superimposed on the known geometry of the weld. The areas of varying colour indicate the area covered by the scan. The white areas indicate a large reflected amplitude. The scan suggests a reflection from a defect at the root of the weld and a spurious reflection coming from the edge of the weld head.

4. Classification Speed

It is in this area that the differences between classifiers become apparent. Many classical classifiers are relatively slow since, in operation, they must examine each training example. It is here that neural network methods come into their own, as for them the hard computation comes in the training process. The general properties of the dataset are assimilated into the trained network, and its details are no longer needed. Very few operations are needed to classify from a trained network. Like humans, neural networks do not "work out" the correct classification, but "remember" it. Thus a 25 input, 4 hidden unit, 4 output net, such as might be used for defect classification, can perform a classification with 116 weight multiplications. This represents a sub-millisecond computation time for personal computers. If this is not fast enough then, as in our brains, parallel computation may be brought into play, with the computations associated with each hidden unit being performed simultaneously by independent processors.
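The figure of 116 multiplications follows directly from the layer sizes: one multiplication per weight on the forward pass.

```python
# The arithmetic behind the speed claim: a fully connected 25-4-4
# network needs one multiplication per weight to classify.

def forward_pass_multiplications(n_inputs, n_hidden, n_outputs):
    # input-to-hidden connections plus hidden-to-output connections
    return n_inputs * n_hidden + n_hidden * n_outputs

ops = forward_pass_multiplications(25, 4, 4)   # 100 + 16 = 116
```

Since each of the 4 hidden units' 25 multiplications is independent of the others, the input layer's work parallelises trivially across processors, which is the point made at the end of the paragraph.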

The opportunity given by high classification speed lies in the placing of the classification process within the whole inspection process. Traditionally, human inspectors with a hand-held probe naturally performed their defect detection and classification "on-line". This method brought with it a host of advantages. If a hint of a defect was seen, the inspector could mark it with his crayon, and go over the area in more detail, viewing it from different orientations and with different angle probes. When "progress", "improved equipment" and "Quality Assurance" came in, the inspector was increasingly replaced by a robot scanner which collected and archived the scans. The operators' role during the inspection was principally to see that the data was being collected properly. The analysis of the data then switched to an "off-line" environment which superficially had many advantages. The inspection of the scans could be made at leisure in an air-conditioned room, with advanced computer hardware and software enabling the best assistance to be given to the human operator. It could be performed after the completion of the data collection, so that the pressure for a quick analysis was over. However, a set of new problems arose. The correlation of the scan being inspected with its actual position on the plant became an important issue. There was no longer much chance of repeating scans over doubtful areas.

It is here that the accuracy and speed of automatic classification become crucial factors. If automatic on-line inspection were possible the advantages of both systems may be enjoyed. The data could be collected and stored for reference, yet if any problem area were found it could immediately be noted and the full range of human skills be brought to bear in order to extract as much information as possible.

5. The Problems of Human Perception

So far the automated classifier has been considered as an aid to a final human decision, and few today would question that position in the case of NDT. However the future may well be quite different. There is an increasing awareness of the limitations of human perception, and an appreciation of how far human performance may differ from the Quality Assurance procedures we should like to apply. There are several areas of concern.

Perhaps the most important lies in the nature of human perception. We do not generally perform inspections by moving our eyes along a careful raster scan which can be guaranteed to cover all the inspection area. Studies in which the movements of a subject's eyes have been recorded as a scene is examined show that the centre of our vision, the fovea, darts around the scene in a random way. Working on several spatial scales at once, it is always able to concentrate on any region where a desired feature may be present. This gives us rapid feature recognition capability, but the hidden cost is likely to be that certain areas of the scene are never examined with the highest possible level of visual acuity. The result is that marginal defects may not always be observed by an inspector. Indeed, every radiography inspector knows of cases where a defect appeared obvious only when it was pointed out by someone else! The consequence is that experimental Probability of Detection (POD) curves generally fall well below calculated values. Computer based image analysis systems may well have other defects, but it is straightforward to program them so that they perform exhaustive raster scans over the possible field of view, giving a consistent performance.

 

More straightforward to appreciate is the degradation of human performance as the result of "human factors" such as fatigue, boredom or uncomfortably hot, cold or noisy working environments. The effects of these can now be readily demonstrated by standard trials of NDT test equipment.(6) The effects are not small, and a degradation of defect detection by a factor of two is readily measured. A second factor is the variability of individuals set the same task. Even experienced operators with the same certification can have appreciable differences in test performance. In the case of computer based systems, their worst performance is generally also their best performance!

An important factor is the expectation of performance in terms of the number of defects found. If an operator begins working for a company where there is a general expectation that defects are present, then some will generally be found. However, if the expectation is that defects are most improbable, and indeed if found their existence would be a financial disaster for both him and his employer, then it takes a brave inspector to report a defect of marginal visibility. This factor, the a priori probability of detection or classification of a certain type of defect, is also an important part of Bayes' theory in classical statistics.(7) The most accurate estimate of the likelihood of any statistical inference should always take into account any true knowledge of a priori probabilities. This can be performed formally by computer based methods through Bayes' theorem, although it can equally well be used instinctively by a human inspector. If we have accurate knowledge of the intrinsic likelihood of the presence of particular types of defect, then we can use that knowledge to make more accurate judgements.
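The formal version of this weighting is a one-line application of Bayes' theorem. The signal probabilities and prior defect rates below are invented purely to show the effect of the prior on the same marginal indication.

```python
# A sketch of Bayesian weighting: the same marginal signal implies a
# very different probability of a defect under different a priori
# defect rates. All probabilities are invented for illustration.

def posterior_defect_probability(prior, p_signal_given_defect,
                                 p_signal_given_clean):
    """Bayes' theorem: P(defect | signal)."""
    evidence = (prior * p_signal_given_defect
                + (1 - prior) * p_signal_given_clean)
    return prior * p_signal_given_defect / evidence

# the same marginal signal, under two different prior expectations:
p_high_prior = posterior_defect_probability(0.10, 0.8, 0.1)
p_low_prior = posterior_defect_probability(0.001, 0.8, 0.1)
```

With a 10% prior defect rate the marginal signal is worth reporting; with a 0.1% prior the posterior stays below 1%, which is the formal counterpart of the inspector's reluctance described above.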

6. Copying the Human's Flexible Approach to Data Collection

One of the most valuable attributes of human inspectors using hand-held equipment is the flexible approach they instinctively take to collecting the data. They naturally operate in a multi-resolution mode, often beginning with quite low-resolution sweeps over the specimen. These build up an experience base for the particular specimen of such factors as: How smooth is the surface? How regular are features such as the back-wall echo? Together these define the reliability of the ultrasonic coupling, which in turn defines the speed and confidence level at which the inspection can proceed. At the same time the general level of defects present in the specimen becomes clear; hopefully it is low enough that a real search can then be made at a somewhat finer level of resolution. Human variability and temperament then come into play. The "C-scan" system, where a camera above the workpiece follows the movements of the probe and displays the ultrasonic response on a projection of the specimen, allows the movements contributing to an inspection to be identified. Some inspectors ape an automatic system by performing a careful raster scan; some move the probe apparently randomly. Most follow a median course in which the basic raster scan is continually interrupted and modified. Any suggestion of a defect indication causes the inspector to retrace his steps and examine the area in more detail at a finer resolution. The inspector may also change the probe, perhaps trading coverage for sensitivity. This flexible approach works: the whole area is covered relatively rapidly, yet the probability of detection is greatly increased by the repeated scans over any region showing any sign at all of a significant signal.

Of course human inspectors also naturally operate well over complex geometries such as bends and nozzles. It is easy to see that computer-driven robotic arms following stored plans of the structure will soon be able to follow a similar path, but will they also be able to follow the retracing-of-steps approach? Continuous monitoring of any back-wall echo is one simple test that has already been incorporated into inspection systems to check that good coupling exists, although any retracing of steps still depends on the operator! The multi-resolution approach, however, remains complex to justify. Inspection managers choose the spacing of the raster scan to give an appropriate probability of detection: any broader raster fails the specification, any narrower one unnecessarily lengthens the inspection. However, as every safety manager knows, near misses should always be followed up, and repeating scans at higher resolution whenever any marginal indication of a defect is seen is only following this approach. Figure 8 illustrates this statistically for a defect whose response should in principle be above the defined threshold level shown. For a sparse scan, shown by the closed circles, the indication from any one detection point may fall below this threshold, either because of the unfortunate spacing of the points or because of some adverse statistical fluctuation. However, if a point exceeds a lower threshold, this initiates a retraced scan with closer points, shown by the open circles, which in this case enables the presence of the defect to be established even at the higher threshold level.

 

Figure 8. A coarse scan, as shown by the closed circles, may well fail to exceed the threshold because of a statistical fluctuation (points lying below the error bar), or because of unfortunate spacing of points (with none near the peak). A finer scan, initiated by the lower threshold, might in this case yield points which exceed the original threshold.
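The two-threshold strategy of figure 8 can be sketched in outline as follows; the response function, scan pitches and threshold values are all invented for illustration:

```python
import math

# Sketch of a two-threshold scan: a coarse raster whose marginal
# indications (above a lower threshold) trigger a local rescan at
# fine pitch against the full threshold. All numbers are invented.

def response(x):
    # A single hypothetical defect peak at x = 5.3, width 0.2
    return 1.2 * math.exp(-((x - 5.3) / 0.2) ** 2)

def inspect(span=10.0, coarse=0.5, fine=0.05, lower=0.3, upper=1.0):
    for i in range(int(span / coarse) + 1):
        x = i * coarse
        if response(x) > lower:              # marginal indication: retrace
            for j in range(int(2 * coarse / fine) + 1):
                xf = x - coarse + j * fine   # fine scan around the point
                if response(xf) > upper:     # confirmed at full threshold
                    return xf
    return None

print(inspect())   # the fine rescan localises the peak near x = 5.3
```

Note that in this example no coarse point ever exceeds the full threshold, so a single-threshold raster would miss the defect entirely; the lower trigger recovers it without fine-scanning the whole span.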

8. Choosing the Best Path into the Future

The problem with inspection is that it has two aspects: sensitivity and consistency. Traditionally we have emphasised sensitivity, spending money on better equipment and more highly trained inspectors who would find ever smaller defects. With the advent of Quality Assurance, the aspect of consistency must also be addressed. The overriding objective is the safety of the plant to some defined probability. For this to be assured, defects within a certain defined range of sizes need to be located with a given probability. For each size, the danger to the plant of the presence of a defect of that size needs to be weighed against the probability of being able to detect such a defect. It is no longer important that all detectable defects are found; the need is to define the required probability of detection of defects as a function of size. It is here that humans have their Achilles heel and give cause for worry. If human judgement is to be formally part of the detection process, then its performance needs to be measured and included in the total probability-of-failure assessment. The variability seen in human performance must be included, and, following usual practice, it must be included conservatively. This is the opportunity for computer-based inspection. There is an instinctive feeling that, to be useful, it has to compete with the best human judgement. In fact, provided that its performance can be precisely defined, it may have to compete only at the much lower level of the most pessimistic human performance.

In the future the regulatory authorities may find themselves playing a quite different role. Humans may come to be regarded as too fallible for many situations; most of us would feel this is already true for adding up our grocery bills. But if computer-based systems are brought in, the extent of their performance, as well as their accuracy and reliability, must be assessed. Humans have the nice feature of applying common sense when a situation goes beyond their training. Expert systems are only as good as the rules they contain, and these rules may have to be examined to new levels of detail to ensure that they cover all conceivable situations, or at least call in human help if a situation inconsistent with the existing rules is encountered. Neural networks perform well when "interpolating" within their training points, as in the regression fits of figure 4, but are hopeless at extrapolating. However, they can act as "novelty detectors" by evaluating the "distance" in their input variable space between the current case and the nearest example within the training data, again calling for human help if any novel situation is detected. So it is easy to see human judgement remaining, but fulfilling a very different role from that of present inspectors. They may be able to sit back while their computers perform an exhaustive analysis of the data collected during inspections, but they must always be "on call" to deal with the unexpected situations or borderline cases where their judgement and experience will always be needed.
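The novelty-detection idea can be sketched in a few lines (the feature values and the distance threshold below are invented for illustration):

```python
import math

# Minimal sketch of a "novelty detector": flag any new case whose
# distance to the nearest training example exceeds a chosen threshold,
# so that the network's answer is referred back to a human inspector.

def nearest_distance(case, training_set):
    """Euclidean distance from `case` to its nearest training example."""
    return min(math.dist(case, example) for example in training_set)

def is_novel(case, training_set, threshold=1.0):
    """True if the case lies outside the region covered by training data."""
    return nearest_distance(case, training_set) > threshold

training = [(0.2, 1.1), (0.4, 0.9), (0.5, 1.3)]   # hypothetical input features
print(is_novel((0.45, 1.0), training))   # close to the training data -> False
print(is_novel((3.0, 4.0), training))    # far outside it -> True
```

In practice the threshold would be tuned on held-out data, but the principle is unchanged: the network answers only where it has effectively interpolated, and calls for human help everywhere else.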

 

References

1. Windsor C G, "The Classification of Defects from Ultrasonic Measurements", in "Neural Networks from Models to Applications", Eds Personnaz L and Dreyfus G, I.D.S.E.T., Paris, 1989.

2. Rumelhart D E, Hinton G E and Williams R J, "Learning Internal Representations by Error Propagation", in "Parallel Distributed Processing", MIT Press, Cambridge, Massachusetts, 1986.

3. Windsor C G, Anselme F, Capineri L and Mason J P, "The Classification of Defects from Ultrasonic Images: A Neural Network Approach", Brit. J. NDT, 35, 15-22, 1993.

4. Burch S F and Bealing N K, "A Physical Approach to the Automated Ultrasonic Characterization of Buried Weld Defects in Ferritic Steel", NDT International, 19, 145-153, 1986.

5. Croall I F and Mason J P (Eds), "Industrial Applications of Neural Networks", Springer Verlag, 1991.

6. Murgatroyd R A et al, "Human Reliability in Inspection: Results of Action 7 of the PISC III Programme", 1994.

7. Berger J, "Statistical Decision Theory and Bayesian Analysis", Springer, Berlin, 1985.

Dr Colin Windsor is a Senior Scientist in the National Nondestructive Testing Centre at AEA Technology, Harwell. He worked in materials science for many years before becoming interested in neural network applications some years ago.