Considerations for sensor failure modes
The other day I received a report of a prolonged downtime on night shift for what turned out to be a pneumatic gripper prox malfunction. The normal fix for this type of problem is simple and quick: replace the sensor. That quick fix rests on one major assumption: that you know which sensor is the problematic one.
In a good PLC program, sensor errors are reported to the operator on the HMI in a concise, interpretable way so the source of the problem is easy to locate. This machine did report sensor failures; however, the PLC program did not account for all possible sensor failure modes. Because of the specific way this sensor failed, the machine stopped mid-cycle without displaying any error or reason for being stuck in a limbo state.
Basic Failure Modes
This will be a specific example based on prox sensors used to detect the closed / opened state of a pneumatic gripper. The gripper has two prox sensors: one for the open position and one for the closed position. There are two basic failure modes.
- Sensor always on
- Sensor always off
When using a gripper which is defined by two sensors there is a derived failure mode based on the first failure above.
- Both sensors on
Specifically for a pneumatic gripper, the causes of these failure modes fall into three groups.
- Physical / electrical failure of prox sensor
- Mechanical failure of gripper or of surroundings
- Setup error with regards to prox location
If the PLC program follows good OO methodology, there will be a single FB (function block) used for all gripper instances. Based on the aforementioned failure modes and basic usage, this FB should implement the following functionality.
- Delayed state output after prox turns on
- Error state if both prox sensors are on
- Error state if a prox is expected to turn on but it doesn't after a timeout
These two error states will catch and report the two failure modes of the sensor regardless of the root cause of the failure.
The code below represents the main areas of FB_Pneumatic. Portions of the code have been omitted for a more succinct representation of the problem at hand.
The first two lines of the body are the delay timers that run once a prox has turned on. This is to ensure the next sequence step is not triggered prematurely, since prox sensors have a fairly large activation area. A prox normally turns on before the pneumatic device has actually reached its end-of-stroke position and stopped moving, so it is common practice to add a 50-100ms delay to ensure the final position is correct.
The next two conditional blocks create warnings on the HMI if the sensor is considered to be slower than usual at being activated. This FB records the time of every activation of the sensor and saves it in a database to be used for preemptive monitoring of the device. If for whatever reason the sensor activation time starts to slow down a warning will be given to the operator and it can be investigated before it turns into a problem.
It then checks whether both prox sensors are turned on simultaneously and emits an error. Finally the last two blocks check for a sensor timeout after a 5 second delay.
    // Release and open gripper
    METHOD doWait : BOOL
    VAR_INPUT
    END_VAR

    IF ready AND waitItl THEN
        goWait := TRUE;
        goWork := FALSE;
    END_IF
    doWait := isWait;

    // Activate and close gripper
    METHOD doWork : BOOL
    VAR_INPUT
    END_VAR

    IF ready AND workItl THEN
        goWork := TRUE;
        goWait := FALSE;
    END_IF
    doWork := isWork;

    // Body
    // Delayed in-position outputs
    inWorkTm(IN := work, PT := T#50MS);
    inWaitTm(IN := wait, PT := T#50MS);

    // Slow activation warnings
    IF work AND goWorkTm.IN THEN
        tt := TO_INT(goWorkTm.ET);
        IF TO_LREAL(goWorkTm.ET) > avg * 1.3 AND avg > 0 THEN
            msg.CreateEx(TC_EVENTS.PneumaticEventClass.WorkSlow, 0);
            msg.ipArguments.Clear().AddInt(TO_INT(goWorkTm.ET)).AddLReal(TO_INT(avg));
            msg.Send(0);
        END_IF
    END_IF

    IF wait AND goWaitTm.IN THEN
        tt := TO_INT(goWaitTm.ET);
        IF TO_LREAL(goWaitTm.ET) > avg * 1.3 AND avg > 0 THEN
            msg.CreateEx(TC_EVENTS.PneumaticEventClass.WaitSlow, 0);
            msg.ipArguments.Clear().AddInt(TO_INT(goWaitTm.ET)).AddLReal(TO_INT(avg));
            msg.Send(0);
        END_IF
    END_IF

    // Both sensors on at the same time
    IF wait AND work THEN
        alarm(TC_EVENTS.PneumaticEventClass.DoubleSensor);
    END_IF

    // Sensor timeouts
    goWorkTm(IN := goWork AND NOT work, PT := T#5S);
    IF goWorkTm.Q THEN
        alarm(TC_EVENTS.PneumaticEventClass.WorkTimeout);
    END_IF

    goWaitTm(IN := goWait AND NOT wait, PT := T#5S);
    IF goWaitTm.Q THEN
        alarm(TC_EVENTS.PneumaticEventClass.WaitTimeout);
    END_IF
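How avg is maintained is among the omitted portions. Purely as a sketch, and not the implementation used on this machine, one simple way to keep such a baseline inside the PLC is an exponential moving average updated once per completed stroke. The block name FB_ActivationAverage, its inputs, and the smoothing factor alpha are all assumptions made for illustration.

```iecst
// Hypothetical helper: keeps a running average of sensor activation times.
// The name, interface, and smoothing factor are illustrative assumptions.
FUNCTION_BLOCK FB_ActivationAverage
VAR_INPUT
    sample : BOOL;          // rising edge signals that a new time is available
    timeMs : LREAL;         // measured activation time in milliseconds
    alpha  : LREAL := 0.1;  // weight given to the newest sample (0..1)
END_VAR
VAR_OUTPUT
    avg    : LREAL;         // running average in milliseconds, 0 until seeded
END_VAR
VAR
    trig   : R_TRIG;
END_VAR

trig(CLK := sample);
IF trig.Q THEN
    IF avg <= 0.0 THEN
        avg := timeMs;      // the first sample seeds the average
    ELSE
        avg := avg + alpha * (timeMs - avg);
    END_IF
END_IF
```

With a baseline like this, the avg > 0 guard in the body above simply skips the slow-activation warning until at least one stroke has been recorded.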
This code will cover the aforementioned failure modes and emit the necessary errors. As you must have already concluded, there is more to this story.
Uncommon Failure Mode
Going back to the machine downtime example, the sensor had failed in a third, much less common, failure mode.
- Erratic / flickering sensor
The code is structured around two timers, one for the delayed activation and one for the timeout error. The input to the delayed activation timer is the physical prox input itself, while the timeout timer's input combines the physical actuation output with the physical prox input.
At a quick glance you could think the code is fine. Once the prox has been on for 50ms, set the in-position output on. Once the actuation is enabled and no prox input has been seen for 5s, go into error. What happens once the prox starts to flicker?
First, an obvious assumption: prox delay time << prox error timeout time. In this example, 50MS << 5S. Now the flickering error falls into two categories based on the frequency of the flicker.
- flicker on time >= prox delay time, or specifically flicker on time >= 50ms
- flicker on time < prox delay time, or specifically flicker on time < 50ms
Below is the timing diagram of both cases based on the above code, with the flicker on time shorter and longer than the output delay time. It shows the two timers and their inputs: goWork is the output that actuates the gripper and work is the prox input.
    inWorkTm(IN := work, PT := T#50MS);
    goWorkTm(IN := goWork AND NOT work, PT := T#5S);
Longer Flicker Time
The first case will likely work fine, depending on the rest of the PLC program. If the sequence checks once for the sensor to be on and then continues, it will work as intended, because the flickering sensor stays on long enough for the delay timer to complete and for the program sequence to continue.
For example this program checks the prox output only once then continues and ignores it.
    CASE state OF
        0:
            IF gripper.doWork() THEN
                state := 10;
            END_IF
        10:
            // Continue sequence
    END_CASE
Whereas this program checks the prox output once initially, but then continuously monitors it during the next sequence of operations.
    CASE state OF
        0:
            IF gripper.doWork() THEN
                state := 10;
            END_IF
        10:
            IF gripper.isWork THEN
                // do operation
                IF op_done THEN
                    state := 20;
                END_IF
            ELSE
                // Gripper no longer in closed / actuated state
                // Error
            END_IF
    END_CASE
The second example above will catch a flickering sensor in the program logic. However, in my opinion, this is not the correct way of doing it. Watching for this error in the parent program is tedious, error-prone, and adds complexity to the program. It breaks the principles of scope, encapsulation, and DRY (Don't Repeat Yourself). The error is not discovered at its root, but caught further up the stack. Program architecture and design is a topic in itself for another post.
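To make the contrast concrete, here is a rough sketch of what the parent sequence could look like when the gripper FB owns all of its sensor diagnostics. The hasError output and the fault state 900 are hypothetical names introduced only for this illustration; the actual FB may expose its fault state differently, for example only through its alarm() calls.

```iecst
// Parent sequence: no per-step sensor monitoring, the FB reports its own faults.
// gripper.hasError and state 900 are hypothetical, for illustration only.
IF gripper.hasError THEN
    state := 900;
END_IF

CASE state OF
    0:
        IF gripper.doWork() THEN
            state := 10;
        END_IF
    10:
        // do operation, no sensor checks needed here
        IF op_done THEN
            state := 20;
        END_IF
    900:
        ; // wait for the operator to acknowledge the fault
END_CASE
```

The sequence steps stay free of sensor logic, and every gripper instance gets the same diagnostics without repeating them in each parent program.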
Shorter Flicker Time
What happened is that the flicker on time was less than the delayed output time: flicker on time < 50MS. The delay timer reset every time the prox dropped out, and the timeout timer reset every time the prox came back on. This created an endless loop of starting both timers and resetting them before either was able to complete, putting the machine into a limbo waiting state.
There is a trivial solution to make the code react to one case of the flickering issue. The two timers are independent of each other and rely only on physical signals. However, the FB's output is not the direct physical signal but a delayed output based on the physical signal. Therefore the error detection of the FB should also be based on this delayed output, which makes the error timer dependent on the delayed output timer. This solves the case when flicker on time < prox delay time.
    inWorkTm(IN := work, PT := T#50MS);
    goWorkTm(IN := goWork AND NOT inWorkTm.Q, PT := T#5S);
The second case, where inWorkTm.Q does turn on (flicker on time >= prox delay time), requires more work. This also becomes a program design question. Do we want to catch such a flickering prox? Does it matter if the prox flickers slightly when the pneumatic device is in the correct location?
To solve this, we need another type of timer that is not found in the standard library. The basic TON timer starts timing when IN goes high and resets its elapsed time to 0 when IN goes low. This is what gives us the current behavior. A pauseable timer gives the functionality needed, with one potential caveat. Start the timer when the actuation output goes high and pause it once the delayed output goes high.
    inWorkTm(IN := work, PT := T#50MS);
    goWorkTm(IN := goWork, PAUSE := inWorkTm.Q, PT := T#5S);

While the PAUSE input is low the timer keeps accumulating, so with a flickering prox it will eventually reach the timeout time and generate the error. This is where the caveat comes into play. If the machine sequence is fast enough in that specific state, or the flicker off time is much smaller than the flicker on time (a high duty cycle), this will not detect the flicker error because the error timeout is never reached. This can be tuned by changing the error time of 5s to a value that better matches the type of machine you are working on and its cycle speed.
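To put rough numbers on that caveat, here is a back-of-the-envelope estimate. It assumes the error timer only accumulates while the delayed output is low, which for each flicker period means the off time plus the 50ms delay, and that the flicker on time is at least the delay time. The symbols and the flicker values (500ms on, 20ms off) are made up for illustration, and scan-cycle granularity is ignored.

```latex
% t_on, t_off : flicker on / off time (illustrative values, t_on >= t_d)
% t_d         : prox delay time (50 ms)
% T_err       : error timeout (5 s)
\[
t_{\text{detect}} \approx T_{\text{err}} \cdot
  \frac{t_{\text{on}} + t_{\text{off}}}{t_{\text{off}} + t_d}
= 5\,\text{s} \cdot \frac{500\,\text{ms} + 20\,\text{ms}}{20\,\text{ms} + 50\,\text{ms}}
\approx 37\,\text{s}
\]
```

With these numbers the flicker would only be reported if the sequence stays in that state for well over half a minute, which is why the timeout may need to be shortened on a fast machine.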
For completeness, here's an example of a pauseable timer.
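    FUNCTION_BLOCK TON_Pauseable
    VAR_INPUT
        IN : BOOL;
        PT : TIME;
        PAUSE : BOOL;
    END_VAR
    VAR_OUTPUT
        Q : BOOL;
        ET : TIME;
    END_VAR
    VAR
        trig : R_TRIG;
        pausedTime : TIME;
        tm : TON;
    END_VAR

    // Clear the banked time whenever the timer is disabled
    IF NOT IN THEN
        pausedTime := T#0S;
    END_IF

    // On the rising edge of a pause, bank the time elapsed so far
    trig(CLK := IN AND PAUSE);
    IF trig.Q THEN
        pausedTime := pausedTime + tm.ET;
    END_IF

    IF IN AND pausedTime >= PT THEN
        // Preset already reached from banked time alone
        Q := TRUE;
        ET := PT;
    ELSE
        // Run the inner TON for the remaining time, only while not paused
        tm(IN := IN AND NOT PAUSE, PT := PT - pausedTime);
        Q := tm.Q;
        ET := pausedTime + tm.ET;
    END_IF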
If any flicker in the sensor needs to be detected after its delayed output has turned on, this can also be trivially achieved with a conditional check done before the timer call.
    IF inWorkTm.Q AND NOT work THEN
        // Error
    END_IF

    inWorkTm(IN := work, PT := T#50MS);
    goWorkTm(IN := goWork AND NOT inWorkTm.Q, PT := T#5S);
This is a very tight check: if the sensor is off for a single PLC cycle, the device will error out. Is error checking this tight needed? That depends on how critical that specific sensor is in your application. I would be more inclined to turn this into a warning for a potential sensor failure instead of a machine-stopping error. The warning message would need to be debounced; otherwise it could be sent hundreds of times.
    IF inWorkTm.Q AND NOT work THEN
        // Send the warning only once per actuation
        IF NOT workFlickerWarn THEN
            // Send warning
            workFlickerWarn := TRUE;
        END_IF
    END_IF

    inWorkTm(IN := work, PT := T#50MS);
    goWorkTm(IN := goWork, PAUSE := inWorkTm.Q, PT := T#5S);

    // Re-arm the warning for the next actuation
    IF NOT goWork THEN
        workFlickerWarn := FALSE;
    END_IF
In this last example you can also see I use the pauseable timer. This is probably the best of both worlds.
This shows there can be many subtleties even with such a basic automation item as a prox sensor or gripper. A simple program can be created in a relatively short time, but does it really cover all potential failures? Perhaps a better question, and this is a basic engineering tradeoff: is it even worth the time, effort, CPU computation power, etc. to look for this type of error?
To give you the bigger picture, this machine has been running for over 5 years with close to a hundred pneumatic actuators all using this same code. There have been many times where prox sensors failed in the two basic failure modes for whatever reason and the machine correctly alarmed. There has been just 1 occurrence of this flicker failure up to now and none since.
In my mind, the pauseable timer is the best solution and will be implemented to prevent this type of prolonged downtime, regardless of how unlikely it is. What will you do with your sensors?