SLAS

Error handling

From LabAutopedia


Many laboratory automation processes operate unattended throughout the workday and night. As automated systems are designed for more complex and lengthy operations, care must be taken to assure that these systems operate properly and generate valid data. The best assurance is a well-conceived and well-executed design, which minimizes the possibility of errors.[1] Error handling in automated systems is a design issue, not a debugging issue. Most recurring system errors are the result of a flaw in the planning and design process that becomes apparent during the debugging process. Pressures to complete a process often prevent a return to the design stage to fix the root problem, so some form of error checking/handling "band-aid" is applied to deal with the situation. However, even the best design may occasionally fail, so appropriate provisions for error and exception handling must also be put into place.


Designing to avoid errors

Robust behavior is the most important automated system trait.  A sound design is the best assurance of robust system behavior.  Many times an automated system is rendered unreliable because of poor decisions made well before equipment is purchased.  A careful and detailed project planning process is the defense against this outcome. 

Procedure evaluation and modification

Before any system design has been determined, the candidate procedure for automation must be screened for laboratory unit operations (LUOs) that are potentially problematic for automated devices, with the option to:

  • Leave these problem LUOs as manual operations within a largely automated procedure
  • Modify the problem LUOs to eliminate or minimize the potential for errors in the automated systems (lean sigma process optimization approaches can be useful)
  • Determine what error prevention, checking and recovery mechanisms can be built into the system to guard against errors that may be caused by the problem LUOs.

The following is a real-life example of such a process: [1]

  • A proposed automated procedure requires the addition of 1μL of a very hazardous, corrosive and volatile aqueous reagent to 20mL of sample in a tube. No precision or accuracy specification existed for the 1μL addition. It had always been done using a hand-held pipette, in a fume hood. The reagent had always been prepared from solid just before use in the same fume hood. The following questions should be evaluated:
    • Is this process important to be included within the fully automated process, or could it be performed manually?
    • What is the stability of the aqueous reagent?  Preparing immediately before use is not convenient for an unattended automated system. 
    • What, in fact, are the precision/accuracy requirements of reagent addition?  Could a higher volume of more dilute reagent be used? 
    • Must this operation be conducted in a fume hood? 
    • Are there any methods for sensing a potential spill of this reagent? 

In this case, the step had to be included in the automated procedure for the project to be viable. The aqueous reagent was found to be very stable at room temperature for at least 24 hours. It was determined that a minimum molar amount of this reagent needed to be added to the sample. Addition of excess reagent was not problematic. Diluting the reagent was not problematic, as long as the minimum molar amount was added. Any reagent open to the air had to be contained within a fume hood. The reagent could be prepared in a fume hood, placed in a capped plastic vessel and transported to another (e.g. automated system) hood. There were continuous analyzers available from the environmental monitoring industry that could sample ambient air for indications of a reagent spill. The resulting automated procedure was to prepare a reagent supply once daily and add 10μL of dilute (10x) reagent to each sample, allowing the use of simple, reliable liquid dispensing technology. A custom fume hood was built to contain all the apparatus for this portion of the automated procedure. Hood air flow sensors and the reagent spill sensor were linked to audible alarms, an operator pager, a dedicated alarm line to building security, and the automated system input controller to start a programmed system shutdown.
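The dilution decision in this example rests on simple arithmetic: 10μL of a 10x dilution delivers the same molar amount as 1μL of the original reagent. The following is a minimal sketch of that check. The stock concentration is a hypothetical placeholder, since the source does not state one, and only the ratio between the two dispenses matters.

```python
import math

def moles_delivered(volume_ul: float, conc_mol_per_l: float) -> float:
    """Moles of reagent delivered by dispensing volume_ul microlitres."""
    return volume_ul * 1e-6 * conc_mol_per_l

# Hypothetical stock concentration: the source does not state one,
# and only the ratio between the two dispenses matters here.
STOCK_MOL_PER_L = 1.0

original = moles_delivered(1.0, STOCK_MOL_PER_L)        # 1 uL of stock
diluted = moles_delivered(10.0, STOCK_MOL_PER_L / 10)   # 10 uL of 10x dilution

# Same molar amount, so a minimum-amount requirement is still met.
same_amount = math.isclose(original, diluted)

# Volume added relative to the 20 mL sample, as a percentage.
added_volume_percent = 10.0e-3 / 20.0 * 100
```

The added volume is a small fraction of a percent of the 20mL sample, which is consistent with the statement that dilution was not problematic.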


Design to minimize human error

Any automated system may become unreliable because of human error.  There are numerous design approaches that can minimize the potential for human error. 

  • Automated startup status checking: Create system initialization procedures that automatically survey the system for any setup oversight. For instance, the system can determine if all reagent containers are correctly supplied via direct gravimetric measurement of the reagent vessel(s). The presence of labware and consumables may be checked by sensors or by a robotic end effector.
  • System directed operator checklists: Create system-generated startup checklists to prompt system operators to perform key pre-operation maintenance or to observe start-up functionality tests.
  • Automatic Identification: The use of Automatic Identification and Data Capture (AIDC) technology (e.g. barcoding) can minimize errors in data entry, sample placement or tracking.   
  • Determine training needs: What training may be necessary for system operators? What is the cost and availability? Are the current staff the correct staff to undergo this training? These may not seem like design issues, but they are. The eventual system operators should be identified early and training needs assessed. The cost of training should be built into the project budget. Some specialized training may require scheduling far in advance. Training cannot be a last-minute matter.
  • System isolation: Automated systems generally do not benefit from casual or unintentional human interaction while in operation.  Systems can be protected via enclosures or light curtains (a device which uses an array of photoelectric sensors to detect the presence of an object).  Systems performing operations that are prone to outside contamination (e.g. cell culture experimentation) may require a filtered positive air flow enclosure.  Systems performing operations that could be hazardous to laboratory personnel may require isolation such as a negative air flow enclosure (e.g. a fume hood) to contain hazardous fumes or biologics.  If the movement of the physical components of a system could harm a person, the system must be isolated via an enclosure or light curtain interlocked to an emergency shutdown circuit.  See ANSI Robotic Safety Standards.  The cost of isolation systems and approaches must be included in the estimated system cost. 
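As an illustration of automated startup status checking, the sketch below compares the net weight of each reagent vessel with the amount a run requires. It assumes a hypothetical `read_weight_g` callable that returns the gross weight reported by a balance or load cell under a named vessel. The names and data layout are illustrative only and do not come from any particular system.

```python
from dataclasses import dataclass

@dataclass
class ReagentVessel:
    name: str
    tare_g: float      # weight of the empty vessel
    required_g: float  # minimum reagent mass needed for the run

def startup_problems(vessels, read_weight_g):
    """Return human-readable problems; an empty list means the check passed.

    read_weight_g is a hypothetical callable: vessel name -> gross weight in grams.
    """
    problems = []
    for v in vessels:
        net_g = read_weight_g(v.name) - v.tare_g
        if net_g < v.required_g:
            problems.append(
                f"{v.name}: {net_g:.1f} g available, {v.required_g:.1f} g required"
            )
    return problems
```

A system could refuse to start, or present the list to the operator as part of a startup checklist, whenever the returned list is not empty.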

Design for operational robustness

  • Use automation grade consumables. These consumables are made to a higher tolerance than ordinary, human-grade consumables, offering a higher level of geometric accuracy and functional reliability. Some consumables may be designed to be geometrically matched with the mechanical fixturing of specific automated devices or workstations. For instance, the automatic attachment of a pipette tip to a fixture is much more reliable if the inner taper of the pipette tip has been designed to be complementary to the taper and design of the pipette attachment fixture. Such consumables are generally more costly, but the cost is a tradeoff for system reliability. This cost should be included in the estimates of system expense and return on investment.
  • Identify potential maintenance issues: Potential points of maintenance-related system errors can often be identified at the time of system design.  If they cannot be eliminated via procedure or design changes, at least they can be prepared for by developing proper maintenance protocols.  For instance, if a reagent being dispensed by the system tends to crystallize in air such that a dispensing nozzle can become blocked over time, the cleaning of this nozzle should be a regular maintenance item.  If a corrosive compound is being used in a system, regular inspection for metal corrosion will be necessary and some form of preventive cleaning and surface treatment is desirable.  In general, automated systems performing laboratory tasks (which can be messy) greatly benefit from maintenance protocols that focus on regular cleaning. 

Error detection

Detailed article: Sensors, Liquid Handling Error Detection 

Despite the best design efforts, system errors will still occur. The first step toward dealing with "system exceptions" is detecting them. Many devices will have some built-in event sensing, such as detection of motor failure or loss of power. Other devices may be designed to monitor and report their key operational parameters, such as centrifuge spin speed or the temperature of a thermal control device. Still other devices may have feedback mechanisms that can be programmed for error sensing use, such as force feedback from a robotic end effector to determine the presence or absence of a physical object. The behavior of such feedback mechanisms must be studied carefully relative to the variability of labware to determine the robustness and reliability of the feedback.

Additional sensors are often necessary to evaluate specific conditions or events. Many laboratory automation systems involve the movement of liquids, solids or objects, and it is desirable to verify that such movement did take place. The simplest and least costly external means of verifying movement is a mechanical switch. Switches are generally small and are configured to be normally open (n.o.) or normally closed (n.c.). They can be designed to respond to a variety of mechanical stimuli, such as vibration (trembler switch), tilt, air pressure, fluid level (float switch), the turning of a key (key switch), linear or rotary movement (limit switch or microswitch), or presence of a magnetic field (reed switch). In the laboratory, the reliability of liquid movement through valves can be enhanced by using switches to verify proper valve positioning. Robotic devices may use switches to confirm home locations. Switches can be used to verify the proper positioning of auxiliary devices such as doors. Pneumatic cylinders, often used for simple two-position movement, can be purchased with embedded mechanical sensors to confirm end-of-travel location. The major disadvantage of mechanical switches is that they contain moving parts and contact surfaces, so their mechanical movement or electrical contact may degrade with repeated use, surface oxidation or coating (always a concern in a biochemical environment).
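Using a switch to verify a movement usually amounts to commanding the move and then polling the switch for a bounded time. The sketch below shows one way to do this. The `command_move` and `switch_closed` callables are hypothetical stand-ins for whatever device interface a given system provides, and the timeout and polling interval are arbitrary example values.

```python
import time

def move_and_verify(command_move, switch_closed, timeout_s=2.0, poll_s=0.05,
                    clock=time.monotonic, sleep=time.sleep):
    """Command a move, then poll a position switch until it closes or time runs out.

    Returns True if the switch confirmed the move, False on timeout.
    command_move and switch_closed are hypothetical device callables.
    """
    command_move()
    deadline = clock() + timeout_s
    while True:
        if switch_closed():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A `False` result only says that the switch did not report the expected position. Because switches can wear or foul, the caller should treat it as a possible sensor fault as well as a possible movement fault.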

Proximity sensors are used to detect the near presence of some object without actual physical or optical contact.  Compared to mechanical switches, they have a long functional life because of the absence of mechanical parts and lack of physical contact between sensor and the sensed object.  They generally have a relatively short range and the output state can be highly dependent on the exact detection configuration and nature of materials involved.  Slight changes in such may require recalibration.  Inductive proximity sensors are often used to detect small parts such as chromatography vial caps or the movement of metal objects such as racks or drawers.  Capacitive proximity sensors are based on the dielectric constant of the material being detected and can be used to detect the presence of a wider range of materials.  In some cases the change in capacitive signal can be used to quantify the presence of a material, such as liquid within a pipette tip.  Calibration of the output states is highly dependent on configuration and materials and is very sensitive to changes.  Liquid level sensing in a pipette tip is a prime example.  There is a marked difference in the capacitive change caused by deionized water vs. buffer-containing water vs. DMSO.  Different pipette tip materials will also give different results.

A unique proximity sensor is based on ultrasonics. These sensors use the sonar principle in air, sending out an ultrasonic chirp, then switching to the receive mode to detect a return echo from the surface of the target. With the speed of sound in air (or other gas) as a given, distance to the target can be calculated. Variables such as temperature and humidity may be sensed and compensated for, resulting in highly accurate readings. This can be used to determine the level of fill in a laboratory vessel, tube or microplate.
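The distance calculation behind an ultrasonic level reading can be sketched as follows. This is a minimal illustration, not a vendor algorithm: it uses a common linear approximation for the speed of sound in dry air near room temperature, ignores humidity, and assumes the sensor's height above the vessel bottom is known. All function and parameter names are illustrative.

```python
def speed_of_sound_m_per_s(temp_c: float) -> float:
    """Approximate speed of sound in dry air (linear approximation; humidity ignored)."""
    return 331.3 + 0.606 * temp_c

def distance_to_surface_m(echo_time_s: float, temp_c: float = 20.0) -> float:
    """Distance from sensor to target surface from the round-trip echo time."""
    # The chirp travels to the surface and back, so halve the round trip.
    return speed_of_sound_m_per_s(temp_c) * echo_time_s / 2.0

def fill_height_m(echo_time_s: float, sensor_to_bottom_m: float,
                  temp_c: float = 20.0) -> float:
    """Fill height, given the (assumed known) sensor-to-vessel-bottom distance."""
    return sensor_to_bottom_m - distance_to_surface_m(echo_time_s, temp_c)
```

Passing the measured air temperature instead of relying on the default is the temperature compensation mentioned above.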

Optical sensors are contactless sensors and offer the advantage of a substantially larger detection range compared to inductive or capacitive sensors.  They are also more complex, expensive and prone to their own unique failure modes.  Optical sensors utilize various spectral regions. Infrared sensors avoid interference from ambient light.  Red light sensors offer a visible beam that can be sensitive to the color of the target.  Laser sensors can be used for highly accurate distance measurements.  Through-beam models have a separate beam transmitter and detector and retroreflective models contain the transmitter and detector in the same housing, with the beam bounced off a reflector.  Interruption of an optical beam can be used to confirm the presence of common laboratory consumables, such as pipette tips or filters, without disrupting the normal flow of system operations.

Passive optical sensors, such as CCD or CMOS imaging sensors, detect natural energy (radiation) that is emitted or reflected by the object or scene being observed. They are highly flexible and nearly universal sensors, but are also complex and data intensive. Whereas the sensors described above generally offer a simple digital yes/no signal, imaging devices yield a very rich, detailed set of information about the scene being observed. This can be an advantage when a complex situation is being evaluated or a disadvantage when the situation being evaluated is buried within an excessive amount of information. Imaging sensors require image capture hardware and software as well as image interpretation software. They also require careful control of lighting conditions to assure consistent images over the course of time. The LabAutopedia article on imaging offers an in-depth discussion of the topic.

Force transducer sensors, or load cells, convert a force indirectly into an electrical signal. Through a mechanical arrangement, the force being sensed deforms a strain gauge. The strain gauge converts the deformation (strain) to electrical signals, which in turn can be calibrated and used to calculate force, mass, or weight. Force transducers can be placed on the end effector of robots to measure gripping force. The common laboratory balance is a load cell. It is common to integrate small load cells or balances into laboratory automation systems for monitoring the volume of reagents in bulk vessels, or the quantity of material dispensed into a vessel, in some cases in real time as the dispensing occurs. Load cells can be used to calculate the level or quantity of a well-behaved, known material inside a vessel of known geometry and/or known empty weight. With knowledge of the specific gravity of the material and the geometry and empty weight of the vessel, the level of material in a vessel can be calculated from the weight of the full vessel. For highly accurate calculations, other factors such as temperature and pressure must also be determined and included in the calculation. This method works best for liquids or solids of low viscosity and high uniformity, and less well or not at all for materials of higher viscosity or non-uniformity.
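The level calculation described above can be illustrated for the simplest case, an upright cylindrical vessel. The cylindrical geometry, the units and the function name are assumptions made for this sketch. Other vessel shapes need their own volume-to-height relationship, and temperature or pressure corrections are not included.

```python
import math

def fill_level_cm(gross_g: float, tare_g: float,
                  density_g_per_ml: float, inner_diameter_cm: float) -> float:
    """Liquid height in an upright cylindrical vessel, from its weight.

    Assumes a uniform liquid of known density and a known empty (tare) weight.
    """
    net_g = gross_g - tare_g
    if net_g < 0:
        raise ValueError("gross weight is below tare weight")
    volume_ml = net_g / density_g_per_ml               # 1 mL == 1 cm^3
    area_cm2 = math.pi * (inner_diameter_cm / 2) ** 2  # cross-sectional area
    return volume_ml / area_cm2
```

For example, `fill_level_cm(150.0, 50.0, 1.0, 2.0)` corresponds to 100 mL of a water-like liquid in a vessel of 2 cm inner diameter.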

Timed event sensing is a simple way to determine if a system deviated from normal operation.  The non-appearance of a planned, regularly expected event indicates that something has interrupted normal system operation and should be investigated.  This expected timed event could be part of normal system operations, such as the periodic transmission of data.  It could also be a periodic signal to reset an external countdown timer which is started in countdown mode when the system is initialized.  If the countdown timer reaches zero due to the lack of a reset signal, the assumption is that the system has failed in some way.  This mode of error sensing has the advantage of not being dependent on continued execution of the automated system program, such as in the case of a hard system crash.
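The countdown-timer logic can be sketched as a small watchdog class. This shows only the logic. As noted above, the value of the approach comes from running the timer outside the automated system program (for example in separate hardware or a separate process), which this sketch does not attempt to show. The class and method names are illustrative.

```python
import time

class Watchdog:
    """Countdown that must be reset periodically by the monitored system."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_reset = clock()  # countdown starts at initialization

    def reset(self) -> None:
        """Called by the monitored system as its regular 'still alive' signal."""
        self._last_reset = self._clock()

    def expired(self) -> bool:
        """True if no reset arrived within the timeout: assume the system failed."""
        return self._clock() - self._last_reset >= self._timeout_s
```

The monitored system calls `reset()` on a regular schedule, and an independent monitor checks `expired()` and triggers the chosen error response when it returns `True`.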

Error response

Once an error has been detected, there are three action options:

  • Automatic error remediation:  The system is programmed to attempt to recover from the error state and continue normally if successful.
  • Wait in error state: The system stops in or is stopped by the error condition and waits until noticed.
  • Enunciate and wait in error state: The system stops or is stopped by the error condition and enunciates the error condition, i.e. sounds an alarm of some type. 

Automatic error remediation

Usually the most desirable system response to an error is for the system to correct the error and continue operation. This can be challenging because automatic recovery requires knowledge about the physical environment of the system at the time of error. Assumptions about anticipated error conditions can be made and error recovery software routines can be written based on those assumptions. However, similar errors may arise from a variety of causes. Devising sufficient error detection to decipher error causes and programming error recovery routines for each scenario is time-consuming, tedious and often impractical or impossible. Such routines must also take into account the possibility that the detection of an error condition is itself an error, a false report. Error recovery routines should be devised so as to minimize the possibility of making the problem worse, turning a recoverable situation into an irrecoverable one. An example is recovery from the detected failure of a robotic device to attach a pipette tip (single or multiple) to a pipetting fixture. An automated recovery routine should begin by executing the software subroutine regularly used to remove and discard a pipette tip after use, based on the assumption that the pipette tip(s) may actually be attached but the detection mechanism failed. The robot may then attempt to attach the next available pipette tip. Failure to consider the possibility of a false error could result in jamming the pipette tip fixture with tip(s) attached into the next available tip(s). Such a condition is likely irrecoverable, perhaps requiring powering down the system, and potentially causing physical damage to the robotic device or fixturing.
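The pipette tip example can be sketched as a retry routine that always ejects before trying again. The three callables are hypothetical stand-ins for a robot's own subroutines. In particular, `attach_tip` is assumed to pick up the next available tip each time it is called, and the attempt limit is an arbitrary example value.

```python
def attach_tip_with_recovery(attach_tip, tip_detected, eject_tip, max_attempts=3):
    """Try to attach a tip; on a detected failure, eject before retrying.

    Returns the attempt number that succeeded, or raises RuntimeError
    once max_attempts have failed so that a higher-level response can take over.
    """
    for attempt in range(1, max_attempts + 1):
        attach_tip()
        if tip_detected():
            return attempt
        # The failure report may itself be false: a tip could actually be attached.
        # Run the normal eject routine first, so the next attempt never drives
        # a mounted tip into a fresh one.
        eject_tip()
    raise RuntimeError(f"tip attachment failed after {max_attempts} attempts")
```

Bounding the number of attempts keeps the routine from consuming a whole tip rack when the underlying fault is persistent.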

More sophisticated automated error recovery may involve bypassing the error condition if it does not impede the further processing of samples. If, for instance, a microplate-handling robot sensed that it failed to remove a plate from a rack, and subsequent recovery attempts failed, the system could be programmed simply to move on to the next microplate and continue processing. This assumes that the failure did not result in dropping the microplate in a location where it will impede other operations. The only sensor that could attempt to shed light on such a complex "what if" would be an imaging sensor. Elaborate recovery or bypass recovery routines must also take into account maintaining any critical timing of samples in progress. This may go beyond the sophistication of basic system programming and will necessitate use of system scheduling software. The most sophisticated error recovery algorithms would make use of the principles of artificial intelligence to interpret situational information and formulate solutions. For practical purposes, such sophistication has not made its way into routine laboratory automation use.
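A bypass strategy for the microplate example might look like the sketch below. `fetch_plate` and `process` are hypothetical callables, with `fetch_plate` returning `True` when the robot confirms it is holding the plate. The sketch deliberately ignores sample timing and does not check whether a dropped plate is obstructing the deck, both of which the text above identifies as real concerns.

```python
def process_plates(plate_ids, fetch_plate, process, max_fetch_attempts=2):
    """Process each plate; skip and record any plate that cannot be fetched.

    Returns the list of skipped plate IDs for later human follow-up.
    """
    skipped = []
    for plate_id in plate_ids:
        for _ in range(max_fetch_attempts):
            if fetch_plate(plate_id):
                process(plate_id)
                break
        else:
            # Every fetch attempt failed: bypass this plate and keep going.
            skipped.append(plate_id)
    return skipped
```

Recording the skipped plates gives the operator a clear list of what still needs attention once the run finishes.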

There may be cases where automated recovery is not desirable.  If the system is handling materials potentially dangerous to humans or to system hardware, attempted automated recovery may increase the possibility of exposure.  If the system is handling materials that are very precious, automated recovery could result in wasting that material.   

Wait in error state

If the process the system is performing is not considered critical, it may be acceptable to allow an error state to persist until someone happens to notice the situation.  There would be no automated protocol to bring attention to the error state.  If automated recovery has failed the system should be programmed to create a safe state and wait.  A "safe state" will differ for each application, but is basically defined as a state which poses minimal risk to humans, equipment and samples, and which may offer recovery options once human intervention occurs.  This could involve placing moving components in neutral positions, purging reagent lines, shutting down centrifuges, turning off heaters, etc. 

If the system error prevents transition to a safe state (e.g. a robotic crash), then obviously creation of a safe state is limited, if not impossible.
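One way to structure the transition to a safe state is as a list of independent actions, each attempted even if an earlier one fails, so that a partial safe state is still reached when full shutdown is not possible. The action names in the usage note are hypothetical examples, and the function is a sketch under that assumption.

```python
def enter_safe_state(actions):
    """Run every safe-state action; collect failures instead of stopping early.

    actions is a list of (name, callable) pairs.
    Returns a list of (name, error message) pairs for actions that failed.
    """
    failures = []
    for name, action in actions:
        try:
            action()
        except Exception as exc:
            # Keep going: the remaining actions may still reduce risk.
            failures.append((name, str(exc)))
    return failures
```

A call might pass pairs such as `("park arm", park_arm)` or `("heaters off", heaters_off)`, where the callables are supplied by the system integrator. Any reported failures can be included in the error message shown to the operator.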

Enunciate and wait in error state

If the process the system is performing is such that prompt error recovery is desirable, then the means should be created for an enunciation of the error state.  There are various levels of enunciation that can be employed. 

  • Error message on system monitor or display
  • Local audible alarm
  • Local visual alarm (e.g. flashing light)
  • Remote audible alarm
  • Pager or cell phone notification
  • Email or network message

If the automated system has "crashed", and program execution is blocked, then the desired mode of enunciation must be triggered independently of the system program, such as by timed event sensing.  

Audible alarms can be very effective, but also very irritating once a response has been made but the error not resolved. Alarms in general do not create a calm atmosphere for human-directed error remediation, so it is highly recommended that a method of silencing alarms be provided. However, alarms should not be completely disabled or turned off until the error state no longer exists, either via resumption of system operations or system shutdown. The responsibility of personnel to respond to error enunciation should be clearly understood, especially if the system is to operate unattended after work hours and remote notification (pager or cell phone) is utilized.
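The distinction between silencing an alarm and clearing it can be captured in a small state model. This is an illustrative sketch with hypothetical names: silencing stops the sound but leaves a visual indicator active, and only clearing the error state turns everything off.

```python
class Alarm:
    """Alarm that can be silenced by a responder but cleared only with the error."""

    def __init__(self):
        self.error_active = False
        self.silenced = False

    def raise_error(self):
        # A new error always sounds, even if a previous one had been silenced.
        self.error_active = True
        self.silenced = False

    def silence(self):
        # Silencing has no effect unless there is an active error.
        if self.error_active:
            self.silenced = True

    def clear_error(self):
        # Called only when operation resumes or the system is shut down.
        self.error_active = False
        self.silenced = False

    @property
    def sounding(self):
        return self.error_active and not self.silenced

    @property
    def indicator_on(self):
        # The visual indication persists for as long as the error state exists.
        return self.error_active
```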


References

  1. Hamilton, S.D. Avoiding and Handling Errors During Unattended Operation of Automated Laboratory Equipment. Laboratory Robotics and Automation, VCH Publishers Inc., 1989, 1, 53-61.