Automatic identification and data capture

From LabAutopedia


Automatic Identification and Data Capture (AIDC) refers to the methods of automatically identifying objects, collecting data about them, and entering that data directly into computer systems (i.e. without human involvement). Technologies typically considered as part of AIDC include bar codes, Radio Frequency Identification (RFID), biometrics, magnetic stripes, Optical Character Recognition (OCR), smart cards, and voice recognition. AIDC is also commonly referred to as "Automatic Identification," "Auto-ID," and "Automatic Data Capture."


Laboratory use

In the laboratory, the purpose of AIDC is to:

  • Minimize data entry errors by minimizing human interpretation and entry of data. It is generally accepted that manual transcription is one of the major sources of errors in the laboratory, with one study estimating that one in every 300 keystrokes was in error. 
  • Enhance productivity by improving the timeliness and speed of data entry and removing the need for manual intervention.
  • Improve item or process tracking by making identification-related data entry possible in situations where manual data entry would be inconvenient, unreliable or impossible. 
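
To see why even a modest keystroke error rate matters, the figure quoted above can be turned into a per-entry error probability. The sketch below is illustrative only: it assumes that keystroke errors are independent and occur at a fixed rate, which is a simplification and not a claim from the cited study, and the function name is hypothetical.

```python
def entry_error_probability(keystrokes: int, per_key_error: float = 1 / 300) -> float:
    """Probability that at least one keystroke in an entry is wrong.

    Assumes independent errors at a fixed per-keystroke rate (an
    illustrative simplification, not a result from the cited study).
    """
    return 1.0 - (1.0 - per_key_error) ** keystrokes


if __name__ == "__main__":
    for length in (6, 10, 20):
        p = entry_error_probability(length)
        print(f"{length}-character ID: {p:.1%} chance of at least one error")
```

Under these assumptions, a manually typed 10-character sample ID would contain at least one error roughly 3% of the time.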

Automatic ID has many uses in the laboratory:

Identification of sample-containing vessels: 

The obvious use of automatic ID in a laboratory environment is to track samples.  The laboratory environment is different from common automatic ID environments, such as retail or transport.  Sample vessels tend to be small, ranging from tubes to vials to microplates, so the space available to create or attach an ID element is restricted.  Sample vessels are commonly made of glass or one of several polymers (e.g. polystyrene, polypropylene).  These vessels and their associated ID elements may be subjected to a range of conditions: heat, cold, humidity, vibration, solvents, acids and bases.  The ID element must therefore be designed to tolerate the conditions anticipated for a given sample.  Samples are often transferred to one or more secondary vessels in the course of a laboratory operation, so there must be a way to create or attach an ID element for each secondary vessel.

Identification of reagent-containing vessels:

Automatic ID may also be used to identify vessels containing reagents or bulk chemical supply.  The constraints for this use are not as severe as with sample-containing vessels, since reagent vessels tend to be larger, are not subject to such a variety of conditions and are not transferred to secondary vessels that must also be tracked.  The most common use of automatic ID labeled vessels would be to assure that the proper reagents are placed and connected appropriately with an automated instrument, workstation or system.

Identification of equipment:

Most laboratories tag instrumentation for inventory control and accounting purposes.  Instrument tags are often read and logged when maintenance is performed on the device or system.  In some cases, the instrumentation tag is read to establish which instruments or devices were used in the course of processing a sample.  Commonly this tag will be printed on a metallic medium with a strong adhesive (to discourage removal) and will contain a bar code and a human-readable ID line.   

Identification of laboratory personnel:

Entry to buildings or specific spaces may be controlled.  Access to data systems or operation of specific instrumentation may require operator identification.  Adherence to SOPs may require that personnel log their identity before performing the SOP process.  The checkout of laboratory supplies from a stockroom or chemical compounds from a library may require personnel identification.  Identification of personnel can take advantage of a wide range of automatic ID technology.  The simplest approach is to require personnel to carry an identification card that may include a scannable bar code, magnetic stripe or smart card technology.  However, this approach does not guarantee that the card holder is, in fact, the authorized individual.  To attain that level of security, a biometrics approach is necessary.  Fingerprint scanning is highly reliable and reasonably cost effective.  Retinal scanning is also highly reliable, but more costly.  Voice recognition is less reliable, as is facial image recognition.

Entry of instructions or data:

Automatic ID can be used for simple data entry in lieu of keyboard entry.  For instance, an operator may choose from a list of pre-programmed instrumental operations by scanning the appropriate bar code from a list posted at the instrument.  The checkout of supplies from a stockroom may be done via scanning a bar code on the item or from a pre-prepared list of supply items. 
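
The "scan from a posted list" idea can be sketched as a simple lookup from scanned text to a pre-programmed operation. This is a minimal illustration, not a vendor interface: the codes, operation names and the function `select_operation` are all hypothetical, and it assumes the scanner delivers the decoded text as a string, possibly with a trailing line terminator.

```python
# Hypothetical mapping from bar code text (as printed on a sheet posted
# at the instrument) to pre-programmed operations. All codes and
# operation names here are illustrative.
OPERATIONS = {
    "OP-001": "prime pump",
    "OP-002": "run wash cycle",
    "OP-003": "start calibration",
}


def select_operation(scanned_code: str) -> str:
    """Return the operation for a scanned code, or raise if unknown."""
    code = scanned_code.strip()  # drop any line terminator sent by the scanner
    try:
        return OPERATIONS[code]
    except KeyError:
        raise ValueError(f"Unrecognized bar code: {code!r}") from None
```

Rejecting unknown codes outright, rather than guessing, keeps a misread from silently starting the wrong operation.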


There are two basic strategies that may be employed for the use of AIDC in the laboratory:

  • Information Container:  This strategy involves encoding key information about the item being identified into the automatic ID element itself.  For instance, a bar code might contain key information about a sample - e.g. lot number, date, time, investigator, compound number, etc.
    • The advantage of this approach is that essential information about the item being identified is physically attached to that item via the ID element.  The ID element has "real world" meaning that can be accessed at any time, without the need for any link to additional information.  
    • The disadvantage of this approach is that it generally requires the ID element to be created on-demand, since the key information to be included is often created or comes together only at the time of item creation.  This limits the options for ID element creation to something that can exist on-site, such as an on-demand bar code printer, which may have certain quality and material limitations.  This approach also leads to lengthy ID elements, due to the need to include so much key information.  This can be a liability if it leads to placing a large label on a small item, or shifting to a more information-dense but more costly automatic ID technology. 
  • Information Pointer: This strategy involves using the ID element as a pointer to further information about the item being identified.  Often referred to as the "license plate" approach, the ID element often points to a file or files of information in a database.
    • The advantage of this approach is that the ID elements can be random identifiers, able to be pre-created or purchased, allowing for higher quality or speciality ID elements.  For instance, lithographically produced bar code labels can be made to a much higher resolution and environmental resistance than can on-demand printed labels.  Unique labels for harsh environments, such as laser etched labels are better produced in advance. 
    • The disadvantage of this approach is that the ID element means nothing without an active means to access the information the ID points to.  Should this information link go down, the ID element is useless. 

Both approaches are commonly used in laboratories.  It is important to determine early in a project which strategy is appropriate, as the choice will affect not only the purchase of automatic ID technology, but may impact other aspects of laboratory automation and data management associated with the project.
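
The difference between the two strategies can be sketched in a few lines. This is an illustration under stated assumptions: the field names, the "|" delimiter (which assumes no field value contains that character), the in-memory dictionary standing in for a database, and all function names are hypothetical.

```python
# --- Information Container: key data travels in the ID element itself ---
FIELDS = ("lot", "date", "investigator", "compound")  # hypothetical fields


def encode_container(record: dict) -> str:
    """Pack the key fields into one delimited payload for the label."""
    return "|".join(str(record[name]) for name in FIELDS)


def decode_container(payload: str) -> dict:
    """Recover the fields from the label alone; no database is needed."""
    return dict(zip(FIELDS, payload.split("|")))


# --- Information Pointer: the ID is a short opaque key into a database ---
database = {}  # stand-in for a LIMS or sample database


def assign_pointer(label_id: str, record: dict) -> None:
    """Associate a pre-printed 'license plate' ID with a record."""
    database[label_id] = record


def resolve_pointer(label_id: str) -> dict:
    """Look up the record; raises KeyError if the link is unavailable."""
    return database[label_id]


if __name__ == "__main__":
    sample = {"lot": "L1234", "date": "2008-05-01",
              "investigator": "JSMITH", "compound": "CMPD-000789"}
    payload = encode_container(sample)
    print(len(payload), "characters on a container label:", payload)
    assign_pointer("A000000123", sample)
    print(len("A000000123"), "characters on a pointer label")
```

The container payload is self-describing but several times longer than the pointer ID, while the pointer ID is short but useless if `database` cannot be reached, which mirrors the trade-off described above.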

AIDC technology

AIDC is commonly used today in many different laboratories, from R&D to analytical service to clinical diagnostic labs.  Each environment is different and so may employ one or more of the types of AIDC technology available today:

Bar Coding

Detailed article: Bar Codes

Bar coding is an automatic ID technology in which a series of optically readable stripes, bars or squares create a binary representation of data in a one-dimensional, two-dimensional or matrix format, serving as an information container or pointer.  It is easily the most common form of AIDC used in laboratories today.[1]  Bar codes are most commonly printed on a label stock and attached to the item being identified.  The exact choice of label stock, form of print and label adhesive must be carefully matched to the environmental and operational conditions expected for the identified item.  Thermal transfer printing technology is most often used in laboratories, offering the best combination of resolution, durability and cost when used with a synthetic label stock, such as polyester.  Great care must be taken in choosing label stock adhesive, since any "oozing" of the adhesive can bring the sticky material into contact with automated system components, such as grippers, and cause manipulation errors.

Bar codes may be printed directly on laboratory vessels, although this is usually done at the factory, not on-demand in the laboratory.[2][3]  For high durability, bar codes can be laser etched (factory or on-demand) onto labware[4] (glass, plastic) or pin-stamped into metal.[5][6]  "Print and apply" devices, which print and attach bar code labels to labware on-the-fly within an automated system, are available from several technology providers.[7]  The placement of a bar code on small laboratory vessels can be a challenge, because minimizing print size has a negative impact on first-read rates.  For this reason, IDs should be kept as short as possible, which argues for the Information Pointer approach or for the use of higher-density formats, including matrix formats.  Among 1D bar code symbologies, Code 128 is a common high-density code that can encode all 128 ASCII characters, with a maximum density of about 18 characters/inch.  Among 2D symbologies, DataMatrix has a maximum theoretical density of 500 million characters per square inch.  The practical density will, of course, be limited by the resolution of the printing and reading technology used.
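
The effect of ID length on label size follows directly from the density figure quoted above. The sketch below is a rough lower bound only: it assumes the maximum density of about 18 characters/inch is actually achieved and ignores start/stop characters, the check character and quiet zones, so a real symbol will be somewhat wider. The function and constant names are illustrative.

```python
CODE128_MAX_CHARS_PER_INCH = 18  # approximate maximum density quoted in the text


def min_symbol_width_inches(n_chars: int,
                            chars_per_inch: float = CODE128_MAX_CHARS_PER_INCH) -> float:
    """Lower-bound width of the data portion of a 1D symbol.

    Ignores start/stop characters, the check character and quiet zones,
    so a real label will be somewhat wider.
    """
    return n_chars / chars_per_inch


if __name__ == "__main__":
    for n in (9, 36):
        print(f"{n} characters need at least {min_symbol_width_inches(n):.2f} in")
```

By this estimate a 36-character container-style payload needs at least 2 inches of label, while a 9-character pointer ID needs about half an inch, which is why short IDs suit small vessels.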

Bar code scanners used in the laboratory will either be hand-held (typically for bench or desk operations) or fixed-mount (typical within an automated system).  Scanners are usually one of three technology types:

  • Laser Scanners: Use a laser beam as the light source and typically employ either a reciprocating mirror or a rotating prism to scan the laser beam back and forth across the bar code. A photodiode is used to measure the intensity of the light reflected back from the bar code. The light emitted by the reader is tuned to a specific frequency and the photodiode is designed to detect only this modulated light of the same frequency.
  • CCD Readers: Also referred to as LED scanners, these use a single row of hundreds of tiny light sensors in the head of the reader. Each sensor can be thought of as a single photodiode that measures the intensity of the light immediately in front of it. Because each sensor is extremely small and hundreds of them are lined up in a row, sequentially measuring the voltage across each sensor generates a voltage pattern in the reader that matches the pattern of the bar code. The important difference between a CCD reader and a laser scanner is that the CCD reader measures ambient light reflected from the bar code, whereas pen or laser scanners measure reflected light of a specific frequency originating from the scanner itself.
  • Camera-Based Readers: Use a small video camera to capture an image of a bar code. The reader then uses digital image processing techniques to decode the bar code. Video cameras use the same CCD technology as in a CCD bar code reader except that instead of having a single row of sensors, a video camera has hundreds of rows of sensors arranged in a two dimensional array so that they can generate an image.
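
The way a row of sensor readings becomes a bar/space pattern can be sketched as thresholding followed by run-length encoding. This is a simplified illustration, not the decoding logic of any particular reader: the midpoint threshold and the function name are assumptions, and a real decoder would go on to map the run widths to symbology characters.

```python
from itertools import groupby


def scanline_to_runs(intensities, threshold=None):
    """Convert one row of sensor readings into (is_bar, width) runs.

    intensities: light readings, one per sensor (higher = lighter).
    threshold:   readings below this count as a dark bar; defaults to the
                 midpoint between the darkest and lightest reading.
    """
    if threshold is None:
        threshold = (min(intensities) + max(intensities)) / 2
    dark = [value < threshold for value in intensities]
    return [(is_bar, sum(1 for _ in group)) for is_bar, group in groupby(dark)]


if __name__ == "__main__":
    readings = [200, 198, 40, 42, 41, 201, 39, 38, 199, 200]
    for is_bar, width in scanline_to_runs(readings):
        print("bar  " if is_bar else "space", width)
```

The same idea applies to a single row taken from a camera image; a 2D reader simply has many such rows to work with.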

Radio Frequency Identification (RFID)

Detailed article: Radio Frequency Identification

RFID is a non-contact ID technology utilizing radio frequency information.  A basic RFID system consists of three components:

  • An antenna or coil
  • A transceiver (with decoder)
  • A transponder (RF tag) electronically programmed to emit unique ID information when energized

The antenna of the transceiver emits radio signals to energize and activate the tag, which then emits its unique ID via radio signals that are received by the transceiver antenna and decoded. Antennas are the conduits between the tag and the transceiver, which controls the system's data acquisition and communication.  RFID tags come in different forms depending on how they are attached to the identified object, e.g. attachable, implantable, insertable and even digestible tags.  Low-frequency (30 kHz to 500 kHz) systems have short reading ranges (one foot or less) and lower system costs. They are most commonly used in security access, asset tracking, and animal identification applications. High-frequency (850 MHz to 950 MHz and 2.4 GHz to 2.5 GHz) systems, offering long read ranges (greater than 90 feet) and high reading speeds, are used for such applications as railroad car tracking and automated toll collection.

Active RFID tags are powered by an internal battery and are typically read/write, i.e., tag data can be rewritten and/or modified.  An active tag's memory size varies according to application requirements; some systems operate with up to 1MB of memory. In a typical read/write RFID work-in-process system, a tag might give a machine a set of instructions, and the machine would then report its performance to the tag. This encoded data would then become part of the tagged part's history. The battery-supplied power of an active tag generally gives it a longer read range. The trade off is greater size, greater cost ($20 or more), and a limited operational life (up to 10 years).

Passive RFID tags operate without a separate external power source and obtain operating power from the reader. Passive tags are consequently much smaller and lighter than active tags (some are the size of a grain of rice), less expensive, and offer a virtually unlimited operational lifetime. The trade-off is that they have shorter read ranges than active tags and require a higher-powered reader. Read-only tags are typically passive and are programmed with a unique and limited set of data (usually 32 to 128 bits) that cannot be modified. Read-only tags most often operate as an information pointer into a database, in the same way as linear barcodes reference a database containing modifiable product-specific information.

The significant advantage of all types of RFID systems is the noncontact, non-line-of-sight nature of the technology. Tags can be read through a variety of substances and challenging conditions, where barcodes or other optically read technologies would be useless. RFID tags can also be read at response speeds of less than 100 milliseconds. The read/write capability of an active RFID system is also a significant advantage in interactive applications such as work-in-process or maintenance tracking. Though it is a costlier technology (compared with barcode), RFID has become indispensable for a wide range of automated data collection and identification applications that would not be possible otherwise.

RFID technology entered the realm of laboratory automation in the late 1990s.  Irori developed a combinatorial synthesis system which included their RFID-based AccuTag system for identifying and sorting microreactor containers (each containing a small RFID chip).  Around the same time, Ontogen developed a similar system, called OntoCODE, which was later licensed to Irori.  These applications soon disappeared from the marketplace.  Now the technology is again finding its way into the laboratory, as costs decrease and technical specifications improve.  In 2005, the Drug Enforcement Administration (DEA) announced that it would be using RFID to tag and track evidence moving through its laboratories.  Clinical laboratories, such as the Mayo Clinic, have begun using RFID tags for tracking specimen containers.  Pharma companies, such as Sequenom, have begun using RFID in conjunction with compound library storage and identification. For compound collections, RFID offers the possibility to read multiple IDs at once - to essentially do an inventory with one radio pulse.  This technique is referred to as "singulation", the means by which an RFID reader identifies a tag with a specific serial number from a number of tags in its field. While this approach is possible with compound libraries, it is limited by the type of RFID device commonly used in a collection (passive, small, inexpensive, short range) and by interference caused by metal, water or anything that can generate or distort electromagnetic energy.
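
Once a reader has reported the tag IDs in its field, a one-pass inventory reduces to comparing that set with the expected collection. The sketch below assumes the reader output is already available as a list of ID strings (no particular reader API is implied) and that a tag may be reported more than once; the function name and IDs are hypothetical.

```python
def reconcile_inventory(expected_ids, read_ids):
    """Compare tag IDs reported by a reader with the expected collection.

    Returns (present, missing, unexpected) as sorted lists. Duplicate
    reads of the same tag are collapsed.
    """
    expected, read = set(expected_ids), set(read_ids)
    return (sorted(expected & read),
            sorted(expected - read),
            sorted(read - expected))


if __name__ == "__main__":
    present, missing, unexpected = reconcile_inventory(
        ["T1", "T2", "T3"], ["T3", "T1", "T9", "T1"])
    print("present:", present)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

Because of the range and interference limits noted above, a "missing" tag may simply not have been read, so a practical workflow would likely re-scan before treating the item as absent.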

Optical Character Recognition (OCR)

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten, typewritten or printed text into machine-editable text.  The most common usage of this technology is the OCR processing of scanned documents, usually using a flat-bed scanner.  OCR can also be used in industrial processes, such as the interpretation of serial numbers and notes on industrial components, assembled products, packaging material and/or mail, with a goal of identifying, matching, sorting, tracking, or verifying these products as they move through various laboratory, manufacturing or material handling processes.  The challenges and constraints of using OCR in automated laboratory sample processing are very similar to those of industrial applications.

The overall process consists of five processing steps:

  • Image acquisition:  Document OCR represents a relatively controlled imaging environment compared to industrial OCR.  In industrial applications, the object being imaged may be moving.  Therefore image capture is usually done using a high speed digital imaging device connected to a high speed frame grabber (an electronic device that captures individual, digital still frames from an analog video signal or a digital video stream, usually employed as a component of a computer vision system).  Document imaging usually utilizes a flat bed scanner, which is relatively self-optimizing and does not require a separate frame grabber device. The resolution of the acquired image must be sufficient for subsequent image processing, but not so high that image file size becomes a hindrance to maintaining OCR throughput.  Every attempt needs to be made to assure image quality, such as good lighting and contrast.  The spectrum of image acquisition may be adjusted to offer the highest contrast and to limit the acquisition and storage of unnecessary spectral data.  For that reason, grayscale cameras are used more often than color.  Infrared imaging may be considered as an alternative to the visible spectrum.
  • Preprocessing: Consists of image sharpening, normalization, filtering and binarization. Sharpening is used quite rarely, because it increases noise, but it gives better results on unfocused images. Normalization (often called contrast-stretching) is the process of expanding the dynamic range of the pixel intensities in the image.  Normalization yields better results at the binarization stage, when an image of up to 256 gray levels is converted to a pure black and white image. Filtering can be very important for lighting compensation.
  • Character segmentation: Consists of two steps. In the first step individual strings are located and their correct order according to the position in the image is found.  This uses a digital image processing method called "blob analysis". A blob (binary large object) is an area of touching pixels with the same logical state. All pixels in an image that belong to a blob are in a foreground state. All other pixels are in a background state. In a binary image, pixels in the background have values equal to zero while every nonzero pixel is part of a binary object.  Blob analysis is used to detect blobs in an image, which in the case of OCR would be a string.  In the second step every string located in the previous step is segmented into individual characters. There is a special focus on broken and connected characters.
  • Character recognition: The segmented characters are compared to a reference set of pre-defined characters.  Classification is performed according to the calculated distances between the character being recognized and all the reference characters. If the distances do not satisfy some predefined conditions, the character is marked as unrecognized.
  • Postprocessing:  Involves the evaluation of the identified string(s) of recognized characters in the given application.  For instance, an image may contain several recognized strings, but only one may be of importance for a sorting or tracking task.  This string could be identified by the number of characters in that string, the starting character of the string or the location of the string.  
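
The preprocessing, segmentation, recognition and postprocessing steps above can be sketched end to end on a tiny image. This is a deliberately simplified illustration, not a production OCR method: for brevity it treats each blob as a single character rather than first locating strings, uses 4-connectivity, measures distance as the number of differing pixels, and relies on a hypothetical two-character reference set (`TEMPLATES`). All names and thresholds are assumptions.

```python
from collections import deque

# Hypothetical reference set: each character as a set of (row, col) pixels.
TEMPLATES = {
    "I": frozenset({(0, 0), (1, 0), (2, 0)}),
    "L": frozenset({(0, 0), (1, 0), (2, 0), (2, 1)}),
}


def normalize(img):
    """Contrast-stretch gray levels to the full 0..255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1  # avoid division by zero on a flat image
    return [[(p - lo) * 255 // span for p in row] for row in img]


def binarize(img, threshold=128):
    """Dark pixels become foreground (1), light pixels background (0)."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


def find_blobs(binary):
    """Blob analysis: group touching foreground pixels (4-connectivity).

    Returns blobs as sets of (row, col), ordered by leftmost column.
    """
    rows, cols = len(binary), len(binary[0])
    seen, blobs = set(), []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or (r, c) in seen:
                continue
            blob, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                blob.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            blobs.append(blob)
    return sorted(blobs, key=lambda b: min(x for _, x in b))


def recognize(blob, templates=TEMPLATES, max_distance=1):
    """Return the nearest reference character, or None if unrecognized."""
    top = min(y for y, _ in blob)
    left = min(x for _, x in blob)
    shape = frozenset((y - top, x - left) for y, x in blob)
    best_char, best_dist = None, None
    for char, template in templates.items():
        dist = len(shape ^ template)  # count of differing pixels
        if best_dist is None or dist < best_dist:
            best_char, best_dist = char, dist
    if best_dist is not None and best_dist <= max_distance:
        return best_char
    return None


def postprocess(strings, length, prefix):
    """Keep only strings with the expected length and starting character."""
    return [s for s in strings if len(s) == length and s.startswith(prefix)]


if __name__ == "__main__":
    image = [[200] * 7 for _ in range(5)]          # light background
    for y, x in [(1, 1), (2, 1), (3, 1), (3, 2),   # an "L"
                 (1, 4), (2, 4), (3, 4)]:          # an "I"
        image[y][x] = 50                           # dark ink
    blobs = find_blobs(binarize(normalize(image)))
    text = "".join(recognize(b) or "?" for b in blobs)
    print(text, postprocess([text], length=2, prefix="L"))
```

The reject threshold in `recognize` corresponds to marking a character as unrecognized when no reference character is close enough, and `postprocess` shows how string length and starting character can select the one string that matters for a tracking task.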

Probably the most common application of OCR in the laboratory involves scanning documents, including laboratory notebooks.  The use of OCR technology for automated sample processing is not highly prevalent, given the widespread use of bar code technology.  It finds more use in clinical laboratories, where there may be human-readable information on a label that needs to be automatically identified, such as with specimen containers.


Biometrics

Biometrics is the study of methods for uniquely recognizing humans based upon one or more intrinsic physical or behavioral traits.  Biometrics technology can be used for either identification or authentication purposes. In general, biometric identifiers can acquire unique biological information from people for the purpose of verifying identity.  Techniques include:

  • Fingerprint identification: The most commonly known method of biometric identification.  Fingerprint ridges are formed in the womb; fingerprints are present by the fourth month of fetal development. Once formed, fingerprint ridges are like a picture on the surface of a balloon. As the person ages, the fingers do get larger. However, the relationship between the ridges stays the same, just as the picture on a balloon is still recognizable as the balloon is inflated.
  • Hand geometry: The measurement and comparison of the different physical characteristics of the hand. Although hand geometry does not have the same degree of permanence or individuality as some other characteristics, it is still a popular means of biometric authentication.
  • Palm Vein Authentication: This system uses an infrared beam to penetrate the user's hand as it is waved over the system; the veins within the palm of the user are returned as black lines. Palm vein authentication has a high level of authentication accuracy due to the complexity of vein patterns of the palm. Because the palm vein patterns are internal to the body, this would be a difficult system to counterfeit. Also, the system is contactless and therefore hygienic for use in public areas.
  • Retina scan: A retina scan provides an analysis of the capillary blood vessels located in the back of the eye; the pattern remains the same throughout life. A scan uses a low-intensity light to take an image of the pattern formed by the blood vessels. Retina scans were first suggested in the 1930s.
  • Iris scan: An iris scan provides an analysis of the rings, furrows and freckles in the colored ring that surrounds the pupil of the eye. More than 200 points are used for comparison. Iris scans were proposed in 1936, but it was not until the early 1990s that algorithms for iris recognition were created (and patented). All current iris recognition systems use these basic patents, held by Iridian Technologies. One of the biggest problems with the technology is the variability of the iris, which changes characteristics depending on whether one has been drinking or taking drugs, whether the person is pregnant, and with the variabilities of age in general.
  • Face recognition: Facial characteristics (the size and shape of facial characteristics, and their relationship to each other). Although this method is the one that human beings have always used with each other, it is not easy to automate. Typically, this method uses relative distances between common landmarks on the face to generate a unique "faceprint."
  • Signature: Although the way you sign your name does change over time, and can be consciously changed to some extent, it provides a basic means of identification.
  • Voice analysis: The analysis of the pitch, tone, cadence and frequency of a person's voice.
  • DNA identification: A popular and increasingly non-controversial use of biometric technology.
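
The "relative distances between common landmarks" idea from the face recognition entry can be sketched as follows. This is a toy illustration, not how any commercial system works: it assumes landmark coordinates have already been extracted from an image, normalizes by the distance between the eyes so that image scale does not matter, and uses an arbitrary tolerance. The landmark names and both function names are hypothetical.

```python
from itertools import combinations
from math import dist


def faceprint(landmarks):
    """Relative distances between facial landmarks.

    landmarks: dict of name -> (x, y); must include 'left_eye' and
    'right_eye', whose separation is used as the unit of length so the
    result does not depend on image scale.
    """
    unit = dist(landmarks["left_eye"], landmarks["right_eye"])
    names = sorted(landmarks)
    return [dist(landmarks[a], landmarks[b]) / unit
            for a, b in combinations(names, 2)]


def same_person(print_a, print_b, tolerance=0.05):
    """1:1 authentication: accept only if every relative distance agrees."""
    return all(abs(a - b) <= tolerance for a, b in zip(print_a, print_b))


if __name__ == "__main__":
    enrolled = {"left_eye": (30, 40), "right_eye": (70, 40),
                "nose": (50, 60), "mouth": (50, 80)}
    # Same geometry captured at twice the image size.
    probe = {name: (2 * x, 2 * y) for name, (x, y) in enrolled.items()}
    print(same_person(faceprint(enrolled), faceprint(probe)))
```

Authentication compares a probe against one enrolled record as shown here; identification would instead compare the probe against every enrolled record and report the closest match within tolerance.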

The most common use of biometrics in the laboratory environment is requiring fingerprint identification for logging onto secure data systems.  The George Washington University Medical Faculty Associates (MFA) in Washington, D.C. has implemented such a system for access to patient records.  GE Healthcare has demonstrated the use of facial recognition for logging in to a hospital computer workstation.  Commercial software[8] for facial recognition logon is available for PC use.

External Links

  • The Association for Automatic Identification and Mobility: AIM Global is the international trade association representing automatic identification and mobility technology solution providers. Through the years, industry leaders continue to work within AIM to promote the adoption of emerging technologies.  AIM Global actively supports the development of AIM standards through its own Technical Symbology Committee (TSC), Global Standards Advisory Groups, and RFID Experts Group (REG), as well as through participation at the industry, national (ANSI) and international (ISO) levels.
  • Registry of USG Recommended Biometric Standards: The Registry of USG Recommended Biometric Standards (Registry) supplements the NSTC Policy for Enabling the Development, Adoption and Use of Biometric Standards.
  • is the central source of information on biometrics-related activities of the Federal government.
  • The Biometric Consortium: Serves as a focal point for research, development, testing, evaluation, and application of biometric-based personal identification/verification technology. The Biometric Consortium organizes a premier biometrics conference every fall. Information about past conferences, current government and standards activity, a bulletin board service, and other biometric resources can be found throughout this web site. 
  • ISO/IEC JTC1/SC31 Automatic Identification and Data Capture Techniques: ISO Information Technology Standards


  1. Bar Code Bootcamp
  2. Label Ease - Pre-labeled Labware, Computype
  3. Matrix 2D Barcoded Storage Tubes, Thermo Scientific
  4. Laser Etched Labware
  5. Marking power train components, Industrial Laser Solutions
  6. Pinstamp marking system, Telesis
  7. Vcode print and apply, Velocity11
  8. FastAccess Biometrics, Sensible Vision