Sharath Pankanti received the Ph.D. degree in computer science from Michigan State University, East Lansing, in 1995. He joined the IBM T. J. Watson Research Center, Yorktown Heights, NY, in 1995 and was with the IBM Advanced Identification Project until 1999. During 2000–2001, he worked on “footprints,” a system for tracking people based on their infrared emission. From 2001 to 2003, he worked on PeopleVision, a system for detecting and tracking individuals in indoor and outdoor environments. From 2003 to 2004, he worked on large-scale biometric indexing systems, and since 2005 he has worked on object recognition and human interface designs for effective security and convenience. He co-edited a comprehensive book on biometrics, Biometrics: Personal Identification (Kluwer, 1999), and coauthored A Guide to Biometrics (Springer, 2004).
Self-checkout systems are perceived as the future of retail checkout and are emerging as attractive business solutions that empower retailers and consumers alike. A self-checkout system allows a shopper to check out (i.e., purchase) products from a physical store with as little assistance from store staff as possible. To do so, it must validate the shopper's item selection and accept appropriate payment for the transaction. Some simple self-checkout systems, such as bank ATMs, gas pumps, and airline kiosks, are already very successful. Before self-checkout becomes ubiquitous for all point-of-sale applications, the following three fundamental, challenging (and often conflicting) problems need to be overcome: (i) Cost: the system must be reasonably inexpensive to build and install and should work with as much of the existing equipment as possible; (ii) Security: the system must be effective against theft (small false miss rate) without annoying honest customers (small false alarm rate); and (iii) Usability: the system must offer good usability (e.g., higher throughput) and must not unduly inconvenience the user or the owner (e.g., the retailer). In other words, inexpensive self-checkout lanes that are more accurate, easier to use, and faster will provide a better shopping experience.
Conventional automatic self-checkout systems at retail stores fall short on all three of the criteria above. A typical self-checkout system costs significantly more than a cashiered checkout because its sensor instrumentation involves customized fabrication. Its accuracy may not be acceptable because the sensor measurements used for verification (e.g., item weight) carry little discriminatory information. Finally, conventional self-checkout technology has a very limited view of user interaction and is therefore not very user-friendly.
New generations of self-checkout systems are increasingly considering camera-based "video analytic" technology because it helps address all of these concerns. Driven largely by the cell phone and consumer photography markets, inexpensive high-quality cameras are becoming commodity items. Given the increasingly powerful CPUs found in point-of-sale terminals, minimal additional resources are needed to run such systems. Moreover, the visual information that can be obtained from cameras is much richer than that provided by other sensors, thus allowing better detection of fraud. Finally, cameras are relatively unobtrusive to the customer and provide new avenues for further augmenting usability. Designing a camera-based self-checkout system is a challenging computer vision research problem because there are tens of thousands of items in a store. Moreover, these items come in a wide variety of forms, colors, shapes, and sizes that must be accounted for. Furthermore, because illumination conditions vary, learning invariant visual features of the shopping items is also very complex.
In this talk, we will present our research on the design of a computer vision based self-checkout system that is completely automatic in operation, from image capture and object segmentation through training/learning and matching. Based on real data involving thousands of shopping items collected over an extended period of time (more than 20 months), our experimental results demonstrate that visual technology is an effective and inexpensive component in the design of next-generation self-checkout systems. Here are some of the specific results we will elucidate in the presentation:
(i) Cost: The estimated cost of the visual augmentation is relatively low, and the augmentation would allow some of the existing sensor-based subsystems to be removed without affecting accuracy. The estimated cost and resource requirements of a self-checkout system based exclusively on vision sensors are very attractive.
(ii) Accuracy: In several technology tests spanning more than a billion matches, we show that the false positive rates (the fraction of times one item is mistaken for another) and false reject rates (the fraction of times an item fails to match another image of the same item) of the visually augmented system are significantly better than those of its conventional counterpart. Our results demonstrate that the new visually augmented self-checkout system is at least twice as accurate and twice as shopper-friendly as the existing technology. We also show that the exclusively vision-based self-checkout system can offer acceptable accuracy. Finally, we show that the statistical feature matcher performs significantly better in our design than its structural counterpart.
(iii) Usability: The results also quantify how the new technology will significantly improve shopper assistance, lane throughput, and shopper queue lengths. Further, we demonstrate that the visual system can be effectively trained from very few samples arbitrarily selected from the shopping data. This “on-the-fly learning” design feature offers a significant advantage, since manual training is impractical in a real store where there are tens of thousands of items (many of which change their appearance on a weekly basis).
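The two accuracy measures named in (ii) can be made concrete with a small sketch. It is an illustration only and not the system described in the talk: the function name `error_rates`, the scores, and the threshold are all hypothetical, and it assumes a matcher that returns a similarity score where a higher value means a closer match.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_positive_rate, false_reject_rate) at a given threshold.

    genuine_scores:  scores from comparing two images of the same item.
    impostor_scores: scores from comparing images of different items.
    A comparison is declared a match when its score is >= threshold.
    (Hypothetical helper; the scoring convention is an assumption.)
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    # A different item that still scores above the threshold is a false positive.
    false_positives = sum(1 for s in impostor_scores if s >= threshold)
    # The same item scoring below the threshold is a false reject.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_positives / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Hypothetical scores, invented purely for illustration.
genuine = [0.91, 0.85, 0.40, 0.77]
impostor = [0.10, 0.62, 0.30, 0.05, 0.20]
fpr, frr = error_rates(genuine, impostor, threshold=0.6)
print(fpr, frr)
```

Under these assumptions, raising the threshold lowers the false positive rate at the expense of the false reject rate, which mirrors the tension between catching theft and not annoying honest customers.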
In summary, we conclude that the visual appearance of items is rich in information, that we can reliably extract this information, and that it is sufficiently distinctive to yield a practical, general-purpose vision system for real-life use with acceptable item verification performance.