“If a consumer logs into an e-commerce website and starts looking for clothing items, for example, the website will return a range of results, some better than others,” said Andrea Mirabile, senior manager of computer vision at Zebra Technologies, who headed the Zebra team.
Sponsored by Alibaba Group and Trax, the CVPR 2022 Challenge: Large-scale Cross-Modal Product Retrieval brings together 650 research teams from around the world to work on a large multimodal retail dataset of nearly five million image-caption pairs covering roughly 100,000 products. Each team is tasked with finding the top-K product candidates that match a query such as “blue men’s turtleneck sweater.”
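In a cross-modal retrieval system of this kind, a text query and every product are typically mapped into a shared embedding space, and the top-K candidates are the products whose embeddings are most similar to the query's. A minimal NumPy sketch of that ranking step, using cosine similarity over toy embeddings (the function name and dimensions are illustrative, not from the challenge):

```python
import numpy as np

def top_k_products(query_emb, product_embs, k=5):
    """Return indices of the k products whose embeddings are most
    similar (by cosine similarity) to the query embedding."""
    # Normalize so plain dot products equal cosine similarities.
    q = query_emb / np.linalg.norm(query_emb)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    sims = p @ q
    # argsort is ascending: take the last k scores, reverse for best-first.
    return np.argsort(sims)[-k:][::-1]

# Toy catalog: 4 products in a 3-d embedding space.
products = np.array([[1.0, 0.0, 0.0],
                     [0.9, 0.1, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])   # e.g. an embedded text query
print(top_k_products(query, products, k=2))  # → [0 1]
```

In a real system the embeddings would come from trained image and text encoders rather than hand-written vectors, but the ranking logic is the same.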
“From the consumer’s point of view, it’s a very unpleasant experience, wasted time and effort, which could result in no order, ordering the wrong item, or having to look elsewhere,” Mirabile added. “From the seller’s perspective, the problem is how to use words, images, and search functions that match what consumers are looking for when they shop, and how to make more relevant product recommendations.”
Top-K is among the most common measures of performance in machine learning and computer vision. In the context of limited and noisy data, such as retail data with a mixed range of image quality and text captions, using a loss function (the function that calculates the distance between the algorithm’s current output and the expected output) designed for top-K classification can bring significant improvement.
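Concretely, top-K accuracy counts a query as correct if the true product appears anywhere among the K highest-scoring candidates, which is why it is forgiving of near-misses in noisy data. A short illustrative sketch (function name and toy scores are assumptions, not from the challenge):

```python
import numpy as np

def top_k_accuracy(scores, true_labels, k=5):
    """Fraction of queries whose correct product appears among the
    k highest-scoring candidates.

    scores:      (n_queries, n_products) similarity matrix
    true_labels: (n_queries,) index of the correct product per query
    """
    # Indices of the k best-scoring products for each query.
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = [label in row for row, label in zip(top_k, true_labels)]
    return float(np.mean(hits))

scores = np.array([[0.9, 0.1, 0.3],   # correct product 0 ranked 1st
                   [0.2, 0.4, 0.8]])  # correct product 1 ranked 2nd
labels = np.array([0, 1])
print(top_k_accuracy(scores, labels, k=1))  # → 0.5
print(top_k_accuracy(scores, labels, k=2))  # → 1.0
```

A top-K-aware loss trains the model against this same criterion, penalizing only cases where the true item falls outside the top K instead of demanding it be ranked first.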
“AI applications consist of speech, text, sound, and vision. Our challenge team and the broader Zebra global AI research team put them together, in the same way that a person uses their five senses to feel and analyze the world around them to inform decision-making and action,” Mirabile explained.