AI in Retail: Product matching and dealing with duplicate products

In the US, the e-commerce sector was growing very fast before the pandemic, and after the pandemic struck, sales were pushed even further. U.S. consumers accelerated e-commerce growth by two years, contributing an additional $105 billion in revenue. Online sales hit $791.70 billion in 2020, a 32.4% increase over the previous year. [Source]

With the majority of stores closed, online shopping became a necessity rather than a convenience. To manage products efficiently under this load, e-commerce platforms need to address several risks and problems.

Consumers spent $211.5 billion during the second quarter on e-commerce, up 31.8% quarter over quarter, according to data published Tuesday by the U.S. Census Bureau.

Retail e-commerce sales [Source: U.S. Census Bureau]

Impact of COVID-19 on the retail industry

  1. The pandemic forced retailers to ramp up their online efforts and adopt new technologies because many shuttered their physical locations to help slow the spread of COVID-19.
  2. Merchants are using AI to improve logistics, which in turn enables same-day delivery. AI-enabled supply chain management improved considerably during the pandemic.
  3. The COVID-19 pandemic changed consumer habits and forced retailers to adapt quickly. Millions of people started buying groceries and household items online.

AI is helping retailers provide complete support and a continuous flow of supplies during the pandemic, but certain problems still need to be resolved for online buying to work well. Since COVID-19 has caused a surge in online shopping, these problems are now a serious concern.

Some major problems faced by online retail stores

  1. Quality issues and reliability: One of the most common problems people face when shopping online is product quality. A shopper who cannot inspect a product has to rely on its reviews, and since reviews are not always reliable, this is a major concern for online retail stores.
  2. Delivery and logistics issues: This is one of the most challenging problems. Since the COVID-19 pandemic struck, the delivery and logistics departments of every major online retail store have been under heavy load.
  3. Inefficient catalog management: E-commerce catalog management is a dynamic process. Since customers now rely heavily on online shopping, every retailer needs a well-maintained catalog to provide a good customer experience. However, certain issues get in the way of building and maintaining a good catalog.
Image source: Quora, "What are the challenges involved in running an online store?"

In this blog, we will discuss catalog management and the issues online retail stores face while maintaining their catalogs. We will then discuss how AI is being used to solve these problems.

Risk of product duplication

E-commerce companies face many problems while maintaining their catalogs. One such problem is identifying duplicate items and managing them properly.

E-commerce giants like Amazon, Walmart, and Wayfair constantly deal with the risk of product duplication in their catalogs. Each of these platforms lists millions of products; Wayfair alone has over 9 million unique items in its catalog. The products come from different suppliers, and these suppliers often sell the same product under different names and at different prices. This is where the risk of product duplication arises. To give customers a good experience, all duplicate listings of a product need to be grouped into a single entity.

Customers expect a personalized online shopping experience. 74% of eCommerce companies already use personalization, and that percentage is expected to exceed 90% by the end of 2021. If you do not meet or exceed your customers' expectations, they may purchase elsewhere: with the growing number of eCommerce options, they have many other places to go. In 2019, only 3% of eCommerce website visits converted into purchases. [Source]

AI-powered solution to deal with the risk of product duplication

We take product images as input and pass them to EfficientNet, a Convolutional Neural Network (CNN), which outputs an image embedding (a fixed-length feature vector) for each image. A practical advantage of the Keras implementation of EfficientNet is that input rescaling and normalization are built into the model as layers, so images can be fed in with raw pixel values in the [0, 255] range without a separate normalization step. This is a property of that implementation rather than of the architecture itself.

The image embeddings are then used to identify similar images with the Nearest Neighbours technique: images whose embeddings lie close together are likely to show the same product.

What is EfficientNet?

  • EfficientNet is a CNN architecture built around a compound scaling method, which uses a single compound coefficient to scale all dimensions of the network uniformly. Unlike standard practice, which scales network width, depth, and resolution arbitrarily, EfficientNet scales all three together with a fixed set of scaling coefficients.
  • If we want to use 2^N times more computational resources, we can increase the network depth by α^N, width by β^N, and image resolution by γ^N, where α, β, γ are constant coefficients found by a small grid search on the original small model, under the constraint α · β^2 · γ^2 ≈ 2. Because the FLOPs of a convolutional network grow roughly in proportion to depth × width^2 × resolution^2, this constraint makes each increment of N roughly double the compute.
EfficientNet architecture [Source: Google AI Blog, "EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling"]
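The compound scaling rule can be checked numerically. The sketch below is illustrative and not part of the notebook; it uses α = 1.2, β = 1.1, γ = 1.15, the values the EfficientNet paper reports for its B0 baseline, and shows how the depth, width, and resolution multipliers and the approximate FLOPs grow with N.

```python
# Illustrative sketch of EfficientNet-style compound scaling (not the notebook's code).
# ALPHA, BETA, GAMMA are the coefficients reported in the EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def scaling_multipliers(n: float) -> tuple[float, float, float]:
    """Return the (depth, width, resolution) multipliers for compound coefficient n."""
    return ALPHA ** n, BETA ** n, GAMMA ** n


def flops_multiplier(n: float) -> float:
    """Approximate FLOPs growth: proportional to depth * width^2 * resolution^2."""
    depth, width, resolution = scaling_multipliers(n)
    return depth * width ** 2 * resolution ** 2


# The constraint alpha * beta^2 * gamma^2 ~ 2 makes each step of n roughly double FLOPs.
print(f"alpha * beta^2 * gamma^2 = {ALPHA * BETA ** 2 * GAMMA ** 2:.3f}")
for n in range(4):
    d, w, r = scaling_multipliers(n)
    print(f"N={n}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{flops_multiplier(n):.2f}")
```

The product α · β^2 · γ^2 comes out at about 1.92, which is why each unit of N only approximately doubles the compute.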

Getting started with the implementation

Introduction to our use case

Retail companies use a variety of approaches to avoid the risk of product duplication and to ensure that their products are listed at competitive prices. One such approach is product matching, which allows a company to offer similar products at a competitive cost. Machine learning and deep learning are used extensively for product matching and for removing duplicate products. For this use case, we use a dataset provided by Shopee.

Writing the code

Here is the link to the notebook.

1. Importing the required libraries.

2. Now we will define the model configuration and the configuration files.
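A configuration object keeps these settings in one place. The notebook's actual configuration is not reproduced here, so every field name and value below is a hypothetical placeholder, shown only to illustrate the kind of settings the later steps need.

```python
from dataclasses import dataclass

VALID_MODES = ("DEBUG", "SUBMIT")


@dataclass(frozen=True)
class Config:
    """Hypothetical settings for the pipeline; all values are illustrative assumptions."""
    mode: str = "DEBUG"               # "DEBUG" or "SUBMIT"
    data_dir: str = "data"            # assumed folder holding train.csv, test.csv and images
    image_size: int = 256             # side length images are resized to before the CNN
    batch_size: int = 32              # images per batch fed to EfficientNet
    image_threshold: float = 0.7      # min cosine similarity for an image-embedding match
    text_threshold: float = 0.75      # min cosine similarity for a TF-IDF title match
    max_neighbors: int = 50           # cap on candidate matches per posting

    def __post_init__(self) -> None:
        # Fail early on a typo in the mode name.
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {self.mode!r}")
```

A frozen dataclass makes the settings immutable once created, so no step can silently change a threshold halfway through a run.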

3. Loading the data. The notebook has two modes, ‘SUBMIT’ and ‘DEBUG’. ‘SUBMIT’ mode is used to perform the predictions directly.
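Only the ‘SUBMIT’ mode is described above. The sketch below assumes a common pattern for the other mode: ‘DEBUG’ loads a slice of the training data, whose label_group column (present in the Shopee training CSV) identifies the true matches so predictions can be scored. The file names train.csv and test.csv, the load_data name, and the debug_rows parameter are assumptions.

```python
from pathlib import Path

import pandas as pd


def load_data(data_dir: str, mode: str, debug_rows: int = 1000) -> pd.DataFrame:
    """Load postings for a mode (sketch; file names and DEBUG behaviour are assumptions).

    SUBMIT: read test.csv and predict on it directly.
    DEBUG:  read the first `debug_rows` rows of train.csv and attach the true matches.
    """
    root = Path(data_dir)
    if mode == "SUBMIT":
        return pd.read_csv(root / "test.csv")
    if mode == "DEBUG":
        df = pd.read_csv(root / "train.csv").head(debug_rows).reset_index(drop=True)
        # Every posting in the same label_group is a true match (including itself).
        groups = df.groupby("label_group")["posting_id"].agg(list).to_dict()
        df["target"] = [groups[label] for label in df["label_group"]]
        return df
    raise ValueError(f"unknown mode: {mode!r}")
```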

4. Creating the data generator to convert the dataset into the form that can be loaded into our model for training.
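A data generator essentially turns a list of image files into fixed-size batches, in the original order, so that each embedding can be matched back to its posting. The sketch below is framework-agnostic; load_image is a hypothetical placeholder for the notebook's decoding and resizing (in TensorFlow this could be tf.io.decode_jpeg followed by tf.image.resize).

```python
from typing import Callable, Iterator

import numpy as np


def image_batches(
    paths: list[str],
    load_image: Callable[[str], np.ndarray],
    batch_size: int = 32,
) -> Iterator[np.ndarray]:
    """Yield float32 batches of shape (batch, height, width, channels), preserving order.

    `load_image` is a hypothetical callable that returns one resized image array;
    every image must have the same shape so the batch can be stacked.
    """
    for start in range(0, len(paths), batch_size):
        batch = [load_image(path) for path in paths[start:start + batch_size]]
        # Raw 0-255 values are kept: a Keras EfficientNet rescales inside the model.
        yield np.stack(batch).astype("float32")
```

Because batches are yielded in order, the i-th row of the concatenated embeddings corresponds to the i-th path.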

5. Loading the pre-trained EfficientNet model and then creating the image embeddings of the input images.

6. Now, applying the Nearest Neighbours technique to the image embeddings.
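One way to apply the Nearest Neighbours technique is to L2-normalise the embeddings, so that a dot product equals cosine similarity, and keep the nearest neighbours above a similarity threshold. The sketch below does this with NumPy in chunks; the embedding_matches name, the 0.7 threshold, and the 50-neighbour cap are illustrative assumptions, not values from the notebook.

```python
import numpy as np


def embedding_matches(
    embeddings: np.ndarray,
    posting_ids: list[str],
    threshold: float = 0.7,
    max_neighbors: int = 50,
    chunk_size: int = 1024,
) -> list[list[str]]:
    """Return, for each posting, the ids whose cosine similarity is >= threshold."""
    # L2-normalise each row so a dot product between rows is their cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    k = min(max_neighbors, len(posting_ids))
    matches: list[list[str]] = []
    # Process rows in chunks so the full N x N similarity matrix is never held in memory.
    for start in range(0, len(unit), chunk_size):
        sims = unit[start:start + chunk_size] @ unit.T
        for row in sims:
            nearest = np.argsort(-row)[:k]  # k most similar postings, best first
            matches.append([posting_ids[j] for j in nearest if row[j] >= threshold])
    return matches
```

Each posting is its own nearest neighbour (similarity 1.0), so self-matches are included automatically.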

7. Creating helper functions to perform the text predictions.
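The notebook's helper functions are not shown here. As one hypothetical example of such a helper, the function below normalises a product title before vectorisation by lowercasing it and collapsing punctuation, underscores, and repeated whitespace into single spaces.

```python
import re

# Runs of characters that are not letters or digits (underscore counts as punctuation here).
_SEPARATORS = re.compile(r"[\W_]+")


def clean_title(title: str) -> str:
    """Hypothetical helper: lowercase a title and replace punctuation runs with one space."""
    return _SEPARATORS.sub(" ", title.lower()).strip()


print(clean_title("Paper Bag  VICTORIA-Secret!!"))  # prints "paper bag victoria secret"
```

Since `\W` is Unicode-aware in Python 3, accented letters are kept rather than stripped.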

8. Using the TF-IDF technique to convert the text into vectors.
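A minimal sketch of this step with scikit-learn's TfidfVectorizer, assuming the text is the product title of each posting. TfidfVectorizer L2-normalises each row by default, so the dot product of two rows is their cosine similarity; the text_matches name and the 0.75 threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def text_matches(
    titles: list[str],
    posting_ids: list[str],
    threshold: float = 0.75,
    chunk_size: int = 1024,
) -> list[list[str]]:
    """Match postings whose TF-IDF title vectors have cosine similarity >= threshold."""
    vectors = TfidfVectorizer().fit_transform(titles)  # sparse matrix, rows L2-normalised
    matches: list[list[str]] = []
    for start in range(0, vectors.shape[0], chunk_size):
        # Sparse dot product of a chunk of rows with all rows gives cosine similarities.
        sims = (vectors[start:start + chunk_size] @ vectors.T).toarray()
        for row in sims:
            matches.append([posting_ids[j] for j in np.flatnonzero(row >= threshold)])
    return matches
```

Working on the sparse matrix keeps memory low, because most titles share only a handful of words.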

9. Calculating the pHash (perceptual hash) predictions, which is one of the most important steps.

10. Combining the text predictions and pHash predictions to predict the overall set of duplicate products (i.e., product matching).
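A perceptual hash (pHash) summarises an image so that visually similar images get similar hashes. The Shopee data includes a precomputed image_phash column of hex strings; assuming that column is used, the sketch below treats postings with identical hashes as matches and includes a Hamming-distance helper (the number of differing bits) that could relax exact equality into a near-duplicate check. Both function names are hypothetical.

```python
from collections import defaultdict


def phash_matches(phashes: list[str], posting_ids: list[str]) -> list[list[str]]:
    """Match postings whose perceptual hashes are exactly equal."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for phash, posting_id in zip(phashes, posting_ids):
        by_hash[phash].append(posting_id)
    # Each posting gets every posting (itself included) that shares its hash.
    return [list(by_hash[phash]) for phash in phashes]


def phash_distance(hash_a: str, hash_b: str) -> int:
    """Hamming distance between two hex-encoded hashes: the number of differing bits."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")
```

Exact grouping runs in linear time; a distance threshold catches near-duplicates but needs pairwise comparisons.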
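Combining can be as simple as taking, for each posting, the union of the matches proposed by each method. The sketch below does that while keeping first-seen order, and adds a mean per-posting F1 score, the metric the Shopee competition used, which could be used in ‘DEBUG’ mode to compare combinations. Both function names are hypothetical.

```python
def combine_matches(*prediction_lists: list[list[str]]) -> list[list[str]]:
    """Union per-posting match lists from several methods, keeping first-seen order."""
    combined = []
    for per_method in zip(*prediction_lists):
        # dict.fromkeys de-duplicates while preserving insertion order.
        combined.append(list(dict.fromkeys(pid for matches in per_method for pid in matches)))
    return combined


def mean_f1(predictions: list[list[str]], targets: list[list[str]]) -> float:
    """Average over postings of F1 = 2 * |pred & true| / (|pred| + |true|)."""
    scores = []
    for predicted, true in zip(predictions, targets):
        p, t = set(predicted), set(true)
        scores.append(2 * len(p & t) / (len(p) + len(t)) if p or t else 1.0)
    return sum(scores) / len(scores)
```

A union raises recall at the cost of precision, so the per-method thresholds usually need re-tuning once predictions are combined.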

11. Finally, creating the submission file, which has two columns:

posting_id: the ID code for the posting.
matches: space-delimited list of all posting IDs that match this posting. Posts always self-match.
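A minimal sketch of writing these two columns with pandas. It re-inserts each posting's own ID so the self-match rule always holds; the build_submission name and the submission.csv file name are assumptions.

```python
import pandas as pd


def build_submission(posting_ids: list[str], predictions: list[list[str]]) -> pd.DataFrame:
    """Build the two-column submission with space-delimited matches."""
    rows = []
    for posting_id, matches in zip(posting_ids, predictions):
        # Put the posting's own id first (posts always self-match) and drop duplicates.
        ordered = list(dict.fromkeys([posting_id, *matches]))
        rows.append({"posting_id": posting_id, "matches": " ".join(ordered)})
    return pd.DataFrame(rows, columns=["posting_id", "matches"])


# Example usage (file name assumed):
# build_submission(ids, preds).to_csv("submission.csv", index=False)
```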

About the writer

Hey guys, congratulations on making it to the end! I am Akshat Dubey and I work as a Data Science Intern at Labellerr. I am a fourth-year student pursuing an Integrated Master of Science in Mathematics and Computing at Birla Institute of Technology – Mesra, Ranchi. My course focuses on applying mathematics to artificial intelligence. I am a Kaggle Master, well-versed in Machine Learning, Deep Learning, Computer Vision, and Natural Language Processing. My primary interests involve the application of artificial intelligence in healthcare and retail. To connect with me, you can click on the following links:
