Case Study: Using GenAI to Solve the Building Materials Matching Problem

The building materials industry has a problem that makes it very hard for technology to be adopted. Well, there are a few actually, but one of the main issues is that you cannot easily compare products from different suppliers, to understand whether they refer to the same product. As an engineer you may think “oh that’s easy, we can just compare the UPC”. Except these products do not have UPCs. They are made by different manufacturers, sold by different retailers, but the end result is - for someone working in construction - the same product.

Timber is probably the best example of this. Take these two products from different suppliers:

From the title you can extract a lot of information about the product:

Treated - The product has been treated to prevent rot.
C24 - A strength grade used to indicate it’s intended use in the UK and Europe.
Regularised - The product is planed to give a smooth and consistent finish.
Dimensions - One is 45mm x 145mm and the other is 47mm x 150mm.
Length - Both are 3.6m.

The thing is these actually both refer to the same product - even though the dimensions are different. As the products are planed to give a smooth finish, one supplier refers to the finished size, while the other refers to the unfinished size.

Unless you have industry knowledge, you aren’t going to know this. And this problem isn’t just for timber, it extends to many other categories in the industry. And that’s why it’s hard for technology to be adopted. It’s not something you can easily compare with an algorithm or even conventional machine learning.

In the past we actually tried this using rule-based ML. From the title and description of each product, and a whole bunch of hand-crafted rules specific to each category, you can do this somewhat. It worked ok - but still needed a team of people to verify the results, and make the final decision of whether two products are the same or not.

Generative AI

Generative AI (GenAI), such as used by ChatGPT, can significantly transform how we address this problem. Instead of relying on a rule-based matching approach, which is slow to build and needs a lot of refining for each category, we can leverage GenAI to build a more advanced and flexible solution.

Data Processing

In the rule-based ML approach, a lot of time is needed for data processing. For example, the seller’s product page may have dimensions listed as “3500mm”, “350cm” or “3.5m”. These all need to be standardised before they can be processed. The same needs to be done for all other industry jargon such as “regularised”, “planed”, “DAR”. This takes a lot of time and is prone to error. What if a seller accidentally uses the American version of the word “regularized”?

One of the key advantages of GenAI is that it can handle the data preprocessing for us. We no longer need to manually standardise terminology and format it. The GenAI model can recognise and standardise the terms automatically. This saves significant time and effort before we even get started training a model.

Fine-Tuning the GenAI Model

Out of the box, the GenAI model can do a pretty good job at identifying products due to the nature of it being trained on a diverse set of data. However, for the best results, we want to fine-tune the model on data that we have manually verified.

Compared to the traditional ML approach where we would need a vast set of data to train a model from scratch, which is both expensive and time-consuming, we can fine-tune a pre-trained model such as GPT-4. Fine-tuning involves adjusting the model's parameters using a smaller, domain-specific dataset to improve its performance on our specific matching problem. This allows the model to develop a deeper understanding of industry-specific jargon and the contextual nuances required for accurate product comparison.

Validation

To ensure the accuracy of the data that is being produced by the model, a validation step is essential. Feedback from this step is used to further fine-tune the model and improve its performance.

Compared to the rule-based ML approach, the validation process for GenAI models often yields more accurate results earlier in the training process. Rule-based systems require extensive handcrafting of rules and continuous updates to accommodate new product descriptions and terminologies. In contrast, GenAI models leverage their ability to understand context and language, resulting in more accurate initial matches. This reduces the time and cost associated with training and maintaining the model, as fewer iterations are needed to achieve high accuracy.

Key Advantages of Using GenAI for Product Comparison:

Scalability - GenAI can handle vast amounts of data with minimal human intervention, making it scalable across different product categories and suppliers.
Accuracy - The contextual understanding of GenAI leads to more accurate product matches, reducing the need for extensive manual verification.
Adaptability - GenAI models can quickly adapt to new terminology, product descriptions, and industry standards, unlike rigid rule-based systems.
Efficiency - Automation of the product comparison process frees up valuable time for professionals, allowing them to focus on more critical tasks.
Reduced Training Time and Cost - The ability to fine-tune pre-trained models significantly reduces the time and cost associated with training, compared to developing rule-based systems from scratch.
Continuous Improvement - An ongoing validation process ensures the model remains accurate and reliable, continuously learning and improving from new data and expert feedback.

Conclusion

Generative AI (GenAI) is a game-changer for solving the tricky problem of comparing products in the building materials industry. With GenAI, we can make the process of matching products from different suppliers much more accurate, efficient, and scalable. Thanks to its advanced capabilities, GenAI not only handles the heavy lifting of data processing but also keeps getting better with continuous validation and feedback.