I am a Microsoft employee, but the views expressed here are mine and not those of my employer.
I have been working with some clients recently on proof-of-concepts (POC) to recommend products to their customers. I have found that generative AI techniques can be very effective and in this post, I will show you how to build a simple Gen AI Product Recommendation System that you can use to quickly develop a proof-of-concept (POC) for your clients.
The idea behind this post is to just use Python, a CSV file (containing customer and product performance) and Azure Open AI to build a simple Gen AI Product Recommendation System 😃
Again this process is just for quickly POCing the concept and will be part one of a series of posts I will be writing on this topic. Which eventually will include more advanced techniques and technologies and lead to a production-ready system using techniques like Retrieval Augmented Generation (RAG) and more 😅
TLDR
If you just want to see the code, you can find it here
All you need is a CSV file with customer and product performance data and an Azure Open AI account. Then you can use the python code above to POC the concept. A sample CSV file has been included in the GitHub repo.
So as long as you can get customer and product performance out of your system, you can POC a Gen AI Product Recommendation System 👍
Prerequisites
There are only two prerequisites to follow along with this post;
- An Azure subscription and access to Azure Open AI. For the Azure Open AI component I use the text-embedding-ada-002 model to vectorise the CSV file (more details on this later) and the question. I also use the GPT-4o model to recommend the next best product to the customer. You can sign up for Azure Open AI here
📖 Note: You do not have to use Azure Open AI to make this work. You can sub in other AI models to perform the same functions as I have done in this post.
- A CSV file with customer and product performance data. I have included a sample CSV file in the GitHub repo for you to use and will go into more detail about the CSV file below.
Introduction
The python code found here is a simple Gen AI Product Recommendation System that uses generative AI techniques to recommend products to customers. The system is designed for rapid proof-of-concept (POC) development and can be used to quickly develop a POC for your clients. The code flows as follows;
- Prepare the data
- Vectorise the CSV file using Azure Open AI
- Vectorise your target customers profile and return similar customers
- Refine the list of similar customers with customers that have good product performance
- Recommend the next best product to the customer using Azure Open AI
Prepare the data
The first step is to prepare the data. I used a combination of ChatGPT and RAND functions in excel to create sample data. You should use your own data for this step. The data should be in a CSV file with similar columns to the below;
Company Name, State, Industry, Segment, Product 1 Name, Product 1 Performance, Product 2 Name, Product 2 Performance, Product 3 Name, Product 3 Performance, Product 4 Name,Product 4 Performance, Product 5 Name, Product 5 Performance, Product 6 Name, Product 6 Performance, Product 7 Name, Product 7 Performance, Product 8 Name, Product 8 Performance
In my example, I am recommending products to companies. You can see with headers like Company Name, State, Industry, Segment I am tring to capture the profiles of companies. You can change these headers to suit your needs. I.e. if you are recommending products to individuals you may remove the Industry column an replace it with something else.
I chose to have 8 products in my example to make it more interesting. You might have more or less products, which is ok and the concept will still work.
The performance column is a number between 0 and 100. 0 being the worst performing product and 100 being the best performing product. You can change this to suit your needs.
📖 Note: Your performance column could be a dollar value or a percentage or range from -100 to 10,000. It’s completely up to you.
Vectorise the CSV file using Azure Open AI
Now we have our CSV file, we need to vectorise it. This is where Azure Open AI comes in. I use the text-embedding-ada-002 model to vectorise the CSV file. This model is a transformer-based model that can convert text into vectors.
I have being using the word vectorise pretty liberally in this post. What I really mean is to create embeddings of the text. If you are interested in digging into this deeper, please follow this link here.
📖 Note: I will not be covering chunking strategies in this post. But you can read more about them here. In my experience when working with CSV files, you want a single customers data on one line. As this will make the chunking strategy easier, as it becomes one embedding per line.
The generate_embeddings function has been taken from this sample here, which I found super useful to kick start this process.
If you are following along with th example, your DataFrame should now look like the below; if you uncomment the print statement at line 48 and run the code.
|
|
As you can see above, we have a a couple of new columns; comb, n-tokens and ada_v2.
The comb column is a combination of the all of the columns in the CSV, which is defined in the cols variable in line 22 of the Python code. This is the data we will use to find similar companies to our target company.
📖 Note: I have included all of the column data in comb column, which you may not what to do. This is the data that will been turned into our embeddings and used to find similar company profiles. Feel free to experiment and play around which which columns make sense to define as company profile data.
The n-tokens column contains the number of tokens in the comb column. This is part of the sample code here which I thought would be good to include, so we can see hoe many tokens are in each company profile.
The ada-v2 column contains the embeddings of the text in the comb column. This is what we will use to find similar companies to our target company.
Vectorise your target customers profile and return similar customers
Now we have our CSV file vectorised, we can vectorise our target companies profile and return similar companies. The search_docs function does this. It takes the target companies profile and the vectorised CSV file and returns the most similar companies to the target company.
Again the *search_docs function has been taken from this sample here, which I found super useful to kick start this process.
One thing I did change was the number of similar companies returned. I found that 4 wasn’t enough to demonstrate the concept. So I increased it to 20. You can change this to suit your needs.
Our target company profile is Madeup Inc NSW Finance Banking Widget W 85 Widget Y 74
. So to expand on this, the name of the company is Madeup Inc, they are located in NSW, their industry is Finance, their segment is Banking and they have purchased Widget W and Widget Y with performance scores of 85 and 74 respectively.
If you are following along with the example and run the code, we have created a DataFrame called res and it should look like the below; if you uncomment the print statement at line 79 and run the code.
|
|
As you can see above, we now have a DataFrame called res which contains 20 similar companies to our target company. The similarities column contains the cosine similarity between the target company and the similar companies. The higher the cosine similarity, the more similar the companies are.
Refine the list of similar customers with customers that have good product performance
Next I want to refine this list to only include companies that have a similarities over 0.87, as I only want to find companies that are very similar to the target customer. So in you uncomment the print statement at line 85 and run the code, you will see the DataFrame res has been refined to only include companies with a similarities over 0.87.
|
|
As you can see above, we are left with 8 company profiles. 6 of these companies are in the same Industry as the target company and 3 are even in same segment. Also 6 are in the same State.
Next we again want to refine this list to only include companies that have good product performance. So I have taken a rudimentary approach to giving each of these 8 companies a score based on their product performance, simply by taking the mean of their product performance. Then I have removed all companies with a score under 70. If you uncomment the print statement at line 103 and run the code, you will see the DataFrame res has been refined to only include companies with a product performance score over 70.
|
|
As you can see above, we are left with 4 company profiles. 3 of these companies are in the same Industry as the target company and all 4 are in the same State.
📖 Note: Again, not sure if 4 or 6 or 10 or 20 profiles will yield the best results. But for demonstration purposes, I am happy to stick with 4. You may choose a higher number is you like 😃
Recommend the next best product to the customer using Azure Open AI
Finally, we can recommend the next best product to the company using Azure Open AI. I use the GPT-4o model to recommend the next best product to the company.
The way I do this is I create a system prompt that has the following instructions;
|
|
I am using some good prompt engineering practises here, but the key thing is to include the company profiles in the prompt. This way the GPT-4o model can use the company profiles to recommend the next best product to the Madeup Inc. Also I am asking it to recommend only one product, as I am only interested in the next best product for Madeup Inc. If you would like to see all the products that the similar companies have bought, you can change this prompt to suit your needs.
And lastly, below is the recommendation that the GPT-4o model has made for Madeup Inc;
|
|
As you can see above, the GPT-4o model has recommended Widget Z to Madeup Inc. The reason being that Widget Z has consistently high performance in the banking segment, with performance scores such as 94 from Vertex Innovations. Adding Widget Z could bolster Madeup Inc’s product lineup and improve their overall offerings.
Conclusion
In this post, I have shown you how to build a simple Gen AI Product Recommendation System that you can use to quickly develop a proof-of-concept (POC) for your clients. The system uses generative AI techniques to recommend products to customers and is designed for rapid POC development. The system is built using Python, a CSV file (containing customer and product performance) and Azure Open AI. The system can be used to quickly develop a POC for your clients and can be easily adapted to suit your needs.
📖 Note: This is just a simple example to get you started. In future posts, I will be covering more advanced techniques and technologies to build a production-ready Gen AI Product Recommendation System. For example, this process stores all of the vectors and CSV data in memory. This is ok for POCs but certainly not Production or even MVPs. You would need to store this data in a vector database or search technology and use various different techniques to retrieve similar company profiles. I will cover this in a future post.
I hope you have found this post useful and that it has inspired you to build your own Gen AI Product Recommendation System. If you have any questions or comments, please feel free to leave them below. I would love to hear from you.