Powering Data Ingestion with AI Models through Systech’s Dopplr Platform
The client is a multinational pharmaceutical company based in India. The company has a presence in over 100 countries and is one of the world’s largest specialty generic pharmaceutical companies.
The client’s key business needs include ensuring the timely availability of drugs to patients, optimizing inventory levels and responding quickly to market changes. Additionally, the client requires data-driven decision making, an efficient supply chain to optimize costs and shorten lead times, and the ability to identify market trends and respond to competitive pressures.
The primary objective of this project was to provide the client with a robust data and analytics platform (Dopplr™) that would help build an effective data strategy and insights framework to address its key challenges. Two main use cases were identified: creating a centralized data layer, and creating a scalable data ingestion framework that uses document-processing AI models to extract data from the various files and formats shared by distributors.
The client faced several challenges in consolidating data from primary and secondary sales, OCR processing of PDFs, and entity recognition of Excel files. The lack of a unified data model and enterprise data hub also added to the complexity.
The client also had limited visibility into distributor sales performance and inventory levels. Data was collected manually from multiple distributors, and the secondary sales data they provided arrived in multiple formats, which added to the complexity. Without data harmonization processes to collate these elements, it was difficult to get a holistic view. Manual collation, consolidation, and quality checking of data consumed considerable effort. The client also lacked the capability to detect anomalies in the data provided by distributors. Furthermore, there was no unified data management framework to federate primary and secondary sales data and support efficient stock movements across distributors. Finally, the client was not generating insights from the data obtained from the third-party aggregator.
The overall approach involved applying various techniques to extract and process information from structured and unstructured data sources. Structured data from the ERP and the aggregator was processed, and data integration routines were developed to host the processed data in the enterprise data hub/lakehouse. For unstructured data sources, PDF files from distributors were processed using OCR techniques, and entity recognition was performed to label products, customer names, distributor names, etc. Excel files were processed with a data scraping approach that extracts the information in each cell and then applies entity recognition to generate key-value pairs, which are stored in the data model.
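The Excel step described above can be illustrated with a minimal Python sketch. It is not the Dopplr™ implementation: the case study does not say which entity recognition method was used, so a simple rule-based alias lookup stands in for the AI model here. The alias table, the function names, and the sample grid are all hypothetical, and the grid is a plain list of rows standing in for cells already read from a spreadsheet.

```python
from typing import Dict, List, Optional

# Hypothetical alias table (an assumption, not from the project): maps header
# variants seen in distributor files to canonical fields in the data model.
FIELD_ALIASES: Dict[str, List[str]] = {
    "product": ["product", "product name", "item", "sku"],
    "customer": ["customer", "customer name", "party"],
    "distributor": ["distributor", "stockist"],
    "quantity": ["quantity", "qty", "units"],
}


def label_header(cell: object) -> Optional[str]:
    """Return the canonical field for a header cell, or None if unrecognized."""
    text = str(cell).strip().lower()
    for field, aliases in FIELD_ALIASES.items():
        if text in aliases:
            return field
    return None


def find_header_row(grid: List[List[object]]) -> Optional[int]:
    """Return the index of the first row with at least two recognized cells."""
    for index, row in enumerate(grid):
        if sum(label_header(cell) is not None for cell in row) >= 2:
            return index
    return None


def extract_records(grid: List[List[object]]) -> List[Dict[str, object]]:
    """Turn the rows below the header into key-value records."""
    header_index = find_header_row(grid)
    if header_index is None:
        return []
    labels = [label_header(cell) for cell in grid[header_index]]
    records = []
    for row in grid[header_index + 1:]:
        # Keep only cells under a recognized header that actually hold a value.
        record = {
            label: value
            for label, value in zip(labels, row)
            if label is not None and value not in (None, "")
        }
        if record:
            records.append(record)
    return records


# Made-up sample: a title row, a header row with non-standard names, two data rows.
sample_grid = [
    ["Monthly statement", None, None],
    ["Item", "Party", "Qty"],
    ["Tablet A 10mg", "City Pharmacy", 120],
    ["Syrup B 100ml", "Health Mart", 45],
]
records = extract_records(sample_grid)
```

Scanning for the header row, rather than assuming it is the first row, reflects the fact that distributor files arrive in multiple formats; a trained model would replace the alias lookup where header wording is less predictable.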
The approach involved building a centralized enterprise data hub comprising data from third-party aggregators, the in-house ERP, and distributor invoices, enabling a range of analytics. A sales analytics dashboard was created with insights across primary and secondary sales data and stock movement analysis. The platform also provided the capability to deploy additional dashboards, self-service analytics, and predictive models.
The client was able to address its key challenges and enable effective stock management through the data-driven insights provided by the Dopplr™ platform. The data ingestion framework allowed automated data pipelines to feed data into the enterprise data hub. A process framework for automatically onboarding new distributors was also implemented. The sales analytics dashboard provided insights across primary and secondary sales data and stock movement analysis. The platform’s capability to deploy additional dashboards, self-service analytics, and predictive models enabled the client to continue scaling its data-driven decision-making capabilities.