Whitepaper
Seethalakshmi Subramanian, Technology Specialist • Customer Oriented – Enterprise Architecture
Manual code conversion for data modernization is often time-consuming and prone to errors. The DBShift™ platform automates the conversion of Informatica BDM mappings and workflows into Databricks PySpark code, significantly accelerating the migration process. Through this POC, DBShift™ demonstrated high accuracy (95%), reduced manual intervention, and faster turnaround—delivering a scalable and efficient modernization solution.
Introduction
In today’s fast-paced data landscape, organizations are constantly looking for ways to modernize their platforms and migrate workloads efficiently. Manual code conversion is often time-consuming and error-prone. This is where DBShift™ comes into play.
As part of a recent Proof of Concept (POC), we explored the capabilities of DBShift™ in automating the conversion of Informatica BDM mappings and workflows to Databricks code.
POC Objective and Scope
The primary goals of this POC were:
- Demonstrate DBShift™’s capabilities and reliability in accelerating the modernization of Informatica BDM mappings to Databricks PySpark pipelines.
- Minimize manual intervention and errors.
- Validate the accuracy, performance, and efficiency of the converted code.
- As part of the POC scope, a total of 41 Informatica mappings of varying complexity were identified from 3 different business units.
Approach Taken
- Source Analysis – Gathered mappings, workflows, and objects from Informatica BDM for 3 different business units and analysed their source XMLs to understand their various attributes.
- Conversion via DBShift™ – Applied the GenAI-powered DBShift™ platform to generate equivalent Databricks PySpark code for the mappings; the corresponding workflows were orchestrated in Databricks.
- Validation – Compared outputs between source and converted code for correctness for multiple iterations.
- Iterations – Two iterations of conversion were performed to fine-tune the backend LLM until the expected output was achieved.
- Optimization – Ensured target code followed best practices and performance standards.
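The validation step above (comparing outputs between the source and the converted code) can be sketched as a row-level fingerprint comparison. This is purely illustrative; the actual validation tooling used in the POC is not detailed in this document, and all names here are hypothetical:

```python
import hashlib

def row_fingerprint(row):
    """Hash one row (a list of values) into a stable fingerprint."""
    joined = "|".join(str(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def compare_outputs(source_rows, converted_rows):
    """Compare two result sets by row count, then by the sorted multiset of row
    fingerprints (order-insensitive, so differing sort orders are not flagged)."""
    if len(source_rows) != len(converted_rows):
        return False, "row count mismatch"
    src = sorted(row_fingerprint(r) for r in source_rows)
    tgt = sorted(row_fingerprint(r) for r in converted_rows)
    return src == tgt, "fingerprints compared"

ok, _ = compare_outputs([[1, "a"], [2, "b"]], [[2, "b"], [1, "a"]])
```

At production scale the same idea would run inside Spark (e.g. as an `exceptAll` between the two DataFrames) rather than row by row in Python.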
Technical Process Flow

Journey To Modernization
The image below presents a structured timeline of the POC, showing each key phase in the journey from initial scope identification to readiness for System Integration Testing (SIT).
Process Phases & Descriptions
- Scope Identification: Began with identifying the scope, including 41 Informatica BDM mappings to be modernized.
- Conversion & Validation (Run 1): The first run involved model conversion and manual validation, yielding a model accuracy of 85%. Of the 41 in-scope mappings, 2 were excluded; of the remaining 39, 17 were fully converted and 22 partially converted.
- Fine-Tuning: Involved 16 hours of model fine-tuning and training to address uncovered scenarios.
- Conversion & Validation (Run 2): The rerun achieved a higher model accuracy of 95%. Of the 22 rerun mappings, 17 were fully converted and 5 remained partial.
- Manual Fixes: 5 manual fixes were required, which took about 24 hours to implement.
- Ready for SIT: The process concluded with 39 mappings converted and ready, now compatible with Databricks PySpark.
Timeline & Key Stats
- Conversion Time: 6 hours for the main conversion phase.
- Total Conversion Effort: 46 hours (~1.2 hours per object).
- Validation Time/Effort: 140 hours (~4 hours per object).
- Fixes: 24 hours to address remaining issues.
Additional Highlights
- Manual and Automated Work: Steps clearly distinguish between automated runs and manual interventions.
- Legend: Blue icons represent Systech activities; green icons represent client’s manual validation.
This visual summary concisely documents critical project phases, quantitative outcomes, and task ownership in the modernization process.

Augmented Intelligence:
Model Fine-Tuning (16 Hours)
Model fine-tuning was central to improving conversion accuracy and addressing scenarios missed in the initial run. This process spanned 16 hours and involved systematically augmenting the conversion model with new logic derived from validation insights.
Scenarios Augmented to the Model
a) Unconnected Lookups
Initially, lookups that were not joined to the target were missed during the first run. These were identified and the model was refined so that, in Run 2, unconnected lookups were correctly resolved.
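Conceptually, an Informatica unconnected lookup is a cached lookup invoked from an expression rather than wired into the data flow, which is why it can be missed by flow-graph-driven conversion. A minimal, environment-free Python sketch of that behavior (field names are hypothetical; the actual generated PySpark differs):

```python
def build_lookup(rows, key_field, value_field):
    """Materialize a lookup source as a dict, mirroring a cached unconnected lookup."""
    return {r[key_field]: r[value_field] for r in rows}

# Hypothetical lookup source; in Informatica this would be the lookup's source table.
lkp_dept = build_lookup(
    [{"dept_id": 10, "dept_name": "Sales"}, {"dept_id": 20, "dept_name": "HR"}],
    "dept_id",
    "dept_name",
)

def enrich(employee):
    # Expression-level call, analogous to :LKP.lkp_dept(dept_id) in a mapping.
    employee = dict(employee)
    employee["dept_name"] = lkp_dept.get(employee["dept_id"])
    return employee

enriched = enrich({"emp_id": 1, "dept_id": 10})
```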
b) Pre/Post SQL Queries
Source qualifier queries (pre and post SQL) did not transfer correctly after Run 1. This gap was fixed during fine-tuning, ensuring their inclusion by the second run.
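The ordering semantics being restored here can be sketched as a wrapper that runs pre-SQL, then the main transformation, then post-SQL. `execute_sql` is a stand-in for whatever statement executor the generated code uses (e.g. `spark.sql` in Databricks), and the statements shown are hypothetical:

```python
def run_mapping(pre_sql, post_sql, transform, execute_sql):
    """Execute pre-SQL statements, then the main transformation, then post-SQL,
    preserving Informatica's source-qualifier pre/post SQL ordering."""
    for stmt in pre_sql:
        execute_sql(stmt)
    result = transform()
    for stmt in post_sql:
        execute_sql(stmt)
    return result

executed = []
result = run_mapping(
    pre_sql=["TRUNCATE TABLE stage_orders"],    # hypothetical statement
    post_sql=["ANALYZE TABLE target_orders"],   # hypothetical statement
    transform=lambda: "rows written",
    execute_sql=executed.append,                # stand-in for spark.sql
)
```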
c) Session Properties
Session-level properties were not captured by the initial conversion model. Upon cross-verification with the target script, this omission was detected and later rectified within the model training process.
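As an illustration only (the real property-to-setting pairs are project-specific and not specified in this document), captured session properties could be translated into target engine settings while flagging anything unmapped for manual review:

```python
def session_props_to_conf(session_props, prop_map):
    """Translate captured session-level properties into target engine settings;
    anything without a known mapping is set aside for manual review."""
    conf, unmapped = {}, {}
    for name, value in session_props.items():
        if name in prop_map:
            conf[prop_map[name]] = value
        else:
            unmapped[name] = value
    return conf, unmapped

# Purely illustrative mapping between a session property and a Spark setting.
conf, unmapped = session_props_to_conf(
    {"Maximum Parallelism": "8", "Custom Audit Flag": "Y"},
    {"Maximum Parallelism": "spark.sql.shuffle.partitions"},
)
```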
d) Mapplet (Primary Key Generation)
The function for generating primary keys in mapplets failed to operate properly at first. Fine-tuning involved revising model logic so that the corrected primary key generation logic was incorporated and validated successfully in Run 2.
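The primary key generation being fixed here amounts to assigning a sequential surrogate key to each row. A plain-Python analogue is shown below; in PySpark the same effect is typically achieved with `row_number()` over a window or `monotonically_increasing_id()`:

```python
def assign_surrogate_keys(rows, start=1):
    """Assign a sequential primary key to each row, leaving input rows untouched."""
    return [dict(row, pk=start + i) for i, row in enumerate(rows)]

keyed = assign_surrogate_keys([{"name": "a"}, {"name": "b"}])
```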
These refinements enhanced the model’s reliability and output quality, strengthening the automated conversion pipeline and minimizing the need for manual interventions in subsequent stages.
Human Interventions in the Loop
Human interventions like these are critical to catch edge cases, correct automation misses, and adapt complex business logic during modernization, ensuring robust and reliable migration outcomes. Here is a summary of the necessary human interventions in the modernization loop:
- Validation of Output
Manual validation is performed to ensure the conversion results match expected outputs, business rules, and data correctness for each mapping and workflow.
- Manual Fixes
- Column Ambiguity: When column aliases are mismatched in complex mappings or when specific columns must be selected as data moves between transformations, manual correction is needed to resolve ambiguities and align with target structures.
- Workflow Tasks (Start/End Task, Cluster Creation/Deletion): Tasks that are handled automatically in Informatica (like workflow initiation, completion, and managing clusters) must be manually set up and managed in Databricks due to lack of automated support.
- Lookup Transformation: For “Any Value” scenarios in lookups, manual intervention is required to handle duplicate records. In Databricks, the row_number() window function is used to select appropriate records, a step that could potentially be automated in the future.
- Mapplets with High Number of Transformations: When a mapplet includes many transformations, it must be partitioned and code generated by segment. The segments are then merged manually into a single script in Databricks for validation—a step that can eventually be augmented.
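The "Any Value" dedup described above can be mirrored without a Spark session: sort within each key group and keep the first row, which is what `row_number() OVER (PARTITION BY key ORDER BY ord) = 1` selects in the generated Databricks code. Field names here are hypothetical:

```python
from itertools import groupby

def any_value_dedup(rows, key_field, order_field):
    """Keep one row per key: sort within each key group and take the first,
    mirroring row_number() OVER (PARTITION BY key ORDER BY order_field) = 1."""
    ordered = sorted(rows, key=lambda r: (r[key_field], r[order_field]))
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[key_field])]

deduped = any_value_dedup(
    [{"id": 1, "ts": 2}, {"id": 1, "ts": 1}, {"id": 2, "ts": 5}],
    "id",
    "ts",
)
```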
Results & Metrics
The POC outcomes clearly demonstrate DBShift™’s effectiveness in accelerating code modernization through automation while maintaining accuracy and quality.
| Metric | Outcome |
| --- | --- |
| Total Mappings Evaluated | 41 |
| Mappings Successfully Converted | 39 |
| Model Accuracy | 85% (Run 1) to 95% (Run 2) |
| Automated Conversion Time | 6 hours |
| Total Conversion Effort | 46 hours |
| Validation Effort | 140 hours |
| Manual Fixes Effort | 24 hours |
| Automation vs. Manual Effort | 75% automation and 25% manual intervention |
| Overall Time Savings | 60% reduction compared to manual migration |
| Quality Outcome | Consistent, production-ready Databricks PySpark code with high maintainability. |
Lessons Learned
While GenAI-driven automation achieved remarkable efficiency, human validation remained a crucial component in ensuring reliability and business rule alignment. Key takeaways include:
- AI + Human Synergy: The combination of DBShift™’s automated conversion and SME validation led to higher accuracy and fewer rework cycles.
- Model Adaptability: Fine-tuning based on validation feedback improved accuracy from 85% to 95%, demonstrating continuous model learning.
- Manual Oversight for Edge Cases: Complex mapplets, pre/post SQL handling, and lookup scenarios benefited from targeted human intervention to ensure correctness.
- Progressive Automation: Each feedback loop contributed to expanding the model’s coverage, reducing manual dependency over time.
Together, these insights highlight the importance of a “human-in-the-loop” approach in achieving dependable, production-grade AI automation.
Outlook / Next steps
The success of this POC positions DBShift™ as a strong enabler for enterprise-scale modernization. Moving forward:
- Enterprise Rollout: Apply DBShift™ to larger data estates across multiple business units for accelerated cloud migration.
- Model Expansion: Extend AI learning to cover additional Informatica transformations and target environments beyond Databricks.
- Enhanced Automation: Further reduce manual fixes by training on edge cases and incorporating additional validation layers.
- Integration with CI/CD: Embed DBShift™-generated code into automated DevOps pipelines for continuous delivery and governance.
With each iteration, DBShift™’s automation and accuracy are expected to improve, providing organizations with a sustainable, intelligent pathway to cloud modernization.
Conclusion
The POC with DBShift™ clearly demonstrated the potential of GenAI-powered automation in accelerating the modernization journey from Informatica BDM to Databricks PySpark. With two iterative conversion runs, fine-tuning, and targeted human interventions, overall accuracy reached 95%, significantly reducing manual effort while ensuring high-quality, production-ready code. Although certain edge cases still required manual handling, the learnings from this exercise highlight DBShift™’s ability to continuously improve through augmented intelligence. Ultimately, this POC validated DBShift™ as a reliable enabler for large-scale migration programs, delivering faster turnaround, minimized risk, and a sustainable path for organizations looking to modernize their data platforms with confidence.
Related Resources:
Empower Your Team with WizarD™
Boost productivity and efficiency with WizarD™, the GenAI-powered multi-agent system designed to collaborate seamlessly with your team. Transform the way you work—smarter, faster, better.
Intelligence Anywhere, Anytime
Seamlessly connect data, insights, and actions with our all-in-one cloud platform for Decision Intelligence. Empower smarter decisions, faster.
WizarD™ VisionPro AI
Revolutionize safety and efficiency with AI-powered image and video analysis. Enhance compliance, streamline operations, and safeguard your manufacturing environments with intelligent visual insights.