The Definitive Guide to Artificial Intelligence-Based Automated Code Migration

Legacy applications represent a significant capital investment for organizations, a valuable asset that delivers ongoing value, and potentially an area of risk and vulnerability. Just as factories periodically need to replace or modernize old and decrepit machinery, IT systems decay over time and eventually need attention. Automated migration is a cost-effective and efficient means to modernize many legacy applications, giving them new life and extending their value to the organization.

What is automated migration?

Automated migration is the process of using software tools--similar in concept and architecture to a compiler--to convert the original source code of an application to a more modern programming language, framework, platform, or all three.

How do GAP Migration's automated migration tools work?

GAP's tools are all based on the same theories and architecture that allow compilers to work. Oversimplifying, there are three fundamental pieces in all our tools:

Front end parser/typer: this component reads and analyzes the source code, finding all the references, parsing the syntax of the language, and understanding the types of all objects.
Static code analyzer: this middle component builds abstract symbol trees that represent the static structure of the code, then runs iterative analysis tools to build symbolic representations of the semantic intent of the application code.
Back end code generator: working from a complete structural representation of the code, this component emits new code in the form of source code files, folders, and appropriate helper classes (also in source code), along with project and solution files ready to be loaded into Visual Studio or another appropriate development system.
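
The three pieces described above can be pictured with a toy pipeline. This is a minimal sketch and not GAP's implementation: the tiny VB-style input language (Dim declarations and simple assignments), the function names, the type mapping, and the C#-style output are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical type mapping from a VB-style source to a C#-style target.
TYPE_MAP = {"Integer": "int", "String": "string", "Double": "double"}

@dataclass
class Decl:
    name: str
    vb_type: str

@dataclass
class Assign:
    name: str
    expr: str

def parse(source):
    """Front end: turn source lines into typed nodes."""
    nodes = []
    for line in source.strip().splitlines():
        line = line.strip()
        m = re.fullmatch(r"Dim (\w+) As (\w+)", line)
        if m:
            nodes.append(Decl(m.group(1), m.group(2)))
            continue
        m = re.fullmatch(r"(\w+) = (.+)", line)
        if m:
            nodes.append(Assign(m.group(1), m.group(2)))
            continue
        raise SyntaxError(f"cannot parse: {line!r}")
    return nodes

def analyze(nodes):
    """Middle: build a symbol table and reject assignments to undeclared names."""
    symbols = {}
    for node in nodes:
        if isinstance(node, Decl):
            symbols[node.name] = node.vb_type
        elif node.name not in symbols:
            raise NameError(f"undeclared variable: {node.name}")
    return symbols

def emit(nodes, symbols):
    """Back end: generate target source text from the analyzed nodes."""
    out = []
    for node in nodes:
        if isinstance(node, Decl):
            out.append(f"{TYPE_MAP[symbols[node.name]]} {node.name};")
        else:
            out.append(f"{node.name} = {node.expr};")
    return "\n".join(out)

def migrate(source):
    """Run the whole program through all three stages before returning output."""
    nodes = parse(source)
    symbols = analyze(nodes)
    return emit(nodes, symbols)

print(migrate("Dim total As Integer\ntotal = 40 + 2"))
```

Note that the back end only runs after the front end and analyzer have processed the entire input, which mirrors the whole-application analysis described in this section, at a vastly smaller scale.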

GAP's tools are componentized (see above). This makes it feasible to modify the basic tools to perform new transformations. Both input and output languages/platforms can be modified. In addition, the architecture of the tools allows for substantial refactoring of application code, such as what happens when using WebMAP to migrate from Windows desktop to native web architecture using Angular and ASP.NET Core.

Automated migration using GAP's tools relies on iterative semantic analysis. First, all the application code, including external components, is parsed and typed. This symbolic representation is analyzed using AI algorithms to build a semantic representation of the intent of the code. Prior attempts by other computer scientists to do rich code analysis were limited to syntactic analysis; GAP pioneered true semantic analysis of source code over two decades ago. GAP stands alone in its ability to perform automated migration that truly replicates the intent of the code. The success of the approach has been proven by the billions of lines of real-world code that have been migrated with our tools.

When does automated migration make sense?

Not every modernization project is suited for automation, but many are. Some of the characteristics of legacy applications that could benefit from automated migration include:

The application is in continual use providing on-going value to the organization
The application contains a large amount of business logic
The application is considered to be a critical asset to the organization, perhaps providing competitive benefit or strategic advantage
The source code is modified frequently to support changing requirements or regulations
Over the life of the application, errors in logic or business rules have been largely eliminated
The programming language is obsolete, making it hard to find developers familiar with it at competitive rates
The application is extremely large and complex (>100,000 lines of code)
The organization would like to be able to expose services in the application as RESTful APIs
The organization is an independent software vendor (ISV) that is experiencing new competition from market entrants with more modern UX, or with browser-based SaaS applications

In cases where an application meets many or all of these criteria, the organization is typically forced to choose between automated migration and a complete rewrite. See below for more information on the pros and cons of automated migration compared to manual coding.

How automated migration saves time and money

Automated migration can, in a single process, convert 85-95 percent of the source code to the new target language and platform. The remaining work can be done by the in-house development team, an outside systems integrator, or Mobilize's own engineering services team. Actual project data, collected over a 15-year period, confirms that automated migration typically saves as much as 75 percent of the time required to rewrite a legacy application. Cost savings are similar: an automated migration averages about 20-35 percent of the cost of a manual rewrite.
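
As a rough illustration of what the 85-95 percent figure implies, the arithmetic below applies it to a hypothetical 500,000-line application. The application size, and the assumption that the percentage maps directly onto line counts, are illustrative only and are not project data.

```python
def manual_lines_remaining(total_lines, automated_percent):
    """Lines left for manual work if automated_percent of the code converts automatically."""
    return total_lines * (100 - automated_percent) // 100

TOTAL_LINES = 500_000  # hypothetical application size, not a measured figure

for pct in (85, 95):
    remaining = manual_lines_remaining(TOTAL_LINES, pct)
    print(f"{pct}% automated -> {remaining:,} lines need manual attention")
```

Under these assumptions, between 25,000 and 75,000 lines would be left for the team to finish by hand, rather than the full 500,000.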

One large government organization undertook to compare the actual cost of rewriting a legacy application with using Mobilize to migrate a similar application. Analysis of the actual data showed that the rewrite cost three times as much as the automated migration. Additionally, the automated migration project was delivered as a fixed-price, fixed-schedule project, eliminating risk.

Why automated migration beats a manual rewrite

The history of software development is littered with the bleached bones of failed projects. Methodologies for avoiding development problems abound, yet failures remain not only common but almost the rule. As recently as 2017, only 29 percent of projects were successful. Even organizations implementing Agile development meet their schedule goals only 65 percent of the time and their budgets only 67 percent of the time. While this is demonstrably better than 29 percent, are you willing to bet on only a 2-out-of-3 chance of reasonable success? Furthermore, as the size and complexity of a project increase, the risk of failure can go up by as much as 10x.

And yet many companies undertake to rewrite aging legacy applications rather than investigate automated migration as an alternative approach. Here are a few actual examples:

Medium ISV

This particular company had a VB6 application that it had sold to vertical-market customers for years, building a strong market position. However, recent changes led the company to undertake a rewrite to create an HTML version (among other compelling reasons, to enable a SaaS model). For almost two years, the company struggled to keep the VB6 version up to date while building the HTML web version, constantly changing the new version's requirements to match the evolution of the VB6 version.

Goals

As is typical with these situations, the rewrite project included a number of ambitious goals:

Re-create the existing VB6 application using a new target language/platform--in this case, ASP.NET/MVC on the server side with a single-page application client using HTML5, CSS, and JavaScript. Client to server communication would use AJAX and JSON messaging.
Add in a prioritized list of feature enhancements that the product management team has been asking for
Re-architect the code base to take advantage of modern coding practices like loose coupling, true object orientation, separation of concerns, naming and coding standards, better commenting, etc.
Create a user experience that mirrors the existing desktop application so that users will not require retraining
Not introduce any defects in the functionality of the application, in effect re-creating all the business logic perfectly
Create a test harness and full suite of automated functional tests to run future regressions
Add unit tests to all code
Host the platform on Azure as a hybrid application initially, with an eye to full public cloud deployment in future
Implement application performance monitoring and measurement tools post-deployment
Implement DevOps and CI/CD after migration and deployment.

A bridge too far

If you guessed this set of goals was overly optimistic, you'd be correct. Some of the problems they ran into included:

The single development team was overwhelmed by trying to keep the VB6 product competitive while simultaneously working on the rewrite; as a result schedules slipped constantly.
The team was faced with a dizzying array of new technical skills, tools, and components they needed to learn, master, and employ correctly
Shifting requirements, resource bleed-off, and weak project management as they attempted to implement Agile and scrum caused morale to deteriorate and fostered a "death march" attitude among team members.

A better bridge

After over a year of struggle trying to get the project on track and keep it on track, they decided on a different approach.

Turning to Mobilize.Net, they were able to use automated migration to dramatically cut the time and cost of migrating their VB6 application to the web. Following verification that the migrated app was functionally and visually equivalent to the VB6 version--only now existing and deployed as a modern web application--they were able to quickly switch their licensing model from updates to subscription (SaaS), move their deployment to public cloud, and begin implementing DevOps and CI/CD.

But what about...?

Note that the automated migration did not move the needle on their goals to refactor the code into more modern patterns like full object orientation or DRY (don't repeat yourself), although the automated process DID re-architect the code into MVC on the server and MVVM on the client. But now the company had a modern code base--in production--which could be refactored and improved using modern tools and patterns by affordable, findable developers right out of school. The team found that refactoring a lot of similar code into classes was more fun than missing dates and getting yelled at by management.

Automated migration code quality

Code quality in automated migration projects is a major concern, as well it should be. Some approaches rely on a runtime layer that effectively translates the source platform syntax and runtime library API into the target. This means the resulting code to be maintained looks almost exactly like the original application's source code. Why is that bad? Because legacy languages are no longer taught in computer science curricula, and emerging developers are neither schooled in those languages nor interested in learning them. Additionally, if the migration relies on a runtime, there is now a permanent third-party dependency--do you want to bet your company on something you can't control?

Refactoring after automated migration

There's no debate about whether legacy code needs refactoring. Older code rarely if ever shows modern patterns like true OOP, DRY, separation of concerns, loose coupling, and so on. Coding conventions, naming standards, and documentation/comments may be all over the map. Migration using automated migration tools can improve some of these areas, but not all. What automated migration CAN do is get your app up and running on a new, modern code base using modern languages, platforms, frameworks, and tools so you can begin refactoring. Investing in refactoring AFTER migration makes far more economic sense than holding on to legacy code while refactoring BEFORE modernization. Modern languages and frameworks have built-in support for constructs needed in your refactoring process--legacy languages and frameworks usually don't.

Choosing an automated migration tool or vendor

The process of syntactic translation of code from one language to another is relatively simple but does not represent a migration or modernization of the underlying application. To borrow from a famous movie, these are not the droids you are looking for.

Instead, anyone investigating tools to assist with legacy modernization should restrict their search to semantic transformation tools. Semantic transformation re-creates the application functionality in a new language with a new runtime environment. For example, converting VB.NET to C# is syntactic translation, while converting VB6 to .NET (with C#) is semantic transformation.

GAP only provides tools capable of semantic transformation, not syntactic translation. Just as a compiler analyzes high-level source code and creates low-level, optimized machine code, both VBUC and WebMAP use static code analysis to understand the intent of the code, then generate a correct representation of that intent in the target language, framework, and runtime environment.
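
To make the distinction concrete, consider VB6 default properties: assigning a value to a TextBox control actually sets its Text property, and assigning to a Label sets its Caption. A purely syntactic translator has no type information and copies the statement as written, while a semantic one consults the type of the target. The sketch below is illustrative only; the symbol table, function names, and output format are assumptions and do not describe how VBUC or WebMAP work internally.

```python
# Default properties of two VB6 controls (TextBox -> Text, Label -> Caption).
DEFAULT_PROPERTY = {"TextBox": "Text", "Label": "Caption"}

# Hypothetical symbol table, as an earlier parse/type pass might produce it.
SYMBOLS = {"txtName": "TextBox", "lblTitle": "Label", "count": "Integer"}

def syntactic_translate(stmt):
    """Token-level rewrite: only adds the target language's statement terminator."""
    return stmt + ";"

def semantic_translate(stmt, symbols):
    """Type-aware rewrite: expands implicit default-property assignments."""
    target, value = [part.strip() for part in stmt.split("=", 1)]
    prop = DEFAULT_PROPERTY.get(symbols.get(target))
    if prop:
        target = f"{target}.{prop}"
    return f"{target} = {value};"

for stmt in ['txtName = "Ada"', 'lblTitle = "Welcome"', "count = 3"]:
    print(f"{syntactic_translate(stmt):28} | {semantic_translate(stmt, SYMBOLS)}")
```

The left-hand column looks like valid target code but would assign a string to a control object; the right-hand column preserves what the original statement actually did.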

With that as an introduction, when evaluating potential automated modernization tool solutions and vendors:
Ignore those who only provide syntactic translation, as this does little to solve the pressing problems of legacy code bases
Evaluate the output of the tool, ensuring that the generated code is readable, maintainable, and meets coding standards
Ensure the vendor's tooling can be modified or configured to implement naming, coding, error handling, and other standards.
Verify that the vendor's tooling can be modified or extended to handle unique or rare coding constructs, components, and dependencies to minimize manual effort post-migration.
Verify that the vendor's solution will create 100% functional equivalence, meaning that any functional test the original application passes, the migrated application will also pass. In fact, a full regression test suite for the source application is the perfect validation for the migrated application, once the runtime environment differences have been accounted for.
The migration should fully embrace the target language and runtime environment (such as .NET), rather than merely pasting legacy-looking code on top of a binary translation layer.
The presence of third-party binaries for runtime translation should be a red flag. Third-party dependencies place the vendor in a position of power for all time over the customer, with the constant risk the vendor will cease support or even go out of business.
The vendor should have a verifiable track record of multiple successful migrations of applications similar in size, scope, complexity, and nature to the target application
The vendor should be able to perform a proof of concept on the target application's actual code, migrating some representative but small part of the code to the target using the expected tooling and be willing to discuss both the pros and cons of the generated results.
If the generated code contains helper classes to ease the development team's transition to the new platform, they should be available as source code, and the vendor should not limit or restrict their use, modification, or improvement with respect to the migrated application.
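
The functional-equivalence criterion in the list above can be checked mechanically by replaying the same test inputs against both versions and comparing the results. The following is a minimal sketch: legacy_invoice_total and migrated_invoice_total are hypothetical stand-ins for calls into the original and migrated applications, and the test cases are invented for illustration.

```python
from decimal import Decimal

def legacy_invoice_total(quantity, unit_price):
    """Stand-in for a call into the original application (hypothetical)."""
    return (Decimal(quantity) * Decimal(unit_price)).quantize(Decimal("0.01"))

def migrated_invoice_total(quantity, unit_price):
    """Stand-in for the same business rule in the migrated code (hypothetical)."""
    return (Decimal(unit_price) * Decimal(quantity)).quantize(Decimal("0.01"))

def find_mismatches(legacy, migrated, cases):
    """Run every case through both versions and collect any differing results."""
    mismatches = []
    for args in cases:
        expected, actual = legacy(*args), migrated(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches

# Invented regression inputs: (quantity, unit price as a string).
CASES = [(1, "9.99"), (3, "0.10"), (12, "19.995"), (0, "5.00")]

mismatches = find_mismatches(legacy_invoice_total, migrated_invoice_total, CASES)
print("equivalent" if not mismatches else mismatches)
```

In a real evaluation the two functions would drive the actual applications, and the case list would come from the existing regression suite, adjusted for runtime environment differences.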

Approaches to automated migration

Let's review a few popular approaches to automated migration, including ours.

Workbench

The workbench could be described as "tool assisted migration," in that it provides some assistance to help you migrate code file by file. With the workbench, you look at each line of code separately, evaluating proposed changes shown by the tool and selecting which ones you want to implement.

Pros: Since you are migrating each file individually, when you are done the project should compile and run right away. Many workbenches offer a direct way to extend the built-in mappings, sort of like using regular expressions.

Cons: Since each line of code is modified directly by a developer, this approach doesn't lend itself to spreading the migration across teams, each of whom might migrate similar code in a different fashion. This approach can create a Tower of Babel in the final code, unless the teams are particularly effective at manually implementing a variety of standard approaches. Also, this approach is highly granular, since each file is processed individually. It prevents the migration process from working from a representation of the entire application.
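
A workbench-style mapping of the kind described above can be pictured as a list of pattern-and-replacement rules applied one line at a time. The sketch below is an assumption about the general shape of such mappings, not any particular vendor's format; the two rules and the propose function are hypothetical.

```python
import re

# Hypothetical line-level mappings from VB6-style calls to C#-style calls.
MAPPINGS = [
    (re.compile(r'^MsgBox (".*")$'), r"MessageBox.Show(\1);"),
    (re.compile(r"^Debug\.Print (.+)$"), r"Debug.WriteLine(\1);"),
]

def propose(line):
    """Return a proposed replacement for one line, or None if no mapping applies."""
    for pattern, template in MAPPINGS:
        if pattern.match(line):
            return pattern.sub(template, line)
    return None

for line in ['MsgBox "Saved"', "Debug.Print total", 'txtName = "Ada"']:
    print(f"{line!r} -> {propose(line)!r}")
```

Each rule sees only a single line of text, so anything that depends on declarations elsewhere in the application, such as the type of txtName in the last line, falls through for a developer to resolve by hand.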

Runtime

Some approaches use a binary runtime that inserts itself between the source code and the OS. This approach lets the developer preserve a lot of the original syntactic flavor of the source language, while mapping virtually all of the behavior of the source libraries to the destination.

Pros: Migration is typically quick and easy. The resulting executable will run on the new platform with faithful mirroring of the original application.

Cons: Dependency on the vendor to maintain and support the runtime layer. If the runtime layer works, there are no problems. If the runtime layer has bugs or missing functionality, there is no workaround. If the vendor abandons the libraries (this has happened) or ceases business operations, there is no path forward. Further, this doesn't solve the problem of dependency on developers with legacy skills and knowledge, since it preserves most of the style and syntax of the source language. This approach also doesn't help companies that are struggling with compliance, since the app will still require all of the old technologies that are out of compliance (e.g., Windows XP and Visual Basic 6.0).

AI-assisted migration

The GAP way. With this approach, the entire application is analyzed and understood before any code is created in the target platform, in order to ensure the best possible code quality.

Pros: This is the fastest and most efficient method to move the legacy code base to a new language, runtime environment, and framework. Iterative static analysis allows for understanding of the whole application before any code is generated. Output is native code using correct syntax and conventions of the target framework. Any and all helper classes are C# source code. Business logic is preserved without introduction of new defects. Functional equivalence is assured. Lends itself to team migration to reduce overall time required to deliver final code. The most proven approach, since more code has been migrated with this system than all others combined. You wind up with readable, maintainable code with no external dependencies.

Cons: The process requires real work, but it's orders of magnitude faster than rewriting.
