With the build-up of websites from the 1990s on, many large repositories of unstructured text were created in HTML.
Now those same repositories are getting increasingly difficult to index and catalog for search and retrieval.
The answer is XML, but extracting and transforming vast quantities of text from HTML (a layout and markup language) to XML (a data description language) is a daunting task. Utilities such as prettyprint and tidy can clean up the markup, but the harder problem is creating the correct XML structure and tags for the data.
Fortunately, Mobilize.Net has found a way to automate the process.
The Mobilize.Net solution is highly automated, combining file crawlers, HTML parsers, and our tried-and-tested migration engine. The migration is customizable, so you can create mappings that reflect your specific XML schema.
In addition, we preserve all of the embedded files, including images, audio, video, and more.
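To make the idea of a schema-specific mapping concrete, here is a minimal sketch in Python using only the standard library. It is not the Mobilize.Net migration engine. The `TAG_MAP` table, the `document` root element, and the `convert` function are hypothetical names chosen for illustration, and the sketch handles only flat, non-nested mappings.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical mapping from HTML tags to schema-specific XML tags.
# A real project would derive this table from its own XML schema.
TAG_MAP = {"h1": "title", "p": "paragraph", "li": "item"}


class HtmlToXml(HTMLParser):
    """Copy the text of mapped HTML elements into a flat XML document."""

    def __init__(self, tag_map):
        super().__init__()
        self.tag_map = tag_map
        self.root = ET.Element("document")  # assumed root element name
        self.current = None  # XML element currently receiving text

    def handle_starttag(self, tag, attrs):
        # Open a new XML element only for mapped tags; layout-only tags
        # such as <b> or <div> are ignored.
        if tag in self.tag_map and self.current is None:
            self.current = ET.SubElement(self.root, self.tag_map[tag])
            self.current.text = ""

    def handle_endtag(self, tag):
        # Close the XML element when its originating HTML tag ends.
        if self.current is not None and self.tag_map.get(tag) == self.current.tag:
            self.current = None

    def handle_data(self, data):
        # Text inside a mapped element is kept; text elsewhere is dropped.
        if self.current is not None:
            self.current.text += data


def convert(html_text, tag_map=TAG_MAP):
    """Return an XML string built from the mapped parts of html_text."""
    parser = HtmlToXml(tag_map)
    parser.feed(html_text)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


print(convert("<h1>Hello</h1><p>Some <b>bold</b> text</p>"))
```

Changing the entries in `TAG_MAP` is the only step needed to target a different set of XML tag names, which is the kind of customization a mapping-driven approach allows.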
Why Move to XML?
- Simplify Data Sharing. Computer systems and databases contain data in incompatible formats. XML data is stored in plain text format, providing a software- and hardware-independent way of storing data. This makes it much easier to create data that can be read by different, incompatible applications.
- More Manageable. Upgrading to new systems (hardware and software platforms) is time-consuming. Large amounts of data must be converted, and incompatible data is often lost. XML data is stored in text format, making it easy to expand and upgrade to new operating systems, applications, and browsers without losing data.
- Make Your Data Available to Anyone, Anywhere. Different applications can access your data, not only in HTML pages, but also from XML data sources. With XML, your data can be available to all kinds of handheld computers, voice machines, news feeds, and more, and it becomes more accessible to people with disabilities.
Here's what you get:
- All HTML-tagged text converted to XML; layout-only information is discarded.
- New XML tags represent the nature of the data.
- Correct XML syntax throughout.
- Easy customizations to reflect your data schemas and structural changes where needed.
- Full turn-key solution: you provide the HTML files and we return XML files for your review and acceptance.
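When reviewing returned files, one simple acceptance check is that every file is well-formed XML. The sketch below shows one way to do that with Python's standard library. The function names `is_well_formed` and `find_malformed` are hypothetical, and the check covers syntax only, not conformance to a particular schema.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def find_malformed(folder):
    """Return the .xml files under folder that fail to parse."""
    bad = []
    for path in sorted(Path(folder).rglob("*.xml")):
        try:
            ET.parse(str(path))
        except ET.ParseError:
            bad.append(path)
    return bad
```

For example, `is_well_formed("<a><b>x</b></a>")` returns `True`, while a string with a missing closing tag such as `"<a><b>x</a>"` returns `False`.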