SOA Case Study
How R.L. Polk Revved Its Data Engine
In the fall of 2004, Kevin Vasconi, chief information officer of R.L. Polk & Co., met with other top executives of the company. It was a state-of-the-company meeting to discuss Polk's strategy for the future. The consensus was that its information systems would not be able to support the business into the next decade. "If you have that discussion honestly," Vasconi says, "it will scare the crap out of you."
The Southfield, Michigan-based company's business is automobile data. The company compiles vehicle registration and sales data from 260 sources, including motor vehicle departments in the United States and Canada, insurance companies, automakers and lending institutions. Polk then repackages that data and sells it to dealers, manufacturers and marketing firms: anyone who wants detailed information about car-buying trends, such as the top-selling SUV in a particular ZIP code. For years, Polk's data consolidation process ran on IBM mainframes that queued data before processing it in order to make the most of mainframe resources. The data was processed in daily or weekly batches, which delayed its delivery to customers. Polk's entire database already comprised more than 1.5 petabytes (1.5 quadrillion bytes) of data.
Customers had been anxious to get sales data more quickly. Paul C. Taylor, chief economist for the National Automobile Dealers Association, says that Polk's vehicle registration data by state is typically available 30 days after carmakers release their national sales data. That prevents dealers in one state from immediately comparing trends in their area with nationwide trends and adjusting their inventories accordingly. "Ideally, you would have the state breakdown when you have the national sales figures," Taylor says. "But if Polk could shave off even a week from the cycle, that would be a vast improvement."
Polk had in fact tried twice before to move off the mainframe, but both projects ended up being scaled back. "It's the mother of all databases for automotive intelligence," says Joe Walker, president of Polk Global Automotive, the division that sells data to businesses. "It seemed too daunting a task to try to move it."
After Vasconi joined, Polk's executives took a different tack with a project code-named ReFuel. In late 2004, Polk created a new company, RLP Technologies, to build its next-generation data aggregation system. The subsidiary is housed in a building in neighboring Farmington, Michigan, 7 miles from the main campus. It has a full-time staff of 30 and, at the peak of development, employed 130 contractors, including consultants from Capgemini.
Vasconi's first task was to figure out what the new system would look like. Polk had three high-level objectives, referred to in shorthand as "50/50/100": the new system needed to be 50% more efficient, deliver data 50% more quickly and aim for 100% data accuracy. Dubbed the Data Factory, the new system performs the same three jobs that the IBM mainframe did. Vasconi knew the system should have a service-oriented architecture, or SOA, which allows software components in different systems to communicate in a standard way. This would give him the flexibility to add or change pieces without disrupting the whole system. In addition, Vasconi wanted to use grid computing, which harnesses multiple machines to work on a common task, rather than high-powered standalone servers. "At the end of the day," Vasconi says, "we needed to build something that will last 30 years."
The hardware building blocks of the Data Factory are Dell servers with Intel processors running the Linux operating system. The two- and four-processor servers are configured into separate grids that handle different applications. One grid runs the Oracle 10g database; a second runs the JBoss application server, which hosts custom Java code. A third grid runs Tibco Software's BusinessWorks "messaging bus" software, which acts as the communications broker among the other pieces of the system and provides its SOA backbone. RLP Technologies built the rest of the software it needed. Vasconi estimates that about 50% of the system runs on custom Java code, less than he originally expected. "The SOA architecture empowered us to go to the marketplace and find companies that embraced the SOA approach and the supporting industry standards," he says. All told, the system now comprises about 50 servers and processes 6 million XML documents per week. The project, from inception to rollout, took roughly 18 months.
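The case study does not describe Tibco's interfaces, and the sketch below does not model them. It is a minimal, hypothetical in-process publish/subscribe broker in Python that illustrates the general idea behind a messaging bus: services exchange messages through named topics instead of calling each other directly, so a component can be added or swapped by changing subscriptions. The `MessageBus` class, the topic names and the cleanse/load services are all invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Toy in-process broker (hypothetical): publishers and subscribers
    only know topic names, never each other."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver to every handler registered for this topic;
        # messages on topics with no subscribers are dropped.
        for handler in list(self._subscribers.get(topic, [])):
            handler(message)


# Hypothetical services wired together only through topics.
loaded: List[dict] = []


def make_cleanser(bus: MessageBus) -> Handler:
    def cleanse(record: dict) -> None:
        # Normalize the VIN, then hand the record on via the bus.
        cleaned = {**record, "vin": record["vin"].strip().upper()}
        bus.publish("registration.cleansed", cleaned)
    return cleanse


def load(record: dict) -> None:
    # Stand-in for a database write.
    loaded.append(record)


bus = MessageBus()
bus.subscribe("registration.received", make_cleanser(bus))
bus.subscribe("registration.cleansed", load)
bus.publish("registration.received", {"vin": " 1hgcm82633a004352 ", "state": "MI"})
```

Replacing the cleansing step would only mean subscribing a different handler to `registration.received`; the loader is untouched. A production message bus also provides persistence, delivery guarantees and network transport, none of which this toy attempts.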
According to Vasconi, the new system has delivered on Polk's expectations. It is cheaper to maintain, coming close to the company's original goal of cutting maintenance costs by 50%, and it processes data faster, although Vasconi couldn't provide specific metrics to back up that claim. The initial acquisition costs for hardware and software were 40% lower than buying a comparable amount of IBM mainframe processing power, although he wouldn't disclose actual costs. In addition, Polk's ongoing maintenance fees to vendors including Dell, Tibco, Oracle, Informatica and DataFlux will be lower than what it has paid IBM.
An even bigger source of savings is staffing: the Data Factory has let the company reduce head count in the data operations group by 43%, from 56 to 32 people. The reduction was possible because many manual steps in the process are now automated. With the new system, Polk can also catch data-processing errors earlier in the process, reducing the need to rerun an entire data-processing job. In a batch-processing mainframe environment, "you don't have the ability to stop the batch in mid-process and do a quality check," Vasconi explains. "If it was wrong, you would have to run it all over again and find out where in the 50 steps along the way the data anomaly occurred."
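The case study does not say how the Data Factory stages its quality checks, so the following is only a sketch, assuming a pipeline of named stages that each carry their own validation function. After every stage the output is checked and saved as a checkpoint; when a check fails, the exception names the stage, and a rerun resumes from that stage instead of from step one. `run_pipeline`, `QualityCheckFailed` and the stage functions are hypothetical names.

```python
from typing import Callable, List, Tuple

Record = dict
# A stage is (name, transform, quality_check); all names are hypothetical.
Stage = Tuple[str, Callable[[List[Record]], List[Record]], Callable[[List[Record]], bool]]


class QualityCheckFailed(Exception):
    """Raised when a stage's output fails its validation check."""

    def __init__(self, stage_index: int, stage_name: str) -> None:
        super().__init__(f"quality check failed after stage {stage_index} ({stage_name})")
        self.stage_index = stage_index
        self.stage_name = stage_name


def run_pipeline(stages: List[Stage], records: List[Record],
                 checkpoints: List[List[Record]]) -> List[Record]:
    """Run stages in order, validating each stage's output before moving on.

    checkpoints[i] holds the validated output of stage i, so a rerun resumes
    at the first stage without a checkpoint rather than starting over.
    """
    start = len(checkpoints)
    data = checkpoints[-1] if checkpoints else records
    for i in range(start, len(stages)):
        name, transform, check = stages[i]
        data = transform(data)
        if not check(data):
            raise QualityCheckFailed(i, name)  # stop mid-run; earlier checkpoints survive
        checkpoints.append(data)
    return data


calls = {"dedupe": 0, "normalize": 0}  # counts how often each stage runs


def dedupe(rows: List[Record]) -> List[Record]:
    calls["dedupe"] += 1
    seen, out = set(), []
    for r in rows:
        if r["vin"] not in seen:
            seen.add(r["vin"])
            out.append(r)
    return out


def normalize_broken(rows: List[Record]) -> List[Record]:
    calls["normalize"] += 1
    return [{**r, "state": ""} for r in rows]  # simulated bug: wipes the state field


def normalize_fixed(rows: List[Record]) -> List[Record]:
    calls["normalize"] += 1
    return [{**r, "state": r["state"].upper()} for r in rows]


def states_present(rows: List[Record]) -> bool:
    return all(r["state"] for r in rows)


raw = [{"vin": "A1", "state": "mi"}, {"vin": "A1", "state": "mi"}, {"vin": "B2", "state": "oh"}]
checkpoints: List[List[Record]] = []
stages: List[Stage] = [("dedupe", dedupe, lambda rows: len(rows) > 0),
                       ("normalize", normalize_broken, states_present)]
failed_at = None
try:
    run_pipeline(stages, raw, checkpoints)
except QualityCheckFailed as err:
    failed_at = err.stage_index  # the failing stage is identified immediately

# Fix only the faulty stage and resume; dedupe is not rerun.
stages[1] = ("normalize", normalize_fixed, states_present)
result = run_pipeline(stages, raw, checkpoints)
```

The design choice being illustrated is the checkpoint list: because each validated intermediate result is kept, a failure costs one stage's rework rather than a full rerun, which is the contrast Vasconi draws with batch processing.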
Not every company can afford the route taken by R.L. Polk of creating a separate company and dedicating resources to the project. The gains from such an investment depend on finding the right solution to the problem and integrating that solution into existing business processes. R.L. Polk's case is an example of a successful application of the garbage can model, in which all the variables affecting the project were managed properly. Its strategic vision seems to be paying off. In 2007, RLP Technologies won the Technology Leadership Award for Business Intelligence at the Ventana Research 2007 Performance Management Leadership Awards [Jaspersoft]. R.L. Polk's new open source business intelligence infrastructure lets it add new data sources to its database and create new market solutions. Looking ahead, Vasconi said data already stored on vehicles' on-board computers, such as engine-trouble history, GPS-based location history and average speeds, will soon be imported into the data warehouse too, if privacy issues can be resolved [Computer World].