In the fall of 2004, Kevin Vasconi, chief information officer of R.L. Polk & Co., met with other top executives of the company. It was a state-of-the-company meeting to discuss Polk's strategy for the future. The consensus was that its information systems would not be able to support the business into the next decade. "If you have that discussion honestly," Vasconi says, "it will scare the crap out of you."
The Southfield, Michigan-based company's business is automobile data. The company compiles vehicle registration and sales data from 260 sources that include motor vehicle departments in the United States and Canada, insurance companies, automakers and lending institutions. Polk then repackages that data and sells it to dealers, manufacturers and marketing firms -- anyone who wants detailed information about car-buying trends, such as the top-selling SUV for a particular ZIP code. For years, Polk's process of consolidating data ran on IBM mainframes that queued data before it was processed in order to maximize mainframe resources. This data was processed in daily or weekly batches, which delayed when customers could receive it. Polk's entire database already comprised more than 1.5 petabytes, or 1.5 quadrillion bytes, of data.
Customers had been anxious to get sales data more quickly. Paul C. Taylor, chief economist for the National Automobile Dealers Association, says that Polk's vehicle registration data by state is typically available 30 days after car makers release their national sales data. That prevents dealers in one state from immediately comparing trends in their area with nationwide trends and adjusting their inventories accordingly. "Ideally, you would have the state breakdown when you have the national sales figures," Taylor says. "But if Polk could shave off even a week from the cycle, that would be a vast improvement."
Polk had, in fact, tried twice before to move off the mainframe, but those projects ended up being scaled back. "It's the mother of all databases for automotive intelligence," says Joe Walker, president of Polk Global Automotive, the division that sells data to businesses. "It seemed too daunting a task to try to move it."
After Vasconi joined the company, Polk's executives took a different tack with a project code-named ReFuel. In late 2004, Polk created a new company, called RLP Technologies, to build the next data aggregation system. The subsidiary is 7 miles from the main campus, in a building in neighboring Farmington, Michigan. It has a full-time staff of 30, and at the peak of development employed 130 contractors, including consultants from Capgemini.
Vasconi's first task was to figure out what the new system would look like. Polk had three high-level objectives, referred to in shorthand as "50/50/100": The new system needed to be 50% more efficient, deliver data 50% more quickly, and aim for 100% data accuracy. Dubbed the Data Factory, the new system performs the same three jobs that the IBM mainframe did. Vasconi knew the system should have a service-oriented architecture, or SOA, which allows software components in different systems to communicate in a standard way. That would give him the flexibility to add or change pieces without disrupting the whole system. In addition, Vasconi wanted to use grid computing, which harnesses multiple machines to work on a common task, as opposed to using high-powered, stand-alone servers. "At the end of the day," Vasconi says, "we needed to build something that will last 30 years."
The hardware building blocks of the Data Factory are Dell servers with Intel processors, running the Linux operating system. The two- and four-processor servers are configured into separate grids that handle different applications. One grid runs the Oracle 10g database; a second runs JBoss' application server, which hosts custom Java code. A third grid runs Tibco Software's BusinessWorks "messaging bus" software, which acts as the communications broker among the other pieces of the system. The Tibco software provides the system's SOA backbone. RLP Technologies built the rest of the software it needed. Vasconi estimates that about 50% of the system runs on custom Java code -- less than he originally expected. "The SOA architecture empowered us to go to the marketplace and find companies that embraced the SOA approach and the supporting industry standards," he says. All told,…