FACT SHEET
Richardson, Texas; May 10, 2001

CIANT Launches the Next Generation in Data Management Tools

Data Management Made Easy with DataGYM

The word management comes from “manage,” meaning “to cause to submit to one’s control.”

Data Management means the active collection, cleanup and organization of data so that a business can understand its current state and use that understanding to decide which actions are likely to be profitable.

DataGYM is a set of data management tools for managing your own data or your customers’ data effectively, and for bringing the ultimate revenue-earning power, flexibility and efficiency to your business:

  • Correlate and combine data from multiple sources and formats with ease irrespective of the source
  • Prospect, cross-sell and improve retention activities by understanding customer data across channels
  • Specialized applications in online fraud management, trigger-based cross-selling, data reconciliation
  • Remove, clean and recover “dirty” data to allow your business to leverage information and analytics
  • Handle data effectively and deploy the same process across all lines of business (LOBs), irrespective of language and market
  • Enable analytical efforts to become effective by standardizing and validating data of questionable quality
  • Use the equation “New data + old information = Triggers for new communication” more effectively
  • Leverage the same data tools for “back-office” or B2B/B2C operations, on NT or Unix “boxes”

DataGYM can help you achieve these with ease by providing you with software tools that are cost-effective and designed with your data in mind. The tools can be used by novices and experts alike, and they hide the complexity of their functionality behind an easy-to-use graphical interface, with built-in support available over the Internet.

Traditional software and legacy data management tools cannot handle data of arbitrary structure or format unless that structure has been planned for and codified in the software in advance. With DataGYM you can accept data from various sources and in a variety of layouts and quickly bring it all into a standard structure, ready for you to glean the meaning hidden within it. As you bring this data together from various formats and sources (such as lines of business or acquired businesses), you can clean it of special characters, data corruption and other erroneous entries, and remove data that does not make sense.

Hidden in your customers’ Name and Address data is a wealth of information waiting to be unlocked. It is possible to identify many things at an individual level, such as gender, ethnicity and personal preferences, that could be expensive to obtain from data providers. DataGYM gives you the ability to parse the data within Name and Address to unlock this hidden wealth. It lets you deal with Address data that is otherwise impossible to handle because City, State, Country and similar elements float within the address or run together. This capability enables you to handle international data, which has become essential with the globalization of products and services. DataGYM allows you to handle data in any language and to perform effective cleanup and standardization.

DataGYM can handle Business-to-Business data more effectively than any software available to date. Identify core data from Company Address Lines to implement B2B solutions and make your analysis of such data more predictable.

DataGYM was designed to make your analytical efforts more effective. It helps you standardize and validate data, remove outliers and deal with otherwise invalid data that can undermine analysis and predictive modeling. DataGYM can help you reject all data that does not meet your exacting data quality requirements, and it provides “set-the-dial”, easy-to-use, automated ways to recover that data where possible.


Please review the diagram below to see how DataGYM “consolidates” data.

Once data from various sources is organized into one database, the complex task of relationship building can begin. DataGYM is perhaps the most powerful matching engine available. It can match data, organize the relationships identified and, if necessary, integrate the related data into actionable data. You can match and organize data from any number of sources; because the various formats have already been handled during consolidation, data that arrived in multiple formats can still be matched accurately. DataGYM has a proprietary phonics-based matching algorithm that matches data phonetically and overcomes phonetic variations in the data. As more and more data arrives from telemarketing channels, this has become vital to business. What is more, DataGYM can match one set of data against a different set of data to ensure that data has not been switched, intentionally or unintentionally.
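To illustrate what phonetic matching means in practice, here is a minimal sketch. DataGYM's phonics-based algorithm is proprietary and is not described in this fact sheet, so the sketch substitutes the well-known American Soundex code as a stand-in; the function name `soundex` and the sample names are illustrative assumptions only.

```python
# Stand-in for phonetic matching: classic American Soundex.
# This is NOT DataGYM's proprietary algorithm, only a public example
# of the same general idea (names that sound alike get the same key).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex key for a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        prev = code  # vowels reset prev, so a repeat after a vowel counts
    return ("".join(encoded) + "000")[:4]


# Spelling variants that sound alike share one key.
same_key = soundex("SMITH") == soundex("SMYTHE")
```

Two records whose name keys agree can then be treated as candidates for a closer comparison, which is how phonetic keys are commonly used to tolerate the spelling distortions typical of telemarketing data.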

DataGYM has over 230 probabilistic algorithms that are used to determine whether records match. These probabilistic rules can be changed on the fly, as can other business rules such as data source priorities, the number of match levels, and how to deal with missing or ambiguous data.

DataGYM has ways to reduce false-positive matches. It is unique in its ability to identify more Type I errors while at the same time reducing Type II errors. Traditionally the two are a direct trade-off: as one attempts to identify more Type I errors, the number of Type II errors increases, and businesses have to spend more money and analyst time reviewing Type II errors in an attempt to root out the more expensive Type I errors.

Most software that identifies relationships in data can establish one or two different relationships, and some products as many as three. DataGYM can identify an unlimited number of relationships in the data, and it can do this in one pass of the data from multiple sources. DataGYM navigates the data like a magnet looking for a needle in a haystack. It can “instantly” parse data if necessary and complete a matching process within seconds.
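As a rough illustration of rules that can be tuned on the fly, the sketch below scores two records by weighted field agreement and skips fields that are missing. It is a simplified stand-in, not one of DataGYM's algorithms and not a full probabilistic model; the field names, the weights and the function `match_score` are all assumptions made for the example.

```python
# Hypothetical field weights; changing this table changes the matching
# behavior without touching the code, which is the point being illustrated.
WEIGHTS = {"last_name": 0.4, "first_name": 0.2, "street": 0.3, "phone": 0.1}


def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Return the weighted share of comparable fields on which records agree."""
    comparable = 0.0
    agreed = 0.0
    for field, weight in weights.items():
        value_a, value_b = rec_a.get(field), rec_b.get(field)
        if not value_a or not value_b:
            continue  # missing data is skipped, it neither helps nor hurts
        comparable += weight
        if value_a == value_b:
            agreed += weight
    return agreed / comparable if comparable else 0.0


a = {"last_name": "SMITH", "first_name": "JOHN", "street": "123 MAIN STREET"}
b = {"last_name": "SMITH", "first_name": "JACK", "street": "123 MAIN STREET",
     "phone": "2125551212"}
score = match_score(a, b)  # last name and street agree, first name does not
```

A match threshold applied to the score (for example, treating anything above a chosen cut-off as a match) would be another rule that a business could adjust.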

Please review the diagram below to see how DataGYM establishes relationships within data from multiple sources all in one pass, with the capacity to add new relationships incrementally when new data or sources are added.

As relationships are identified among the data, DataGYM is unique in its ability to rationalize the data, providing the business with the highest-quality data for each of its customers and with data that is immediately actionable.

DataGYM has middleware components that integrate a clean and rationalized database with different operational processes to ensure that the database is always up to date, actionable and accurate. Information from Call Centers and the Internet can be instantly parsed and matched with existing data to make a number of fraud, credit, risk or response-based decisions.

DataGYM is known for its ability to handle data internationally. It contains over 100 tables defining its runtime operations, business rules and language contexts. This allows DataGYM to handle data in the US, Latin America, Europe, Russia, Asia or Japan with equal ease. The software tools maintain a linguistic knowledge base that helps them identify Name and Address components wherever they appear and gives them the ability to understand poorly formatted name and address data. DataGYM can deliver data management solutions around the world.

There are two components in the DataGYM Data Management Suite:

Consolidation: This component accepts and standardizes all data sources whether sequential or in another database and parses and integrates data from different sources into one marketing data repository.

Facts:

  • Table based parsing engine to parse all critical data elements.
  • The tables can handle data from English to Japanese Kanji with equal ease.
  • Input data can be validated to prevent garbage-in/garbage-out.
  • Address Correction or Address Standardization may be incorporated.
  • The settings can be used to instantly parse data from transactional processes and the Internet.

Householding: This component matches all available data records and saves the relationship knowledge in the database. There are a number of matching algorithms based on probabilistic and linguistic references that can be changed and customized based on the country of application or on the Client’s requirements.

Facts:

  1. Phonetic use of language to allow spelling distortions and variations.
  2. Probabilistic algorithms to determine match probability.
  3. First Match/All Match/Best match options.
  4. Unlimited number of match hierarchies and unlimited number of match source tables.
  5. Alias Tables to identify alternate ways the same name or address might be known.
  6. X-Match allows matching data across fields that might have been mis-assigned.
  7. Phonetic Tables can be changed for each market due to language rules.
  8. Match data can be given different weights to emphasize particular words.
  9. Related records can be rationalized to produce a master record based on pre-defined business rules.
  10. The settings can be used to instantly match data from transactional processes and the Internet.

Some examples of how DataGYM Consolidation and DataGYM Householding Functions handle a variety of Data Management and Data Integration issues are provided in the Appendix.


Like the Data Management Suite, the DataGYM software modules can be used for “real-time” Fraud Management matching or can be configured for a variety of web-based applications. Examples include Vertical Market Exchanges for credit accreditations and profile preferences; Internet interfaces with wireless and cell phone devices; web site visitor cross-sell; and instant profiling of prospects over the web or call center for tailored offerings to maximize Loyalty. These are in addition to all traditional data warehousing and data management applications.

For more information, please e-mail info@ciant.com.



APPENDIX

Examples of Data Management using DataGYM in the US Market:

Pre-Parsing Examples

Case 1:
DataGYM can handle Street Address data that has been run together because of limited space in the screen-capture program:
Example: 123APT3MAINSTREETFLOOR3RD
After pre-parsing this data becomes:
123 APT 3 MAIN STREET FLOOR 3RD and it becomes easy to handle through Data parsing.
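One way to picture this pre-parsing step is a tokenizer driven by a vocabulary table. The sketch below is an assumption about how such a step could work, not DataGYM's implementation; the tiny `VOCAB` list and the function `pre_parse` are hypothetical, and a real table would be far larger.

```python
import re

# Hypothetical vocabulary of address words; longest entries are tried first.
VOCAB = ["STREET", "FLOOR", "MAIN", "APT"]

TOKEN = re.compile(
    r"\d+(?:ST|ND|RD|TH)(?![A-Z])"   # ordinals such as 3RD stay in one piece
    r"|\d+"                          # plain numbers
    r"|" + "|".join(sorted(VOCAB, key=len, reverse=True)) +
    r"|[A-Z]+"                       # fallback: unknown run of letters
)


def pre_parse(raw):
    """Split a run-together address into space-separated tokens."""
    compact = raw.upper().replace(" ", "")
    return " ".join(TOKEN.findall(compact))


result = pre_parse("123APT3MAINSTREETFLOOR3RD")
# result == "123 APT 3 MAIN STREET FLOOR 3RD"
```

Words missing from the vocabulary fall through to the letter-run fallback and come out as a single token, which is a reminder that the quality of this step depends on the table behind it.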
Case 2:
DataGYM can handle Street Address that has floating City, State and Zip Code:
Example: 123 Main Street New York NY 10785
After pre-parsing this data becomes:
123 MAIN STREET in the Address, NEW YORK in the City, NY in the State and 10785 in the Zip Code.
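A minimal sketch of splitting a floating City, State and Zip Code out of an address line follows. It assumes a US-style line that ends in a two-letter state and a five-digit Zip Code, and it uses a small hypothetical `STREET_TYPES` table to find where the street ends; none of these names come from DataGYM.

```python
import re

# Assumed layout: "<street and city> <2-letter state> <5-digit zip>".
TAIL = re.compile(r"^(?P<rest>.+?)\s+(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})$")
# Hypothetical table of street-type words that end the street portion.
STREET_TYPES = {"STREET", "ST", "AVENUE", "AVE", "ROAD", "RD", "DRIVE", "DR"}


def split_floating(line):
    """Return address, city, state and zip, or None if the line does not fit."""
    match = TAIL.match(" ".join(line.upper().split()))
    if not match:
        return None
    words = match.group("rest").split()
    # Scan from the right for the last street-type word.
    for i in range(len(words) - 1, -1, -1):
        if words[i] in STREET_TYPES:
            return {
                "address": " ".join(words[: i + 1]),
                "city": " ".join(words[i + 1:]),
                "state": match.group("state"),
                "zip": match.group("zip"),
            }
    return None


parsed = split_floating("123 Main Street New York NY 10785")
```

For the example above, `parsed` holds `123 MAIN STREET`, `NEW YORK`, `NY` and `10785` in the four fields.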

Parsing Example – Address Parsing

Case 3:
DataGYM can handle Street Address that has Street, Post Box and Rural Route defined together:
Example: 123 MOUNTAIN ROAD, BOX 42, RRTE 6
After parsing Street Address, Post Box data and Rural Route data is saved so that other data with same street or Post Box or Rural Route can be matched independently.
Case 4:
DataGYM can handle Street Addresses with no previously known format.
Example: SOUTH 2200 SW 7600
DataGYM can identify the Address Pattern as Street Directional and Number then a Street Directional and another Number. It can be instructed to store the coordinates (deduced from analysis) in the Street Name field and the rest in directionals to make it easy to compare other coordinates.

Parsing Example – Name Parsing

Case 5:
DataGYM can handle a Name field containing data in many different formats:
Example 1: JOHN SMITH JR
Example 2: SMITH J JR
DataGYM can be set up so that the first case becomes JOHN, SMITH, JR as the First Name, Last Name and the Last Name Suffix, but in the second case it will parse it as J, SMITH, JR as the First Name, Last Name and the Last Name Suffix.
Case 6:
DataGYM can identify a Job Title in a Name Field and move that out during its Parsing process to take care of the rest of data:
Example: MR. J DOUGLAS SMITH CPA
From this name, Job Title CPA is moved out and then the rest of the data is parsed such that J, DOUGLAS, SMITH become First Name, Middle Name and the Last Name.

Standardization Example

Case 7:
DataGYM can validate and standardize any data coming in:
Example 1: Street Address: Post Office Box 1234
Example 2: Street Address: PO 1234
Example 3: Street Address: BOX 1234
DataGYM can standardize all these variations in the Address to POST BOX 1234.
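The standardization in Case 7 can be sketched with a single pattern. This is an illustrative assumption rather than DataGYM's table-driven rule; the pattern `PO_BOX` and the function `standardize_po_box` are hypothetical names.

```python
import re

# Variants of a post box prefix followed by the box number.
PO_BOX = re.compile(
    r"^(?:POST\s+OFFICE\s+BOX|P\.?\s*O\.?\s*BOX|P\.?\s*O\.?|BOX)\s+(\d+)$"
)


def standardize_po_box(street):
    """Rewrite recognized post box variants as 'POST BOX <number>'."""
    match = PO_BOX.match(" ".join(street.upper().split()))
    # Anything that is not a post box is returned unchanged.
    return f"POST BOX {match.group(1)}" if match else street


variants = ["Post Office Box 1234", "PO 1234", "BOX 1234"]
standardized = [standardize_po_box(v) for v in variants]
```

All three variants from the case come out as `POST BOX 1234`, while an ordinary street address passes through untouched.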

Rejection Example

Case 8:
DataGYM can reject data so that records with different formats can be recovered in another pass of the data:
Example: Name: 7 PITT STREET, Address: JOHN CROSSI
DataGYM can be set to reject the record by setting a rejection parameter specifying that the NAME field should not contain alphanumeric data.
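A rejection rule of this kind can be sketched as a check that routes failing records to a separate list for a later recovery pass. The rule below (reject when the name contains a digit) and the function `partition` are assumptions made for illustration.

```python
def partition(records):
    """Split records into (accepted, rejected) by a simple NAME rule."""
    accepted, rejected = [], []
    for record in records:
        # Hypothetical rule: a NAME field must not contain any digit.
        if any(ch.isdigit() for ch in record.get("name", "")):
            rejected.append(record)
        else:
            accepted.append(record)
    return accepted, rejected


records = [
    {"name": "7 PITT STREET", "address": "JOHN CROSSI"},
    {"name": "JOHN SMITH", "address": "123 MAIN STREET"},
]
accepted, rejected = partition(records)
```

Here the first record is rejected, so a second pass could try it again with Name and Address swapped.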

Relationship Examples

Case 9:
DataGYM can recognize subtle differences in individual names to determine correct relationships:
Example 1:
Record 1: JOHN SMITH JR, 123 MAIN STREET
Record 2: J SMITH JR, 123 MAIN STREET
Example 2:
Record 1: MRS. J SMITH, 123 MAIN STREET
Record 2: JOHN SMITH, 123 MAIN STREET
DataGYM is able to maintain these records as different individuals in each example above.
Case 10:
DataGYM can leverage nicknames to determine correct relationships:

Example 1:
Record 1: JOHN SMITH, 123 MAIN STREET
Record 2: JACK SMITH, 123 MAIN STREET
Example 2:
Record 1: J ROBERT SMITH, 123 MAIN STREET
Record 2: BOB SMITH, 123 MAIN STREET
DataGYM is able to identify in both these examples that these records are related.
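Nickname handling can be pictured as a lookup in an alias table before names are compared. The sketch below is an assumption about the general approach; the `ALIASES` table and the function `related` are hypothetical and cover only the names used in this case.

```python
# Hypothetical alias table: nickname -> canonical given name.
ALIASES = {"JACK": "JOHN", "JOHNNY": "JOHN", "BOB": "ROBERT", "ROB": "ROBERT"}


def split_name(full_name):
    """Return (set of canonical given names, last name)."""
    *given, last = full_name.upper().split()
    return {ALIASES.get(g, g) for g in given}, last


def related(rec_a, rec_b):
    """Records relate if address and last name agree and a given name is shared."""
    (name_a, addr_a), (name_b, addr_b) = rec_a, rec_b
    given_a, last_a = split_name(name_a)
    given_b, last_b = split_name(name_b)
    return addr_a == addr_b and last_a == last_b and bool(given_a & given_b)


ADDR = "123 MAIN STREET"
example_1 = related(("JOHN SMITH", ADDR), ("JACK SMITH", ADDR))
example_2 = related(("J ROBERT SMITH", ADDR), ("BOB SMITH", ADDR))
```

Both examples from the case evaluate to true, whereas `JOHN SMITH` and `MARY SMITH` at the same address would not be linked by this rule.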
Case 11:
DataGYM can compare other data fields to determine relationships:

Example 1:
Record 1: JOHN SMITH, 123 MAIN STREET, (212) 555-1212
Record 2: MARY JOHNSON, 123 MAIN STREET, (212) 555-1212

Example 2:
Record 1: MAY LEI CHUNG, 123 MAIN STREET, 123-23-4567
Record 2: MAY SMITH, 123 MAIN STREET, 123-23-4567

DataGYM is able to identify from the Phone Number in example 1 and the SSN# in example 2 that these records are related.
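Relating records through a shared identifier can be sketched by grouping on a normalized key. The helper names `digits_only` and `related_by` and the record layout are assumptions for this example.

```python
from collections import defaultdict


def digits_only(value):
    """Normalize a phone number or SSN to its digits."""
    return "".join(ch for ch in value if ch.isdigit())


def related_by(records, field):
    """Group record names that share the same normalized value in a field."""
    groups = defaultdict(list)
    for record in records:
        key = digits_only(record.get(field, ""))
        if key:
            groups[key].append(record["name"])
    # Only keys shared by more than one record indicate a relationship.
    return {key: names for key, names in groups.items() if len(names) > 1}


records = [
    {"name": "JOHN SMITH", "phone": "(212) 555-1212"},
    {"name": "MARY JOHNSON", "phone": "212-555-1212"},
    {"name": "MAY LEI CHUNG", "ssn": "123-23-4567"},
    {"name": "MAY SMITH", "ssn": "123-23-4567"},
]
by_phone = related_by(records, "phone")
by_ssn = related_by(records, "ssn")
```

Normalizing before grouping means that differently punctuated copies of the same number still land in the same group.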
Case 12:
DataGYM can cross compare data to determine relationships:

Example 1:
Record 1: JOHN ROBERT SMITH, 123 MAIN STREET
Record 2: ROBERT J SMITH, 123 MAIN STREET

Example 2:
Record 1: JOHN ROBERT SMITH, 123 MAIN STREET
Record 2: ROBERT SMITH JOHANSON, 123 MAIN STREET

DataGYM is able to identify from cross matching Last name to First or Middle Names or First Name to Middle that these records are related.
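Cross matching can be approximated by comparing name words regardless of the field position they occupy. The sketch below is a simplified assumption, not DataGYM's X-Match; the function `cross_match` and its `min_shared` threshold are hypothetical.

```python
def cross_match(name_a, name_b, min_shared=2):
    """True if enough full name words are shared, whatever their position."""
    words_a = set(name_a.upper().split())
    words_b = set(name_b.upper().split())
    # Single-letter initials are ignored because they carry little evidence.
    shared = {word for word in words_a & words_b if len(word) > 1}
    return len(shared) >= min_shared


example_1 = cross_match("JOHN ROBERT SMITH", "ROBERT J SMITH")
example_2 = cross_match("JOHN ROBERT SMITH", "ROBERT SMITH JOHANSON")
```

Both examples share `ROBERT` and `SMITH` in different positions, so both are reported as related; in practice the address would be compared as well.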

Rationalization Example

Case 13:
DataGYM can take data matching at different levels and organize the matching records to produce a rationalized output:
Record 1: J D POWERS, 123 MAIN STREET, (212) 555-1212
Record 2: MR. J DOUGLAS POWERS, 123 MAIN STREET FL 6
Record 3: JOHN POWERS, 123 MAIN STREET APT 2

Rationalized Record: MR JOHN DOUGLAS POWERS (M), 123 MAIN STREET, FL6, APT 2, (212) 555-1212.
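The idea of a master record can be sketched with one simple business rule: for each field keep the most complete value seen in any matched record. The rule, the pre-parsed record layout and the function `rationalize` are assumptions for illustration; deriving the gender marker (M) from the title is left out.

```python
# Matched records, assumed to be parsed into fields already.
RECORDS = [
    {"title": "", "first": "J", "middle": "D", "last": "POWERS",
     "street": "123 MAIN STREET", "unit": "", "phone": "(212) 555-1212"},
    {"title": "MR", "first": "J", "middle": "DOUGLAS", "last": "POWERS",
     "street": "123 MAIN STREET", "unit": "FL 6", "phone": ""},
    {"title": "", "first": "JOHN", "middle": "", "last": "POWERS",
     "street": "123 MAIN STREET", "unit": "APT 2", "phone": ""},
]


def rationalize(records):
    """Build a master record, preferring the longest value of each field."""
    fields = ("title", "first", "middle", "last", "street", "phone")
    master = {f: max((r[f] for r in records), key=len) for f in fields}
    # Unit designators are collected rather than overwritten.
    master["units"] = [r["unit"] for r in records if r["unit"]]
    return master


master = rationalize(RECORDS)
```

With these inputs the master record carries `MR`, `JOHN`, `DOUGLAS`, `POWERS`, both unit designators and the phone number, which corresponds to the rationalized record shown above.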

Examples of interactive learning process to handle Address and Name data in Portuguese:

Address Handling

Case 1:
The Street Address begins with the street type, followed by street name, street number, apartment
Example: R MANTA 29
From this Address we get R is the street type, MANTA is the street name and 29 is the street number. The apartment is missing.
Case 2:
The Street Address has an alpha character not anticipated before
Example: R MANTA N 29
From this Address we get R is the street type, MANTA is the street name and 29 is the street number. We saw a special usage of the initial 'N', which stands for 'Numero' and sometimes precedes the street number. This understanding is codified in DataGYM through parameters.
Case 3:
The Street Address has multiple alpha characters and numeric data not anticipated before
Example: R MANTA N 29 L 2
From this address we get R is the street type, MANTA is the street name, 29 is the street number. We find ‘L’ to stand for Loja and so the apartment number is then identified as 2.
Case 4:
Street Address not in any known format.
Example: Q2 LOJA 12
From this Address (we found that street addresses in Brasilia are mentioned using quadrants) we get Q2 is identified as the street name (or it is a significant component of the street) and 12 is the apartment number identified by an apartment type.
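The four address cases above can be sketched as one small parser whose behavior is driven by lookup tables, in the spirit of codifying each new observation as a parameter. The tables `STREET_TYPES`, `NUMBER_MARKERS` and `UNIT_TYPES` and the function `parse_address` are assumptions for this example, not DataGYM's actual parameters.

```python
# Hypothetical parameter tables, extended as new patterns are learned.
STREET_TYPES = {"R", "RUA", "AV", "AVENIDA"}
NUMBER_MARKERS = {"N"}                  # 'N' for 'Numero'
UNIT_TYPES = {"L", "LOJA", "AP", "APTO"}


def parse_address(text):
    """Parse street type, name, number and unit from a Portuguese address."""
    tokens = text.upper().split()
    out = {"street_type": None, "street_name": None, "street_number": None,
           "unit_type": None, "unit_number": None}
    i = 0
    if tokens and tokens[0] in STREET_TYPES:
        out["street_type"] = tokens[0]
        i = 1
    name = []
    # The street name runs until a number, a number marker or a unit type.
    while (i < len(tokens) and not tokens[i].isdigit()
           and tokens[i] not in NUMBER_MARKERS and tokens[i] not in UNIT_TYPES):
        name.append(tokens[i])
        i += 1
    out["street_name"] = " ".join(name) or None
    if i < len(tokens) and tokens[i] in NUMBER_MARKERS:
        i += 1                          # skip the 'N' marker itself
    if i < len(tokens) and tokens[i].isdigit():
        out["street_number"] = tokens[i]
        i += 1
    if i + 1 < len(tokens) and tokens[i] in UNIT_TYPES and tokens[i + 1].isdigit():
        out["unit_type"], out["unit_number"] = tokens[i], tokens[i + 1]
    return out


case_3 = parse_address("R MANTA N 29 L 2")
case_4 = parse_address("Q2 LOJA 12")
```

A street whose name is itself a single letter such as `L` would confuse this sketch, which is the kind of ambiguity that further parameters would have to resolve.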

Name Handling:

Case 1:
The Name field contained many initials
Example: JOSE C B M JORGE
Case 2:
There is more than one string in the First Name or Middle Name
Example: ALIETE MARIA A DE MELO
From this Name we get ALIETE MARIA as the First Name and A as the Middle Name

Example: SIMONE A ZAMORA DE M ARRUDA
From this Name we get SIMONE as the First Name and A ZAMORA as the Middle Name
Case 3:
Last Name Prefix used to identify the Last Name and Suffixes:
Example: SERGIO P MARTELLO DE FILHO
From this Name we get DE as Last Name prefix, MARTELLO as Last Name and FILHO as Last Name Suffix

Example: ANTONIO M Z COSTA FILHO
From this Name we get COSTA as the Last Name and FILHO as the Last Name Suffix
Example: JOSE G DE CASTRO FILHO
From this Name we get DE as Last Name Prefix, CASTRO as the Last Name and FILHO as the Last Name Suffix
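The surname handling in Case 3 can be sketched by working from the right-hand end of the name: take off a known suffix, then look for a prefix on either side of the last name. The `PREFIXES` and `SUFFIXES` tables and the function `parse_surname` are assumptions for illustration and do not attempt the First Name and Middle Name split of Case 2.

```python
# Hypothetical tables of Portuguese surname particles and generational suffixes.
PREFIXES = {"DE", "DA", "DO", "DOS", "DAS"}
SUFFIXES = {"FILHO", "JUNIOR", "NETO"}


def parse_surname(name):
    """Return given-name tokens, last-name prefix, last name and suffix."""
    tokens = name.upper().split()
    suffix = tokens.pop() if tokens and tokens[-1] in SUFFIXES else None
    prefix = None
    if tokens and tokens[-1] in PREFIXES:
        prefix = tokens.pop()           # prefix written after the last name
    last = tokens.pop() if tokens else None
    if prefix is None and tokens and tokens[-1] in PREFIXES:
        prefix = tokens.pop()           # prefix written before the last name
    return {"given": tokens, "prefix": prefix, "last": last, "suffix": suffix}


first = parse_surname("SERGIO P MARTELLO DE FILHO")
second = parse_surname("ANTONIO M Z COSTA FILHO")
third = parse_surname("JOSE G DE CASTRO FILHO")
```

For the three examples this yields MARTELLO with prefix DE, COSTA with no prefix, and CASTRO with prefix DE, each with FILHO as the suffix.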

Copyright © 2010 CIANT® Corporation • All Rights Reserved Worldwide.
All brands, trademarks, cases and articles are the property of their respective owners.