Research

CoreLogic's research team focuses on developing path-breaking applied technology for real needs. With constant exposure to live problem scenarios, our researchers work on creating the most appropriate industrial solutions. The team is given an open environment in which to explore freely and push the boundaries of technology.

Our technologies aim to improve data quality, thereby reducing repetitive manual effort, time, and ultimately cost, which leads to a greater return on investment for businesses. Additionally, we offer consultancy services for specialized needs.

Our current focus areas are related to complex document-to-data generation involving:

  • Image Processing
  • Pattern Recognition
  • Information Retrieval
  • Artificial Intelligence
  • Machine Learning
  • Natural Language Processing

Technological Capabilities

Our technical expertise spans all kinds of documents, whether scanned images or text. Documents can be of differing ages and contain structured (forms / tabular) or unstructured (free-flow) content. While we can handle forms with structured and tabular content, our distinguishing strength is information extraction from unstructured documents with non-uniform free-flow content.

From receiving a document as input through to data generation, our capabilities include:
  • Understanding document characteristics
  • Analyzing image layout
  • Noise removal
  • Data Extraction
  • Data Formatting and Manipulation
  • Information Analysis

Specific components for both the image and text domains are available, and they can be integrated into customized solutions to suit specific business needs.

Components and Solutions
Some of the innovative components and solutions from the CoreLogic Research team include:

Image domain
The team has components to perform:
  • Forms Classification
  • Special Zones Identification and Extraction, such as:
    • Seals, Logos, Signatures, Barcodes
    • Roads and Water Bodies
  • Noise Removal (Lines, Smudges, etc.)
  • Interpreting Tables
  • Highlighting / Redaction of sensitive zones of information

These components can be applied to any of the following image sources:

  • Scanned Document Images
  • Aerial Images
  • Satellite Images
  • Maps
  • Engineering Drawings
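
To make the line-removal item in the noise removal capability concrete, here is a minimal sketch in Python. It is an illustration only and not the team's actual method: it assumes a binarized image stored as a list of rows of 0/1 values, and the function name `remove_horizontal_lines` and the `min_run` threshold are hypothetical. A production component would also need to preserve text strokes that cross a ruling line.

```python
def remove_horizontal_lines(image, min_run=5):
    """Clear horizontal runs of ink pixels (1s) that are at least min_run long.

    image is a list of rows of 0/1 values. A cleaned copy is returned and
    the input is left unchanged.
    """
    cleaned = [row[:] for row in image]
    for row in cleaned:
        start = None
        # The appended 0 is a sentinel that closes a run ending at the row edge.
        for x, pixel in enumerate(row + [0]):
            if pixel and start is None:
                start = x  # a new run of ink begins
            elif not pixel and start is not None:
                if x - start >= min_run:
                    row[start:x] = [0] * (x - start)  # long run: treat as a line
                start = None
    return cleaned
```

Short runs, such as the strokes of characters, fall below the threshold and are kept; only long runs typical of form ruling lines are erased.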

Text domain

Given any text document, we can extract information with appropriate annotation. The extracted information is accompanied by a scientifically grounded, statistically relevant confidence measure that aids business decisions. Capabilities include:

  • Analysis of erroneous free-flow text
  • Online Auto-correction of OCR-induced errors
  • Entity Recognition and Associations
  • Extraction of business-specific relevant information
  • Document parsing and section categorization
  • Name Parsing & Standardization
  • Document Type Determination
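
As a rough illustration of how auto-correction of OCR-induced errors can be paired with a confidence score, the sketch below matches each token against a small lexicon using Python's standard `difflib` module. It is not the team's implementation: the lexicon, the function names, and the 0.75 cutoff are assumptions, and the similarity ratio is only a simple stand-in for a statistical confidence measure.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical domain lexicon; a real system would use a far larger vocabulary.
LEXICON = ["mortgage", "borrower", "lender", "property", "county", "deed"]

def correct_token(token, lexicon=LEXICON, cutoff=0.75):
    """Return (best_guess, confidence) for a single OCR token.

    confidence is the difflib similarity ratio in [0, 1].
    """
    lowered = token.lower()
    if lowered in lexicon:
        return lowered, 1.0  # exact match, full confidence
    matches = get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    if not matches:
        return token, 0.0  # no plausible candidate: leave the token untouched
    best = matches[0]
    return best, SequenceMatcher(None, lowered, best).ratio()

def correct_text(text):
    """Correct every whitespace-separated token in text."""
    return [correct_token(tok) for tok in text.split()]
```

For example, `correct_token("m0rtgage")` yields `"mortgage"` with a confidence of 0.875, while a token with no close lexicon entry is returned unchanged with a confidence of 0.0, so downstream review can be limited to low-confidence tokens.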

Custom Research

We are open to taking up specific projects if the problem is aligned with our research capabilities.

Human Capital

The expertise of this team includes:

  • Postgraduates with expertise in the image and text domains
  • Ph.D.s with image domain expertise

University Relations

While the in-house team continues to rapidly enhance its capabilities, we believe that an alliance between industry and academia will generate technology that fosters mutual growth and positive societal impact. We are interested in exploring collaborative relationships with leading academic institutions in India.