Research
The focus of CoreLogic's research team is on developing path-breaking applied technology for real needs. With constant exposure to live problem scenarios, our researchers work on creating the most appropriate industrial solutions. The team works in an open environment where it can explore freely and push the boundaries of technology.
Our technologies aim to improve data quality, thereby reducing repetitive manual effort, time and, in turn, costs, which leads to a greater return on investment for businesses. Additionally, we offer consultancy services for specialized needs.
Our current focus areas relate to complex document-to-data generation and involve:
- Image Processing
- Pattern Recognition
- Information Retrieval
- Artificial Intelligence
- Machine Learning
- Natural Language Processing
Our technical expertise spans all kinds of documents, whether scanned images or text formats. Documents can be of differing ages, with structured (forms / tabular content) or unstructured (free-flow) content. While we can handle forms with structured and tabular content, our distinctive strength is information extraction from unstructured documents with non-uniform free-flow content.
From receiving a document as input to generating data, our capabilities include:
- Understanding document characteristics
- Analyzing image layout
- Noise removal
- Data Extraction
- Data Formatting and Manipulation
- Information Analysis
Components catering to both image and text domains are available, and they can be integrated into customized solutions to suit specific business needs.
Components and Solutions
Some of the innovative components and solutions from the CoreLogic Research team include:
Image domain
The team has components to perform:
- Forms Classification
- Special Zones Identification and Extraction, such as Seals, Logos, Signatures and Barcodes
- Roads & water-bodies Identification
- Noise Removal (Lines, Smudges, etc.)
- Interpreting Tables
- Highlighting / Redacting sensitive zones of information
The above components can be used on any of the image sources listed below:
- Scanned Document Images
- Aerial Images
- Satellite Images
- Maps
- Engineering Drawings
Text domain
Given any text document, we can extract information with appropriate annotation. The extracted information is supported by a statistically relevant confidence measure that aids business decisions. Our capabilities include:
- Analysis of erroneous free-flow text
- Online Auto-correction of OCR-induced errors
- Entity Recognition and Associations
- Extraction of business-specific relevant information
- Document parsing and section categorization
- Name Parsing & Standardization
- Document Type Determination
We are open to taking up specific projects if the problem is aligned with our research capabilities.
Human Capital
The knowledge potential of this wing includes:
- Postgraduates with expertise in image and text domains
- PhDs with image domain expertise
While the in-house team continues to rapidly enhance its capability, we believe that an alliance between industry and academia will generate technology that fosters mutual growth and positive societal impact. We are interested in exploring various collaborative relationships with leading academic institutions in India.