Research and Technology
- Natural Language Processing :
Language has been humankind's means of communication for thousands of years. This has produced large volumes of information in the form
of literature, history, art, science and philosophy, in hundreds of languages. With the world going digital and moving onto the web, there is a growing need for
digital devices such as computers and mobile phones to understand human language. Applications include sentiment analysis, language translation,
document summarization, named entity recognition (identifying person, place and company names) and key concept extraction. Probabilistic graphical models
such as Conditional Random Fields (CRFs) and Maximum Entropy Markov Models (MEMMs) are used for Part-of-Speech (POS) tagging and for noun and verb phrase identification.
Machine learning, data mining and AI components can be used to help solve natural language processing problems.
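As a rough illustration of how a sequence model assigns POS tags, the sketch below runs Viterbi decoding, the dynamic-programming step that linear-chain models such as CRFs typically use to pick the best tag sequence. The scores are hand-written for illustration only; a real CRF or MEMM would learn them from annotated text, and the function name `viterbi` and the tiny tag set are assumptions made for this example.

```python
def viterbi(words, tags, emission, transition, default=-10.0):
    """Return the highest-scoring tag sequence for `words`.

    emission[(tag, word)] and transition[(prev_tag, tag)] are additive
    (log-domain) scores; missing entries fall back to `default`.
    """
    if not words:
        return []
    # best[i][t]: best score of any tag sequence for words[:i + 1] ending in t
    best = [{t: emission.get((t, words[0]), default) for t in tags}]
    back = []  # back[i][t]: best previous tag when words[i + 1] is tagged t
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transition.get((p, t), default))
            scores[t] = (best[-1][prev]
                         + transition.get((prev, t), default)
                         + emission.get((t, word), default))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


# Hand-written illustrative scores (assumptions, not learned values).
TAGS = ["DET", "NOUN", "VERB"]
EMISSION = {("DET", "the"): -0.1, ("NOUN", "dogs"): -0.3, ("VERB", "dogs"): -2.5,
            ("NOUN", "bark"): -1.5, ("VERB", "bark"): -0.4}
TRANSITION = {("DET", "NOUN"): -0.1, ("NOUN", "VERB"): -0.3,
              ("NOUN", "NOUN"): -1.5, ("VERB", "NOUN"): -1.0}

tagged = viterbi(["the", "dogs", "bark"], TAGS, EMISSION, TRANSITION)
print(tagged)  # ['DET', 'NOUN', 'VERB']
```

Although "bark" can be a noun, the transition score from NOUN to VERB makes the verb reading win here, which is the kind of contextual decision these models are used for.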
- Statistical Machine Learning :
At the root of all intelligence lies the ability to learn from past experience and use it to make decisions. In machine learning, the fundamental
idea is to give machines historical data to learn from and train them to make sophisticated decisions comparable to those a human would make.
Although the human brain makes intelligent decisions that take circumstances into account, it fails to scale when the number of parameters involved is large.
Machine learning algorithms make it possible to build a statistical model over a sufficiently large data set, which can then assist humans in decision making.
Machine learning can be broadly divided into supervised and unsupervised learning. A wide range of learning algorithms is used in
practice, such as Support Vector Machines (SVMs), Bayesian classifiers, decision trees and neural networks.
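To make the idea of learning from historical data concrete, here is a minimal sketch of supervised learning with a perceptron, the simplest single-neuron relative of the neural networks mentioned above. The data set is invented for illustration, and the function names `train_perceptron` and `predict` are assumptions of this example rather than part of any library.

```python
def predict(weights, bias, x):
    """Classify x as 1 if the weighted sum is positive, otherwise 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Learn weights and a bias from labeled samples (perceptron rule)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            # Adjust the model only when the prediction was wrong.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Invented, linearly separable "historical data": two features per sample.
samples = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (4.0, 5.0), (5.0, 4.0), (5.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_perceptron(samples, labels)
print(predict(weights, bias, (1.5, 1.5)))  # 0
print(predict(weights, bias, (4.5, 4.5)))  # 1
```

The perceptron only converges when the classes are linearly separable; the other algorithms listed above trade this simplicity for more flexible decision boundaries.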
- Ontology-based Semantic Technology :
An ontology is a representation of knowledge through concepts and the relationships between those concepts. An ontology aims to represent a
system or a domain as completely as possible. Its structured nature allows computers to consume it effectively. A system equipped with
domain knowledge through an ontology can derive complex inferences that would be impractical to obtain otherwise. For instance, different doctors may record
a condition as heart attack, cardiac arrest or myocardial dysfunction. A solution backed by a medical ontology can readily identify that all three refer to serious
heart conditions. This inferencing capability can be chained in the forward or backward direction to derive complex conclusions. Several ontologies are being developed
by research institutes and companies in areas such as medical science, patent classification, life sciences and retail marketing, alongside general-purpose resources such as
WordNet and ConceptNet, and these can be integrated into a solution as required.
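The chained inference described above can be sketched with a toy "is-a" hierarchy. The concepts and links below are invented for illustration and are not taken from any real medical ontology; `ancestors` follows links forward from a term, while `descendants` answers the backward question of which terms fall under a concept.

```python
# Toy is-a links: term -> list of broader concepts (illustrative only).
IS_A = {
    "heart attack": ["serious heart disease"],
    "cardiac arrest": ["serious heart disease"],
    "myocardial dysfunction": ["serious heart disease"],
    "serious heart disease": ["cardiovascular disease"],
    "migraine": ["neurological disorder"],
}


def ancestors(term, is_a):
    """Forward chaining: every concept reachable from `term` via is-a links."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def descendants(concept, is_a):
    """Backward query: every known term that is a kind of `concept`."""
    return {term for term in is_a if concept in ancestors(term, is_a)}


for term in ("heart attack", "cardiac arrest", "myocardial dysfunction"):
    print(term, "->", sorted(ancestors(term, IS_A)))
print(sorted(descendants("cardiovascular disease", IS_A)))
```

All three terms resolve to the same broader concepts even though no single link states that directly, which is the benefit of chaining simple relationships.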
- Scalable Distributed Systems :
A distributed system can be built from a cluster of nodes spread across an intranet. To achieve high scalability, NoSQL data stores can be built
on distributed solutions such as HBase and Hive, which sit on top of the Hadoop Distributed File System (HDFS), or on MongoDB, Cassandra or a similar
key-value (map-based) data store. To harness the computational advantage of distributed systems, the MapReduce paradigm, which borrows from functional programming, can be used: it breaks
a task into small subtasks that execute in parallel on the nodes of the grid or cluster. Another benefit of a Hadoop-based solution is its fault-tolerant nature, which
means that the failure of any single node has little effect on the productivity of the system as a whole.
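The MapReduce idea can be sketched in a few lines. The example below is a single-process simulation of the classic word count, not Hadoop's API: the function names `map_phase`, `shuffle` and `reduce_phase` are assumptions of this sketch, and on a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map_phase call is independent, so the calls could run on separate nodes.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'node': 1}
```

Because every map call sees only its own document and every reduce call sees only one key, a failed subtask can simply be rerun elsewhere without affecting the rest of the job.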
- Information Storage and Retrieval :
Information storage and retrieval is an age-old problem. The basic requirement is the ability to search tens of thousands of documents for a phrase or keyword
very efficiently. Building such a system requires preprocessing the data and converting it into an inverted index, a data structure that maps each term to the documents containing it. This structure allows a fast search
through a large number of documents and retrieval of exactly the documents of interest. Another challenge is the order in which retrieved results are presented to the user.
For instance, a user searching for *The world* would intend the word *world* to carry more importance than the word *the*. This intention needs to be captured
statistically by a scoring algorithm, with results ordered by their scores. The search experience can be further enhanced by reducing each word to its root form (stemming) and by allowing synonym-based
search. Advanced search can be achieved with LSI (Latent Semantic Indexing), which supports semantic search without using any ontology-based meta information.
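A minimal sketch of an inverted index with statistical scoring follows. It uses a basic TF-IDF weighting, which is one common choice and only an assumption here; the three documents and the names `build_index` and `search` are invented for the example.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(query, index, num_docs):
    """Rank the documents matching any query term by their TF-IDF score."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms that occur in every document get an IDF of zero.
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "doc1": "the world is round",
    "doc2": "the cat sat on the mat",
    "doc3": "the map of the world",
}
index = build_index(docs)
print(search("The world", index, len(docs)))  # ['doc1', 'doc3', 'doc2']
```

Here *the* appears in every document and so contributes nothing to the score, while *world* lifts doc1 and doc3 above doc2, matching the intent described above.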
- Advanced Algorithms :
Irrespective of domain or technology, the biggest challenge in solving any problem with computers is to map problems from domains such as finance, life sciences or entertainment
onto appropriate data structures such as strings, graphs and trees. The next crucial step is to discover which of the wide array of advanced algorithms can be applied effectively.
For instance, geometric invariants can be used for protein structure comparison, and gene expression can be studied using string prefix trees. Being aware of the algorithmic
possibilities is at the core of developing world-class solutions.
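As a small example of mapping a domain problem onto a data structure, the sketch below stores short sequences in a prefix tree (trie) and retrieves all entries that share a prefix. The sequences are made up, and the class name `PrefixTree` is an assumption of this example; real sequence analysis involves far more than this lookup.

```python
class PrefixTree:
    """A minimal prefix tree: each node maps a character to a child node."""

    def __init__(self):
        self.children = {}
        self.is_end = False  # True if a stored string ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, PrefixTree())
        node.is_end = True

    def starts_with(self, prefix):
        """Return all stored strings beginning with `prefix`, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_end:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


tree = PrefixTree()
for sequence in ["ATGC", "ATGA", "ATTC", "GGCA"]:  # made-up sequences
    tree.insert(sequence)

print(tree.starts_with("ATG"))  # ['ATGA', 'ATGC']
```

Finding the node for a prefix takes time proportional to the prefix length, regardless of how many strings are stored; the cost of listing results then depends only on the matching subtree.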
- Data Mining :
Data mining is the science of designing and developing algorithms that identify patterns and derive conclusions from available data. Data mining algorithms
are commonly divided into two categories: classification and clustering. Classification algorithms are decision-making models built on a labeled data
set, whereas clustering algorithms, as the name suggests, identify groups of data points that are similar to each other. In both cases the fundamental goal is to derive meaning
from a large data set.
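Clustering can be illustrated with k-means, one widely used algorithm (chosen here as an example; the text does not name a specific method). The sketch below groups unlabeled points around k centroids. To keep the result reproducible it uses the first k points as initial centroids, which is a simplifying assumption; practical implementations usually initialize randomly or with smarter seeding. The data points are invented.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, clusters)."""
    centroids = [points[i] for i in range(k)]  # simplistic deterministic start
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Invented unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(clusters[0])  # [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
print(clusters[1])  # [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
```

No labels are supplied at any point: the groups emerge purely from distances between the points, which is what distinguishes clustering from classification.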