The Limitations of Keyword Search

Using Semantic Search to Uncover Scientific Meaning

Keyword Search is the Norm

R&D and information management professionals routinely employ simple keyword searches or more complex Boolean queries when using databases such as PubMed and Ovid and search engines like Google and Google Scholar to find the information they need. Keyword search satisfies the basic needs of the researcher, but it has limitations that can hurt both precision and recall, reducing productivity and hindering researchers’ ability to discover new insights.

 

With Keyword Search, You Get What You Ask For

Keyword searches are quite literal: computers find terms wherever they appear, even when they are part of a larger phrase or used in a different context. This approach is effective if the researcher knows exactly what they are looking for. The problem is that words often have multiple meanings, and a keyword search cannot disambiguate them in unstructured text, so it often returns irrelevant results (false positives). For example, a keyword search on the term “AIDS” might return not only references to “Acquired Immune Deficiency Syndrome” but also references to hearing aids.
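The literal behavior described above can be illustrated with a minimal Python sketch. The three sentences and the function name keyword_search are invented for illustration; this is not how any particular database implements its search.

```python
import re

# Hypothetical mini-corpus; the sentences are invented for illustration.
DOCUMENTS = [
    "Antiretroviral therapy slows the progression of AIDS.",
    "Modern hearing aids can be fitted in a single visit.",
    "Visual aids improved recall in the training session.",
]

def keyword_search(term, documents):
    """Return every document that contains the term as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [doc for doc in documents if pattern.search(doc)]

hits = keyword_search("AIDS", DOCUMENTS)
for doc in hits:
    print(doc)
```

All three sentences are returned, although only the first concerns the disease. The other two are false positives that the searcher has to filter out by hand.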

Keyword searches may also fail to turn up related materials that don’t specifically use the search term (false negatives). Under these conditions, researchers can miss pertinent information. There is also the danger of making business decisions based on a less than comprehensive set of search results.
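False positives and false negatives map directly onto the two standard measures of search quality: precision (the share of returned results that are relevant) and recall (the share of relevant documents that are returned). The sketch below computes both; the document IDs and the function name precision_recall are hypothetical and chosen only to make the arithmetic visible.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document IDs, for illustration only.
relevant = {"doc1", "doc2", "doc3", "doc4"}           # what the researcher needs
retrieved = {"doc1", "doc2", "doc7", "doc8", "doc9"}  # what the search returned

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.40 recall=0.50
```

In this made-up case, three of the five results are false positives (lowering precision) and two of the four relevant documents are false negatives (lowering recall).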

Keyword Versus Semantic Search

Red represents keyword search hits for the phrase “adverse events.” Green represents missed hits for this keyword search that a semantic search for the concept adverse event would produce, improving recall (comprehensiveness) without requiring the searcher to explicitly include search terms in their query for every adverse event of interest.

In a second example, green represents a valid hit that a semantic search would return. Red represents the false positives that a keyword search for the term “AIDS” would return, reducing precision and producing more irrelevant hits.

While keyword search may recognize plurals, variations and stemming (connecting a text string to other related text strings, as in fish, fishing, fished), thorough queries must still account for every term and permutation. Researchers often maintain highly complex queries that require constant refinement. Compounding the issue, keyword searches get more complicated when users want to go beyond co-occurrence to identify potential causal relationships. Searchers are all too familiar with sifting through lists of document results that contain all required search terms but lack a clear conceptual connection between them. These are a special kind of false positive, and a case where semantic search has the clear advantage.
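The stemming mentioned above can be sketched in a few lines of Python. This is a deliberately naive suffix-stripping rule written for illustration, not the algorithm used by any specific search engine; the suffix list and the name naive_stem are assumptions.

```python
# Suffixes to strip, checked in this order; a hypothetical, deliberately small list.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three leading characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ("fish", "fishing", "fished", "fishes")]
print(stems)  # ['fish', 'fish', 'fish', 'fish']
```

Stemming of this kind links word forms that share spelling. It does not link terms that share meaning but not spelling, which is why queries still have to list every synonym explicitly.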

Vocabularies that exploit linguistic relationships (such as verbs indicating that one biomedical entity is related to another, as when one upregulates, downregulates, inhibits, or disinhibits the other) can be applied in semantic search to further limit results and achieve greater precision. This not only increases accuracy and saves time but also builds user trust.
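The difference between co-occurrence and a stated relationship can be sketched as follows. The sentences, the entity names "Compound A" and "enzyme B", and both function names are hypothetical. A production system would rely on linguistic parsing rather than the simple pattern used here, so treat this as an illustration of the idea only.

```python
import re

# Relationship verbs taken from the text above.
RELATION_VERBS = ("upregulates", "downregulates", "inhibits", "disinhibits")

# Hypothetical sentences, invented for illustration.
SENTENCES = [
    "Compound A inhibits enzyme B in hepatic cells.",
    "Compound A and enzyme B were both measured at baseline.",
]

def co_occurrence(sentence, first, second):
    """True when both terms appear anywhere in the sentence."""
    text = sentence.lower()
    return first.lower() in text and second.lower() in text

def find_relation(sentence, subject, obj):
    """Return the verb that directly links subject to obj, or None if there is none."""
    verbs = "|".join(RELATION_VERBS)
    pattern = re.compile(
        re.escape(subject) + r"\s+(" + verbs + r")\s+" + re.escape(obj),
        re.IGNORECASE,
    )
    match = pattern.search(sentence)
    return match.group(1).lower() if match else None

co_occurs = [co_occurrence(s, "Compound A", "enzyme B") for s in SENTENCES]
relations = [find_relation(s, "Compound A", "enzyme B") for s in SENTENCES]
print(co_occurs)  # [True, True]
print(relations)  # ['inhibits', None]
```

Both sentences pass the co-occurrence test, but only the first states a relationship between the two entities. The second is the special kind of false positive described earlier.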

 

Semantic Search Looks at Meaning

Unlike keyword search, semantic search takes the researcher’s intent into account to get at the contextual meaning of terms. It pushes beyond the boundaries of the organization’s collective base of understanding to surface information and concepts that haven’t been explicitly written into the query. Semantic technology deciphers concepts and meaning by associating search inputs with clarifying terms, such as related synonyms, that have been built into the system. For example, a search for the common drug brand Lipitor would also surface any documents mentioning atorvastatin; a search for cancer would yield documents discussing specific types of cancer, such as lung, breast, or brain cancer.

This is possible because of semantic enrichment, a process that helps researchers find new ideas and concepts by tagging unstructured text with information about its meaning. To distinguish concepts, semantic technology references vocabularies that contain all known terms for the same thing, relates these entities to each other in hierarchical relationships, and employs algorithms to analyze the context in which those terms appear.
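The enrichment process described above can be sketched under some loud assumptions: the tiny vocabulary, the hierarchy, the sample sentences, and every function name below are hypothetical, and the context-analysis algorithms are omitted entirely. The sketch covers only two of the ingredients, synonyms and hierarchical relationships.

```python
# Hypothetical vocabulary: each concept lists the terms that refer to it.
TERMS = {
    "atorvastatin": ["atorvastatin", "lipitor"],
    "cancer": ["cancer"],
    "lung cancer": ["lung cancer", "lung carcinoma"],
    "breast cancer": ["breast cancer"],
    "brain cancer": ["brain cancer"],
}

# Hypothetical hierarchy: each concept points to its broader concept.
BROADER = {
    "lung cancer": "cancer",
    "breast cancer": "cancer",
    "brain cancer": "cancer",
}

def enrich(text):
    """Tag text with every concept it mentions, plus all broader concepts."""
    lowered = text.lower()
    tags = set()
    for concept, terms in TERMS.items():
        if any(term in lowered for term in terms):
            tags.add(concept)
            parent = BROADER.get(concept)
            while parent is not None:
                tags.add(parent)
                parent = BROADER.get(parent)
    return tags

def resolve(query):
    """Map a query term to its concept, or None if the vocabulary lacks it."""
    lowered = query.lower()
    for concept, terms in TERMS.items():
        if lowered in terms:
            return concept
    return None

def semantic_search(query, documents):
    """Return documents whose tags include the concept behind the query."""
    concept = resolve(query)
    if concept is None:
        return []
    return [doc for doc in documents if concept in enrich(doc)]

# Invented sentences, for illustration only.
DOCUMENTS = [
    "Patients received atorvastatin 20 mg daily.",
    "Lipitor was discontinued after eight weeks.",
    "Smoking history was recorded for patients with lung carcinoma.",
    "Hearing aids were fitted at the first visit.",
]

lipitor_hits = semantic_search("Lipitor", DOCUMENTS)
cancer_hits = semantic_search("cancer", DOCUMENTS)
print(len(lipitor_hits), len(cancer_hits))  # 2 1
```

The search for Lipitor returns the atorvastatin sentence even though the brand name never appears in it, and the search for cancer returns the lung carcinoma sentence even though the word "cancer" never appears in it. A real system would tag documents once at indexing time rather than on every query.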

Semantic search can improve both precision and recall, giving you and your colleagues in R&D a more comprehensive and relevant set of results from which to extract new insights, accelerate discovery, and guide business decisions.

So How Can You Apply Semantic Search Across Your Organization?

For R&D-intensive industries such as the life sciences, semantic search delivers value by giving you the ability to turn content into insight.

Download this infographic to discover 5 ways to apply semantic search across your organization to help employees find relevant content.
