{"id":25701,"date":"2022-08-25T15:55:59","date_gmt":"2022-08-25T15:55:59","guid":{"rendered":"https:\/\/www.rightsdirect.com\/?post_type=blog_post&p=25701"},"modified":"2023-02-16T13:50:11","modified_gmt":"2023-02-16T13:50:11","slug":"knowledge-graph-2","status":"publish","type":"blog_post","link":"https:\/\/www.rightsdirect.com\/de\/blog\/knowledge-graph-2\/","title":{"rendered":"Behind the Scenes: How CCC Expert View Is Built"},"content":{"rendered":"\n
<\/p>\n\n\n\n
<\/p>\n\n\n\n
Why did we do that? We believe that knowledge graphs, and their ability to quickly answer questions from large datasets of entities and relationships, are an appropriate tool for finding people and experts in a dataset like the COVID literature.<\/p>\n\n\n\n
The knowledge graph consists of two key elements: a data pipeline that produces graph data from source data, and an application that allows the user to explore and interact with that data.<\/p>\n\n\n\n
We start with article metadata and journal data, medical subject headings (MeSH) for our ontology, and institution data from Ringgold. The source data is standard XML and tabular data. As a source of information, it presents many of the challenges that we discuss in more detail here: entities are not clearly delineated, the data is voluminous, few relationships are explicit, and the data quality is unknown.<\/p>\n\n\n\n
Next, we take this data and run it through our data pipeline. This pipeline is a series of processing steps whose purpose is to extract the relevant entities and their relationships in the form of graph data. There are five types of entities: authors, articles, institutions, journals, and fields of study. There are also many different types of relationships between them, such as connections between authors and other authors, between authors and articles, and between authors and their affiliated institutions.<\/p>\n\n\n\n
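As a rough illustration of what "graph data" with typed entities and relationships can look like, here is a minimal Python sketch. It is not the CCC pipeline's data model: the class names, the relation labels ("wrote", "affiliated_with", "published_in") and the identifiers are all hypothetical, chosen only to mirror the five entity types listed above.

```python
from dataclasses import dataclass, field

# The five entity types named in the text; the string labels are assumptions.
ENTITY_TYPES = {"author", "article", "institution", "journal", "field_of_study"}


@dataclass(frozen=True)
class Node:
    node_type: str  # one of ENTITY_TYPES
    node_id: str    # hypothetical identifier, e.g. "author-1"


@dataclass(frozen=True)
class Edge:
    source: Node
    relation: str   # hypothetical relationship label
    target: Node


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, source: Node, relation: str, target: Node) -> None:
        """Register both endpoints and the typed relationship between them."""
        for node in (source, target):
            if node.node_type not in ENTITY_TYPES:
                raise ValueError(f"unknown entity type: {node.node_type}")
            self.nodes.add(node)
        self.edges.add(Edge(source, relation, target))

    def neighbors(self, node: Node, relation: str) -> set:
        """Return all targets reachable from `node` via `relation`."""
        return {
            e.target
            for e in self.edges
            if e.source == node and e.relation == relation
        }


def build_example_graph() -> Graph:
    """Build a tiny graph with made-up identifiers, for illustration only."""
    graph = Graph()
    author = Node("author", "author-1")
    article = Node("article", "article-1")
    graph.add_edge(author, "wrote", article)
    graph.add_edge(author, "affiliated_with", Node("institution", "inst-1"))
    graph.add_edge(article, "published_in", Node("journal", "journal-1"))
    return graph
```

With a structure like this, a question such as "which articles did this author write?" becomes a lookup over edges, for example `build_example_graph().neighbors(Node("author", "author-1"), "wrote")`.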
First, we bring in reference frames that are externally available, using standard identifiers (NLMID, ISSN, MeSH, Ringgold identifiers). We use these known identifiers to build our reference framework. This is the non-article data.<\/p>\n\n\n\n
Next, we bring in our article data and select which content we want to process based on certain customer criteria. This step involves both selecting the appropriate metadata to use and filtering for the domain of interest.<\/p>\n\n\n\n
Subsequently, we create the list of distinct authors. This is the heart of the process: we determine which of the authors represented in the article source data are actually distinct individuals and which variations of a name correspond to the same physical person.<\/p>\n\n\n\n
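To make the idea of author disambiguation concrete, the sketch below groups author mentions that share a normalized name key and an institution identifier. This is a deliberately naive heuristic written for illustration, not the method used in the CCC pipeline; the function names, the "Last, First" input format and the institution identifiers are all assumptions.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Reduce 'Last, First Middle' to a crude key such as 'smith j'.

    Accents are stripped so that spelling variants of the same name
    compare equal. Names without a comma are lowercased as a whole.
    """
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    last, _, given = ascii_name.partition(",")
    initial = given.strip()[:1]
    return f"{last.strip().lower()} {initial.lower()}".strip()


def group_mentions(mentions):
    """Group (name, institution_id) mentions into candidate individuals.

    Two mentions land in the same group only if they share both the
    name key and the institution identifier (an assumed heuristic).
    """
    groups = defaultdict(set)
    for name, institution_id in mentions:
        groups[(name_key(name), institution_id)].add(name)
    return dict(groups)
```

A key this coarse merges too eagerly on common names and splits authors who change institutions, which is one reason a real system would weigh more evidence, such as co-authors or fields of study, and attach a confidence to each match.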
Next, we conduct a statistical analysis, both for quality assurance purposes and to calculate our level of confidence, or degree of belief.<\/p>\n\n\n\n
The final graph that we produce is the product of a knowledge system, a term that indicates that iterative refinement is built into our processing of the data, with the goal of obtaining knowledge. Our learning architecture lays the foundation for improving the quality of the data in the graph over time by quantifying each assertion and providing benchmarks of quality.<\/p>\n\n\n\n
\u201cThe Data Quality Imperative\u201d from CTO Babis Marmanis. He discusses the impact data quality has on knowledge production, with examples from our experiences working with raw bibliographic metadata for the CCC COVID Author Graph.<\/em><\/p>\n\n\n\n Interested in knowing more about how CCC Expert View can help your organization identify experts and key opinion leaders? Learn more.<\/a><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":" Since May 2020, we have published more than 2,000 articles on COVID-19 every week. That is a lot of new information that …<\/p>\n","protected":false},"author":242,"featured_media":25702,"template":"","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"internal_tag":[],"topic":[],"coauthors":[],"class_list":["post-25701","blog_post","type-blog_post","status-publish","has-post-thumbnail","hentry"],"yoast_head":"\n