{"id":41489,"date":"2022-11-30T15:25:57","date_gmt":"2022-11-30T15:25:57","guid":{"rendered":"https:\/\/www.rightsdirect.com\/?post_type=blog_post&p=41489"},"modified":"2023-02-16T13:49:35","modified_gmt":"2023-02-16T13:49:35","slug":"3-tipps-zur-datenpipeline","status":"publish","type":"blog_post","link":"https:\/\/www.rightsdirect.com\/de\/blog\/3-tipps-zur-datenpipeline\/","title":{"rendered":"3 Tips for Integrating Full-Text Articles into Your Data Pipeline"},"content":{"rendered":"\n
<\/p>\n\n\n\n
<\/p>\n\n\n\n
Once these feeds are ingested, project teams, internal data lakes, and applications put the data to uses ranging from early-phase R&D through competitive intelligence, M&A, and licensing to post-market surveillance and pharmacovigilance.<\/p>\n\n\n\n
These groups often face challenges in getting the most return on the organization\u2019s investment. Here are a few examples of challenges you may face when normalizing full-text XML data, with tips to overcome them:<\/p>\n\n\n\n
A data provider may deliver materials via SFTP, API, AWS S3 bucket, or some other option, so data transfer jobs must be scheduled properly. Ideally, these jobs need minimal manual intervention while still giving you visibility into data feeds that have missed scheduled deliveries or produced anomalies (such as an unusual volume of data). The farther upstream such anomalies are noticed and acted on, the better.<\/p>\n\n\n\n
Tip: Take a baseline of your data feeds and then regularly calculate variance against this baseline to detect potential underlying changes. Some changes may turn out to be perfectly explainable, such as a change in the ownership of a journal that results in its disappearance from a longstanding delivery. In other cases, the comparison may turn up discrepancies that are true mistakes, giving you the opportunity to rectify them.<\/p>\n\n\n\n
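The baseline comparison in this tip could be sketched roughly as follows. This is a minimal illustration, not a description of any particular tooling: the feed names, the use of per-delivery article counts as the baseline metric, the z-score test, and the threshold of 3 are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_anomalies(baseline_counts, latest_counts, z_threshold=3.0):
    # baseline_counts: hypothetical mapping of feed name -> list of past
    # per-delivery article counts. latest_counts: feed name -> newest count.
    flagged = {}
    for feed, history in baseline_counts.items():
        latest = latest_counts.get(feed)
        if latest is None:
            # The feed is in the baseline but nothing arrived this cycle.
            flagged[feed] = 'missed delivery'
            continue
        mu = mean(history)
        sigma = stdev(history) if len(history) > 1 else 0.0
        if sigma == 0:
            # A constant baseline has no spread, so any change is notable.
            if latest != mu:
                flagged[feed] = 'volume changed from constant baseline'
            continue
        z = (latest - mu) / sigma
        if abs(z) > z_threshold:
            flagged[feed] = f'volume z-score {z:.1f}'
    return flagged


# Hypothetical feeds: publisher_b delivered nothing in the latest cycle.
baseline = {
    'publisher_a': [100, 104, 98, 102, 96],
    'publisher_b': [50, 52, 49, 51, 48],
}
print(flag_anomalies(baseline, {'publisher_a': 101}))
```

A flagged feed is only a prompt for review: as noted above, some deviations have ordinary explanations, so a check like this would typically route to a person rather than block ingestion automatically.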
Across data providers, and even within a single provider, there can and will be format variations. Changes over time, and from imprint to imprint in the case of published journals that may have changed ownership, require attention at the parsing stage. In one project at CCC, ingesting full-text data from more than 50 STM publishers meant accounting not only for more than 10 stated XML formats (including NLM, JATS, and proprietary formats) but also for varying levels of adherence to those stated formats.<\/p>\n\n\n\n
Tip: Discuss the potential for variation with your data provider(s) in advance, probing for differences across time and across the provider\u2019s lines of business.<\/p>\n\n\n\n
Someone, somewhere downstream in the data pipeline, will do something with these data. How do they expect to interact with the data, and does that require further processing to meet the need?<\/p>\n\n\n\n
Examples to consider: <\/p>\n\n\n\n
Tip: Establish a clear set of requirements for data parsing, linking each one to the business benefits your stakeholders expect downstream. With this set of requirements, you can then prioritize work and divide the scope into phases. For example, tabular data and figures might be difficult to extract in a first phase and could be set aside while you refine your approach.<\/p>\n\n\n\n
Insights that can be found only in the full text of scientific articles undoubtedly enrich AI, machine learning, and data visualization projects. With RightFind XML<\/a>, organizations can choose from a variety of flexible models to access normalized, full-text scientific literature in XML format, licensed for commercial text and data mining.<\/p>\n\n\n\n This post is by Michael Iarrobino, Director of Product Management, and originally appeared on CCC’s Velocity of Content blog.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":" R&D organizations today make substantial investments in scientific literature. Sophisticated knowledge management teams recognize the importance of research data across the enterprise and supplement …<\/p>\n","protected":false},"author":242,"featured_media":41490,"template":"","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"internal_tag":[],"topic":[],"coauthors":[],"class_list":["post-41489","blog_post","type-blog_post","status-publish","has-post-thumbnail","hentry"],"yoast_head":"\n