{"id":25902,"date":"2022-10-17T18:49:57","date_gmt":"2022-10-17T18:49:57","guid":{"rendered":"https:\/\/www.rightsdirect.com\/?post_type=blog_post&p=25902"},"modified":"2023-02-16T13:50:07","modified_gmt":"2023-02-16T13:50:07","slug":"2022-datenverwaltungstrends","status":"publish","type":"blog_post","link":"https:\/\/www.rightsdirect.com\/de\/blog\/2022-datenverwaltungstrends\/","title":{"rendered":"Required Reading on Data Management for Fall 2022"},"content":{"rendered":"\n
<\/p>\n\n\n\n
Companies often think of the data that they collect or ingest as static. However, data, like a natural resource, has a life cycle of its own. In June, an article from the IEEE Computer Society did an excellent job of discussing the importance of managing the entire life cycle of data. The Importance of Data Lifecycle Management (DLM) and Best Practices<\/a> describes the various stages of the life cycle and highlights the importance of curating and maintaining data.<\/p>\n\n\n\n <\/p>\n\n\n\n Data quality is a well-known challenge for companies of all sizes. Some of the engineers at LinkedIn looked at the problem of managing data quality at the scale of data that LinkedIn consumes. Towards data quality management at LinkedIn<\/a> describes the architecture of a solution that they developed, the \u201cData Health Monitor\u201d, with the goal of improving the quality of the data that LinkedIn uses for its machine learning efforts.<\/p>\n\n\n\n <\/p>\n\n\n\n Staying on the topic of machine learning, an article in the MIT Technology Review<\/em> describes the BLOOM Project (BigScience Large Open-science Open-access Multilingual Language Model), which attempts to address some of the criticism that has been directed at language models: that they are opaque, both in their source code and in the data that is used to train them. 
Inside a radical new project to democratize AI<\/a> describes how the project designers hope to make their models as powerful as proprietary ones, but with a transparent process.<\/p>\n\n\n\n <\/p>\n\n\n\n For those of you interested in fundamental research around databases, an article in the Communications of the ACM<\/em>, The Seattle Report on Database Research<\/a>, describes the most recent in an ongoing series of meetings, held since 1988, to identify promising areas of research for the next five years.<\/p>\n\n\n\n While the author of 8 Levels of Reproducibility: Future-Proofing Your Python Projects<\/a> uses Python to discuss his ideas, the framework for reproducible research and coding that he lays out is applicable to data science projects in any language.<\/p>\n\n\n\n <\/p>\n\n\n\n \u201cIn almost every position in today\u2019s world, decisions are made on the basis of the compilation and analysis of data and …<\/p>\n","protected":false},"author":242,"featured_media":25903,"template":"","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"internal_tag":[],"topic":[],"coauthors":[],"class_list":["post-25902","blog_post","type-blog_post","status-publish","has-post-thumbnail","hentry"],"yoast_head":"\nThe data quality challenge in action<\/strong><\/h3>\n\n\n\n
Going inside the BLOOM Project<\/strong><\/h3>\n\n\n\n
Research around databases<\/strong><\/h3>\n\n\n\n
Want to keep learning? Take a look at some of our most recent data-related blog posts:<\/strong><\/h2>\n\n\n\n