arxiv:2301.11689

A rule-free workflow for the automated generation of databases from scientific literature

Published on Feb 6, 2024

Authors:

Abstract

Transformer-based BERT fine-tuning enables automated extraction of structured material properties from scientific literature, demonstrating competitive performance with rule-based methods while offering faster data generation.

AI-generated summary

In recent times, transformer networks have achieved state-of-the-art performance in a wide range of natural language processing tasks. Here we present a workflow based on the fine-tuning of BERT models for different downstream tasks, which results in the automated extraction of structured information from unstructured natural language in scientific literature. Contrary to existing methods for the automated extraction of structured compound-property relations from similar sources, our workflow does not rely on the definition of intricate grammar rules. Hence, it can be adapted to a new task without requiring extensive implementation efforts and knowledge. We test our data-extraction workflow by automatically generating a database for Curie temperatures and one for band gaps. These are then compared with manually-curated datasets and with those obtained with a state-of-the-art rule-based method. Furthermore, in order to showcase the practical utility of the automatically extracted data in a material-design workflow, we employ them to construct machine-learning models to predict Curie temperatures and band gaps. In general we find that, although more noisy, automatically extracted datasets can grow fast in volume and that such volume partially compensates for the inaccuracy in downstream tasks.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2301.11689 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2301.11689 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2301.11689 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.