BiodivBERT
Model description
- BiodivBERT is a domain-specific, cased BERT-based model for the biodiversity literature.
- It uses the tokenizer from the BERT base cased model.
- BiodivBERT is pre-trained on abstracts and full text from the biodiversity literature.
- BiodivBERT is fine-tuned on two downstream tasks: Named Entity Recognition and Relation Extraction in the biodiversity domain.
- Please visit our GitHub Repo for more details.
How to use
- You can use BiodivBERT via the Hugging Face Transformers library as follows:
- Masked Language Model
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM
>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")
>>> model = AutoModelForMaskedLM.from_pretrained("NoYo25/BiodivBERT")
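A minimal sketch of querying the masked-language-model head through the `fill-mask` pipeline; the example sentence is an assumption for illustration, not from the model card:

```python
from transformers import pipeline

# Load BiodivBERT's masked-LM head as a fill-mask pipeline
# (downloads the checkpoint from the Hugging Face Hub on first use).
fill = pipeline("fill-mask", model="NoYo25/BiodivBERT")

# BERT-style models use [MASK] as the mask token.
preds = fill("Deforestation is a major driver of [MASK] loss.")
for p in preds:
    # Each prediction carries the proposed token and its score.
    print(p["token_str"], round(p["score"], 3))
```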
- Token Classification - Named Entity Recognition
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification
>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")
>>> model = AutoModelForTokenClassification.from_pretrained("NoYo25/BiodivBERT")
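A hedged sketch of running NER through the `token-classification` pipeline; the sample sentence is invented, and the entity label set comes from the checkpoint's `id2label` config rather than anything stated here, so inspect it instead of assuming label names:

```python
from transformers import pipeline

# Token-classification pipeline over BiodivBERT; "simple" aggregation
# merges word-piece tokens back into whole-word entity spans.
ner = pipeline(
    "token-classification",
    model="NoYo25/BiodivBERT",
    aggregation_strategy="simple",
)

entities = ner("Quercus robur populations declined across temperate Europe.")
for e in entities:
    # entity_group is the aggregated label from the checkpoint's id2label map.
    print(e["entity_group"], e["word"], round(float(e["score"]), 3))
```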
- Sequence Classification - Relation Extraction
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")
>>> model = AutoModelForSequenceClassification.from_pretrained("NoYo25/BiodivBERT")
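A sketch of relation extraction as sequence classification: score a sentence with the classification head and map the arg-max logit back through `model.config.id2label`. The input sentence is an assumed example; how entity pairs are marked in the input is not specified by this card:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")
model = AutoModelForSequenceClassification.from_pretrained("NoYo25/BiodivBERT")

# Encode one sentence and take the highest-scoring relation label.
inputs = tokenizer("Aedes aegypti transmits dengue virus.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()

# Relation names live in the checkpoint config, not in this card.
print(model.config.id2label[pred])
```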
Training data
- BiodivBERT is pre-trained on abstracts and full text from biodiversity-related publications.
- We used both the Elsevier and Springer APIs to crawl this data.
- We covered publications from 1990 to 2020.
Evaluation results
BiodivBERT outperformed BERT_base_cased, biobert_v1.1, and a BiLSTM baseline on the downstream tasks.