gabriellarson/naturelm-8x7b-inst-gguf is indexed on GraySoft with repository links, GGUF quant files (Q4_K_S and others), and Hugging Face metadata. Use this page to choose a quantization for guIDE or another local runtime. Related models in the same shard are listed below.

Model Intelligence Sheet

gabriellarson/naturelm-8x7b-inst-gguf overview

Model description

Nature Language Model (NatureLM) is a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including generating and optimizing small molecules, proteins, RNA, and materials using text instructions; cross-domain generation/design such as protein-to-molecule and protein-to-RNA generation; and top performance across different domains.

Tags: gguf, biology, chemistry, en, arxiv:2502.07527, base_model:microsoft/NatureLM-8x7B-Inst, base_model:quantized:microsoft/NatureLM-8x7B-Inst, license:mit, endpoints_compatible, region:us

Downloads

119

Likes

3

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

25 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
NatureLM-8x7B-Inst-F16.gguf	GGUF	F16	87.09 GB	Download
NatureLM-8x7B-Inst-IQ1_M.gguf	GGUF	IQ1_M	10.13 GB	Download
NatureLM-8x7B-Inst-IQ1_S.gguf	GGUF	IQ1_S	9.17 GB	Download
NatureLM-8x7B-Inst-IQ2_M.gguf	GGUF	IQ2_M	14.46 GB	Download
NatureLM-8x7B-Inst-IQ2_S.gguf	GGUF	IQ2_S	13.18 GB	Download
NatureLM-8x7B-Inst-IQ2_XS.gguf	GGUF	IQ2_XS	12.99 GB	Download
NatureLM-8x7B-Inst-IQ2_XXS.gguf	GGUF	IQ2_XXS	11.72 GB	Download
NatureLM-8x7B-Inst-IQ3_M.gguf	GGUF	IQ3_M	19.99 GB	Download
NatureLM-8x7B-Inst-IQ3_S.gguf	GGUF	IQ3_S	19.06 GB	Download
NatureLM-8x7B-Inst-IQ3_XS.gguf	GGUF	IQ3_XS	18.05 GB	Download
NatureLM-8x7B-Inst-IQ3_XXS.gguf	GGUF	IQ3_XXS	17.02 GB	Download
NatureLM-8x7B-Inst-IQ4_XS.gguf	GGUF	IQ4_XS	23.39 GB	Download
NatureLM-8x7B-Inst-Q2_K.gguf	GGUF	Q2_K	16.15 GB	Download
NatureLM-8x7B-Inst-Q2_K_S.gguf	GGUF	Q2_K_S	14.96 GB	Download
NatureLM-8x7B-Inst-Q3_K_L.gguf	GGUF	Q3_K_L	22.54 GB	Download
NatureLM-8x7B-Inst-Q3_K_M.gguf	GGUF	Q3_K_M	21.03 GB	Download
NatureLM-8x7B-Inst-Q3_K_S.gguf	GGUF	Q3_K_S	19.06 GB	Download
NatureLM-8x7B-Inst-Q4_0.gguf	GGUF	Q4_0	24.77 GB	Download
NatureLM-8x7B-Inst-Q4_K_M.gguf	GGUF	Q4_K_M	26.53 GB	Download
NatureLM-8x7B-Inst-Q4_K_S.gguf	GGUF	Q4_K_S	24.94 GB	Download
NatureLM-8x7B-Inst-Q5_0.gguf	GGUF	Q5_0	30.16 GB	Download
NatureLM-8x7B-Inst-Q5_K_M.gguf	GGUF	Q5_K_M	30.98 GB	Download
NatureLM-8x7B-Inst-Q5_K_S.gguf	GGUF	Q5_K_S	30.05 GB	Download
NatureLM-8x7B-Inst-Q6_K.gguf	GGUF	Q6_K	35.78 GB	Download
NatureLM-8x7B-Inst-Q8_0.gguf	GGUF	Q8_0	46.27 GB	Download
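
Choosing among the files above usually comes down to how much memory is available. The following is a minimal sketch, not part of the page or the repository: it copies a subset of the sizes from the table and returns the largest quantization whose file fits a memory budget. The function name `largest_quant_that_fits` and the `headroom_gb` parameter are hypothetical, and the 2 GB default headroom is an assumed allowance for context cache and runtime overhead, not a measured figure.

```python
# File sizes in GB, copied from the table above (a subset, for brevity).
QUANT_SIZES_GB = {
    "IQ1_S": 9.17,
    "IQ2_M": 14.46,
    "Q2_K": 16.15,
    "Q3_K_M": 21.03,
    "IQ4_XS": 23.39,
    "Q4_K_S": 24.94,
    "Q4_K_M": 26.53,
    "Q5_K_M": 30.98,
    "Q6_K": 35.78,
    "Q8_0": 46.27,
    "F16": 87.09,
}


def largest_quant_that_fits(budget_gb, headroom_gb=2.0, sizes=QUANT_SIZES_GB):
    """Return the largest quantization whose file size plus headroom fits the budget.

    headroom_gb is an assumed allowance for runtime overhead; tune it for
    your own setup. Returns None when nothing fits.
    """
    candidates = {name: size for name, size in sizes.items()
                  if size + headroom_gb <= budget_gb}
    if not candidates:
        return None
    # Pick the candidate with the largest file size.
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    for budget in (8, 24, 32, 48):
        print(budget, "GB ->", largest_quant_that_fits(budget))
```

File size is only a lower bound on the memory a runtime needs, so treat the result as a starting point rather than a guarantee that the model will load.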

Model Details Live

Model Slug

gabriellarson/naturelm-8x7b-inst-gguf

Author

gabriellarson

Pipeline Task

—

Library

—

Created

2025-06-20

Last Modified

2025-06-20

Gated

No

Private

No

HF SHA

8cc38cfc3f9a35b627c179a9ece70b92caf422f1

License

mit

Language

en

Base Model

microsoft/NatureLM-8x7B-Inst

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "mit",
    "language": [
      "en"
    ],
    "tags": [
      "biology",
      "chemistry"
    ],
    "base_model": [
      "microsoft/NatureLM-8x7B-Inst"
    ],
    "frontmatter": {
      "license": "mit",
      "language": [
        "en"
      ],
      "tags": [
        "biology",
        "chemistry"
      ],
      "base_model": [
        "microsoft/NatureLM-8x7B-Inst"
      ]
    },
    "hero_image_url": "",
    "summary": "## Model description Nature Language Model (NatureLM) is a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including generating and optimizing small molecules, proteins, RNA, and materials using text instructions; cross-domain generation/design such as protein-to-molecule and protein-to-RNA generation; and top performance across different domains. # Model sources",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: mit\nlanguage:\n- en\ntags:\n- biology\n- chemistry\nbase_model:\n- microsoft/NatureLM-8x7B-Inst\n---\n**these quants are currently not working**\n\n\n\n\n# Model details\n## Model description\nNature Language Model (NatureLM) is a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including generating and optimizing small molecules, proteins, RNA, and materials using text instructions; cross-domain generation/design such as protein-to-molecule and protein-to-RNA generation; and top performance across different domains.\n\n- Developed by:  SFM team ∗ Microsoft Research AI for Science\n- Model type: Sequence-based science foundation model\n- Language(s): English\n- License:  MIT License\n- Finetuned from model: one version of the model is finetuned from Mixtral-8x7B-v0.1\n\n\n# Model sources \n## Repository:\nWe provide two repositories for 8x7B models, including both base versions and instruction-finetuned versions.\n\n- https://huggingface.co/microsoft/NatureLM-8x7B\n- https://huggingface.co/microsoft/NatureLM-8x7B-Inst\n  \n\n## Paper:\n[[2502.07527] Nature Language Model: Deciphering the Language of Nature for Scientific Discovery](https://arxiv.org/abs/2502.07527)\n\n# Uses\n## Direct intended uses\nNatureLM is designed to facilitate scientific discovery across multiple domains, including the generation and optimization of small molecules, proteins, and RNA. 
It offers two unique features: (1) Text-driven capability — users can prompt NatureLM using natural language instructions; and (2) Cross-domain functionality — NatureLM can perform complex cross-domain tasks, such as generating compounds for specific targets or designing protein binders for small molecules.\nDownstream uses:\nScience researchers can finetune NatureLM for their own tasks, especially cross-domain generation tasks.\n\n## Out-of-scope uses\n### Use in Real-World Applications Beyond Proof of Concept\nNatureLM currently not ready to use in clinical applications, without rigorous external validation and additional specialized development. It is being released for research purposes only.\n### Use outside of the science domain\nNatureLM is not a general-purpose language model and is not designed or optimized to perform general tasks like text summarization or Q&A. \n### Use by Non-Experts\nNatureLM outputs scientific entities (e.g., molecules, proteins, materials) and requires expert interpretation, validation, and analysis. It is not intended for use by non-experts or individuals without the necessary domain knowledge to evaluate and verify its outputs. Outputs, such as small molecule inhibitors for target proteins, require rigorous validation to ensure safety and efficacy. Misuse by non-experts may lead to the design of inactive or suboptimal compounds, resulting in wasted resources and potentially delaying critical research or development efforts. \n### CBRN Applications (Chemical, Biological, Radiological, and Nuclear)\nNatureLM is not intended for the design, development, or optimization of agents or materials for harmful purposes, including but not limited to weapons of mass destruction, bioterrorism, or other malicious uses.\n### Unethical or Harmful Applications\nThe use of NatureLM must align with ethical research practices. 
It is not intended for tasks that could cause harm to individuals, communities, or the environment.\n\n\n\n## Risks and limitations\nNatureLM may not always generate compounds or proteins precisely aligned with user instructions. Users are advised to apply their own adaptive filters before proceeding. Users are responsible for verification of model outputs and decision-making.\nNatureLM was designed and tested using the English language. Performance in other languages may vary and should be assessed by someone who is both an expert in the expected outputs and a native speaker of that language.\nNatureLM inherits any biases, errors, or omissions characteristic of its training data, which may be amplified by any AI-generated interpretations.  For example, inorganic data in our training corpus is relatively limited, comprising only 0.02 billion tokens out of a total of 143 billion tokens. As a result, the model's performance on inorganic-related tasks is constrained. In contrast, protein-related data dominates the corpus, with 65.3 billion tokens, accounting for the majority of the training data.\nThere has not been a systematic effort to ensure that systems using NatureLM are protected from security vulnerabilities such as indirect prompt injection attacks. Any systems using it should take proactive measures to harden their systems as appropriate.  \n\n\n# Training details\n## Training data\nThe pre-training data includes text, small molecules (SMILES notations), proteins (FASTA format), materials (chemical composition and space group number), DNA (FASTA format), and RNA (FASTA format). The dataset contains single-domain sequences and cross-domain sequences.\n\n## Training procedure\nPreprocessing\nThe training procedure involves two stages: Stage 1 focuses on training newly introduced tokens while freezing existing model parameters. 
Stage 2 involves joint optimization of both new and existing parameters to enhance overall performance.\n\n## Training hyperparameters\n-\tLearning Rate: 2×10<sup>−4</sup>\n-\tBatch Size (Sentences): 8x7B model: 1536\n-\tContext Length (Tokens): 8192\n-   GPU Number (H100): 8x7B model: 256\n\n## Speeds, sizes, times\n\nModel sized listed above;\n\n# Evaluation\n## Testing data, factors, and metrics\nTesting data\nThe testing data includes 22 types of scientific tasks such as molecular generation, protein generation, material generation, RNA generation, and prediction tasks across small molecules, proteins, DNA.\n\n## Factors\n1.\tCross-Domain Adaptability: The ability of NatureLM to perform tasks that span multiple scientific domains (e.g., protein-to-compound generation, RNA design for CRISPR targets, or material design with specific properties).\n2.\tAccuracy of Outputs: For tasks like retrosynthesis, assess the correctness of the outputs compared to ground truth or experimentally validated data.\n3.\tDiversity and Novelty of Outputs: Evaluate whether the generated outputs are novel (e.g., new molecules or materials not present in databases or training data).\n4.\tScalability Across Model Sizes: Assess the performance improvements as the model size increases (1B, 8B, and 46.7B parameters).\n## Metrics\nAccuracy, AUROC, and independently trained AI-based predictors are utilized for various tasks.\nEvaluation results\n\n1.\tWe successfully demonstrated that NatureLM is capable of performing tasks such as target-to-compound, target-to-RNA, and DNA-to-RNA generation.\n2.\tNatureLM achieves state-of-the-art results on retrosynthesis benchmarks and the MatBench benchmark for materials.\n3.\tNatureLM can generate novel proteins, small molecules, and materials.\n\n# Summary\nNature Language Model (NatureLM) is a groundbreaking sequence-based science foundation model designed to unify multiple scientific domains, including small molecules, materials, proteins, DNA and 
RNA. This innovative model leverages the \"language of nature\" to enable scientific discovery through text-based instructions. NatureLM represents a significant advancement in the field of artificial intelligence, providing researchers with a powerful tool to drive innovation and accelerate scientific breakthroughs. By integrating knowledge across multiple scientific domains, NatureLM paves the way for new discoveries and advancements in various fields of science. We hope to release it to benefit more users and \ncontribute to the development of AI for Science research.\n\n# Model card contact\nThis work was conducted in Microsoft Research AI for Science. We welcome feedback and collaboration from our audience. If you have suggestions, questions, or observe unexpected/offensive behavior in our technology, please contact us at: \n- Yingce Xia, Yingce.Xia@microsoft.com\n- Chen Hu, chehu@microsoft.com\n- Yawen Yang, v-yangyawen@microsoft.com\n \nIf the team receives reports of undesired behavior or identifies issues independently, we will update this repository with appropriate mitigations.",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "biology",
    "chemistry",
    "en",
    "arxiv:2502.07527",
    "base_model:microsoft/NatureLM-8x7B-Inst",
    "base_model:quantized:microsoft/NatureLM-8x7B-Inst",
    "license:mit",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 3,
  "downloads": 119,
  "gated": false,
  "private": false,
  "last_modified": "2025-06-20T19:24:11.000Z",
  "created_at": "2025-06-20T05:03:07.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
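
The `tags` array in the normalized metadata mixes plain labels (`gguf`, `biology`) with `key:value` entries (`license:mit`, `base_model:...`). As an illustrative sketch only, the hypothetical helper `split_tags` below separates the two kinds by splitting on the first colon; that splitting rule is an assumption inferred from the tags shown here, not a documented schema.

```python
from collections import defaultdict

# Tags copied from the normalized metadata above.
TAGS = [
    "gguf",
    "biology",
    "chemistry",
    "en",
    "arxiv:2502.07527",
    "base_model:microsoft/NatureLM-8x7B-Inst",
    "base_model:quantized:microsoft/NatureLM-8x7B-Inst",
    "license:mit",
    "endpoints_compatible",
    "region:us",
]


def split_tags(tags):
    """Split tags into plain labels and a dict of key -> list of values.

    Only the first colon is treated as the separator, so
    "base_model:quantized:X" becomes key "base_model", value "quantized:X".
    """
    plain = []
    keyed = defaultdict(list)
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            keyed[key].append(value)
        else:
            plain.append(tag)
    return plain, dict(keyed)


if __name__ == "__main__":
    plain, keyed = split_tags(TAGS)
    print(plain)
    print(keyed)
```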

Source payload excerpt (from Hugging Face API)

{
  "_id": "6854eb8b5c437a1cfec0710b",
  "id": "gabriellarson/NatureLM-8x7B-Inst-GGUF",
  "modelId": "gabriellarson/NatureLM-8x7B-Inst-GGUF",
  "sha": "8cc38cfc3f9a35b627c179a9ece70b92caf422f1",
  "createdAt": "2025-06-20T05:03:07.000Z",
  "lastModified": "2025-06-20T19:24:11.000Z",
  "author": "gabriellarson",
  "downloads": 119,
  "likes": 3,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 27
}
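
The `id` and `sha` fields in the payload are enough to build a download link pinned to this exact revision. The sketch below assumes Hugging Face's usual `https://huggingface.co/<repo>/resolve/<revision>/<filename>` layout; the page's own Download links are not reproduced in this text, so they may differ. The helper name `resolve_url` is hypothetical.

```python
from urllib.parse import quote

# Values copied from the source payload excerpt above.
REPO_ID = "gabriellarson/NatureLM-8x7B-Inst-GGUF"
REVISION = "8cc38cfc3f9a35b627c179a9ece70b92caf422f1"


def resolve_url(filename, repo_id=REPO_ID, revision=REVISION):
    """Build a revision-pinned URL for one repository file.

    Assumes the common Hugging Face "resolve" URL pattern. The revision and
    filename are percent-encoded; the repo id is used as given.
    """
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


if __name__ == "__main__":
    print(resolve_url("NatureLM-8x7B-Inst-Q4_K_S.gguf"))
```

Pinning to the commit SHA rather than a branch name keeps the link stable if the repository changes later. Note that the README stored in the metadata above states that these quants are currently not working, so check the repository status before downloading a file of this size.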

More models in this shard