# Data from Kobel et al. "Protozoal populations drive system-wide variation in the rumen microbiome"

This directory contains the files required to run the analysis and visualization scripts (https://github.com/TheMEMOLab/supacow-share) for the manuscript (https://doi.org/10.1101/2024.12.05.626740). The files and subdirectories are described below:

- **kos_of_interest.csv** List of KEGG orthologs and pathways of interest used in figures related to functional annotations. Based on Supplementary Table 5 from https://doi.org/10.1101/2024.08.15.608071.

- **SupaCowTaxTotal.tsv** Taxonomy details for the metagenome-assembled genomes (MAGs), protozoal single-amplified genomes (SAGs) and fungal genomes included in these analyses.

## 16S

- **dnasense_meta.csv** Sequencing provider's version of metadata, required for matching amplicon data sample identifiers to the regular sample naming scheme. NB: the animal/experimental details are not necessarily up to date in this file; the main metadata file under **sample_data** should be used for those. 

- **tax_counts_GTDBref.csv** 16S rRNA gene amplicon sequence variant (ASV) count data from rumen digesta (processed with DADA2; details provided in the manuscript), with the corresponding taxonomy (GTDB as reference).
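
Matching the provider's identifiers to the regular sample names amounts to a lookup-table join. The following is a minimal Python sketch of that step, not the method used in the analysis scripts. The column names `provider_id` and `sample_name` are hypothetical (the actual headers in **dnasense_meta.csv** should be checked first), and the inline text is a made-up stand-in for the real file.

```python
import csv
import io


def build_id_map(meta_csv_text, provider_col="provider_id", sample_col="sample_name"):
    """Return {provider identifier: regular sample name} from metadata CSV text.

    Both column names are hypothetical defaults; pass the real headers instead.
    """
    reader = csv.DictReader(io.StringIO(meta_csv_text))
    return {row[provider_col]: row[sample_col] for row in reader}


def rename_samples(provider_ids, id_map):
    """Translate provider identifiers, failing loudly on any that lack metadata."""
    missing = [pid for pid in provider_ids if pid not in id_map]
    if missing:
        raise KeyError(f"No metadata entry for: {missing}")
    return [id_map[pid] for pid in provider_ids]


# Made-up stand-in for the metadata file, for illustration only
example_meta = "provider_id,sample_name\nP001,sampleA\nP002,sampleB\n"
id_map = build_id_map(example_meta)
print(rename_samples(["P002", "P001"], id_map))
```

Raising an error on unmatched identifiers, rather than silently dropping them, makes it obvious when the metadata and the count table are out of sync.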

## MAPP

- **newcastle_mapp_normalized_53S_longer.rds** Results of Microarray Polymer Profiling (MAPP) analysis of rumen digesta. Three different extractions are included (CDTA, cellulase and NaOH); only the NaOH results are presented in the manuscript.

- **probelibrary_plantprobes_v3.rds** Details for probes used for the MAPP analysis.

## metabolomics

- **10_MSOmics_abundances_rect_post.rds** Processed data (normalized intensities) from untargeted metabolomics of rumen digesta from timepoint 6 (post-slaughter).

- **2023-04-03_mb_liver_t6_pqn_areas.csv** Simple table of processed data (PQN-normalized intensities) from untargeted metabolomics of liver samples from timepoint 6 (post-slaughter).

- **2023-04-03_mb_wall_t6_pqn_areas.csv** Simple table of processed data (PQN-normalized intensities) from untargeted metabolomics of rumen wall samples from timepoint 6 (post-slaughter).

- **2023-04-05_mb_meta.csv** Metabolomics provider's version of metadata, required for matching metabolomics sample labels to the regular sample naming scheme.

- **RQ00613-NMBU-RP-Liver-results.xlsx** Full results for untargeted metabolomics of liver samples from timepoint 6 (same data as in **2023-04-03_mb_liver_t6_pqn_areas.csv**), including compound annotations.

- **RQ00613-NMBU-RP-RumenWall-results.xlsx** Full results for untargeted metabolomics of rumen wall samples from timepoint 6 (same data as in **2023-04-03_mb_wall_t6_pqn_areas.csv**), including compound annotations.

- **VFA_a129-2022_molperc.csv** Targeted metabolomic measurement results for Volatile Fatty Acids (VFAs) from rumen contents (values as molar percent).
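
Molar percent expresses each compound as its share of the total molar concentration. As a hedged illustration, the Python sketch below converts concentrations to molar percent. It assumes the denominator is the sum of all measured VFAs in a sample, which is not confirmed for **VFA_a129-2022_molperc.csv**, and the compound values are made up.

```python
def molar_percent(concentrations):
    """Convert {compound: molar concentration} to {compound: molar percent of the total}."""
    total = sum(concentrations.values())
    if total <= 0:
        raise ValueError("Total concentration must be positive")
    return {name: 100.0 * value / total for name, value in concentrations.items()}


# Made-up concentrations (mM), for illustration only
example = {"acetate": 60.0, "propionate": 25.0, "butyrate": 15.0}
shares = molar_percent(example)
print(shares)
```

Under that assumption, the values for one sample should sum to roughly 100, which can serve as a quick sanity check after loading the table.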

## metagenomics

- **Abundance.coverM.tsv** MAG abundances per sample, as calculated with CoverM.

- **hnd_annotations_fix.csv** Table with specific corrections/additions for select functional annotations, based on manual inspection of MAG sequences.

- **metabolism_summary.xlsx** MAG functional annotation results from DRAM.

- **MAGs** Directory with fasta files of the 700 MAGs used as the database for bacteria and archaea in this study, with subdirectories for Genes (predicted genes), Genomes (genomes), and Proteins (amino acid translations of predicted genes). Details of how the MAGs were constructed and dereplicated are provided in the manuscript.

## proteomics

- **imputation_long_v2.rds** Long-format tibble of proteomic/metaproteomic data with imputed missing values included. The source material (digesta (tube or post-slaughter sampling), wall, liver) and imputation group (one breed or both breeds together) are defined by the "group_index" column, with specifics given in the file **keys_presentable_12.rds**.

- **keys_presentable_12.rds** Explanations for the "group_index" column in **imputation_long_v2.rds**.

- **proteomics_rawintensity_annotation_taxonomy.tsv** Proteomic/metaproteomic data without missing value imputation. Sample material is defined in column "source" (D = digesta, L = liver, W = wall).

- **holodoublevu/** Directory with results of weighted correlation network analysis (WGCNA) of proteomic/metaproteomic data, as described in the https://github.com/cmkobel/holodoublevu/ repository. Includes a separate readme file with detailed explanations of the results.
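
The "source" codes in **proteomics_rawintensity_annotation_taxonomy.tsv** (D, L, W) can be decoded and used to split the table by sample material. The following is a minimal Python sketch using only the standard library. The "source" column and its codes come from the description above, while the `sample` and `intensity` columns and the inline rows are hypothetical stand-ins for the real file.

```python
import csv
import io

# Codes documented for the "source" column
SOURCE_LABELS = {"D": "digesta", "L": "liver", "W": "wall"}


def split_by_source(tsv_text):
    """Group rows of tab-separated text by the decoded "source" column.

    An undocumented code raises KeyError rather than being silently skipped.
    """
    groups = {label: [] for label in SOURCE_LABELS.values()}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups[SOURCE_LABELS[row["source"]]].append(row)
    return groups


# Made-up rows; "sample" and "intensity" are hypothetical column names
example = "sample\tsource\tintensity\ns1\tD\t10\ns2\tL\t20\ns3\tD\t30\n"
groups = split_by_source(example)
print({label: len(rows) for label, rows in groups.items()})
```

For the real file, the text could be read from disk and passed to `split_by_source`; a streaming reader may be preferable if the table is large.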

## sample_data

- **metadata_v1.7.tsv** Main sample metadata for the animals included in the study (for details, see manuscript).

- **sruc_sample_data_t6.csv** Additional animal metadata from SRUC, who performed the trial; used to test for potential batch effects.

## transcriptomics

- **2023-7-12_CP958d_meta.csv** Sequencing provider's metadata for liver and wall transcriptomics, required for matching sequence sample identifiers to the general sample labeling scheme.

- **FeatureCount_CP958d_host.txt** Host transcriptomic count data for liver and rumen wall samples (post-slaughter timepoint).

- **FeatureCount_CP958d_microbiome_noHeader.txt** Metatranscriptomic count data from rumen wall (post-slaughter timepoint).

- **Final.Kallisto.Stats.total.tsv** Mapping statistics from Kallisto for digesta metatranscriptomic data.

- **MetaT.RawCounts.Annotated.zerotrim.csv** Metatranscriptomic count data from rumen digesta (all timepoints), with full taxonomic and functional annotation details.