This dataset comprises 16S rRNA gene V4 region amplicon sequencing data generated from 1,826 human faecal samples, predominantly from the Estonian population. The dataset includes raw sequencing reads and processed amplicon sequence variant (ASV) outputs, consisting of ASV abundance tables, representative ASV sequences, and taxonomic assignments. In addition, a sample metadata table provides sample and host information, including sample identifier, sequencing run identifier, date of sampling, sex, age, height, weight, nationality, disease status, and Bristol stool type. Sequencing was performed on the Illumina MiSeq and iSeq 100 platforms. The cohort includes samples from individuals with no reported disease (n = 889), gastrointestinal diseases (n = 217), allergies and asthma (n = 194), thyroid gland disorders (n = 117), cardiovascular diseases (n = 103), various general health conditions (n = 139), cancer (n = 28), diabetes (n = 20), and gynaecological issues (n = 7). Disease status was unavailable for 112 samples.
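As a quick consistency check, the disease-status counts listed above can be summed and compared with the total sample count. The sketch below uses only the numbers stated in the description; the dictionary keys and variable names are illustrative and do not correspond to any column names in the metadata table. The counts, together with the 112 samples lacking disease status, add up exactly to 1,826, which suggests (but does not confirm) that each sample is assigned to a single disease-status category.

```python
# Sample counts per disease-status category, as reported in the dataset description.
# Keys are illustrative labels, not the actual metadata column values.
disease_counts = {
    "no reported disease": 889,
    "gastrointestinal diseases": 217,
    "allergies and asthma": 194,
    "thyroid gland disorders": 117,
    "cardiovascular diseases": 103,
    "various general health conditions": 139,
    "cancer": 28,
    "diabetes": 20,
    "gynaecological issues": 7,
}
missing_status = 112   # samples with no disease status recorded
total_samples = 1826   # total faecal samples in the dataset

# Samples with a recorded disease-status category
annotated = sum(disease_counts.values())
print(annotated)  # 1714

# Annotated plus missing should equal the total if categories do not overlap
print(annotated + missing_status == total_samples)  # True
```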