Sample code for analyzing VCF files (converted to Parquet) in Azure Databricks and Synapse.

BlueGranite/azure-synapse-vcf-analysis

VCF Analysis in Azure Synapse

Sample code for analyzing VCF files in Azure Synapse (once converted to Parquet using Glow).

Colby T. Ford, Ph.D.

Pipeline

Sample Code

  1. Convert VCF files to Parquet: ConvertVCFsToParquet.md
  2. Create External Table to VCF-based Parquet Files in Azure Synapse: CreateVCFTable.md
  3. Sample SQL Queries: SampleQueries.md
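To give a sense of what step 1 produces, here is a minimal sketch in plain Python of how VCF records map onto table rows. It is not the repository's conversion code (that uses Glow on Spark, see ConvertVCFsToParquet.md); the function name `parse_vcf_lines` and the sample record are made up for illustration, and only the eight fixed columns from the VCF specification are handled.

```python
# Illustrative only: the repo converts VCFs with Glow on Spark.
# This sketch just shows the row shape that a VCF record flattens into.

# The eight fixed, tab-separated columns defined by the VCF specification.
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_lines(lines):
    """Yield one dict per variant record, skipping header and blank lines.

    Hypothetical helper for illustration; per-sample genotype columns
    (anything after INFO) are ignored here.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # '##' meta lines and the '#CHROM' header row
        fields = line.split("\t")
        record = dict(zip(VCF_FIXED_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])        # 1-based position
        record["ALT"] = record["ALT"].split(",")  # may list several alleles
        yield record


# A made-up record, not taken from the 1000 Genomes data.
sample = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "22\t12345\t.\tA\tG,T\t100\tPASS\tAC=1",
]

rows = list(parse_vcf_lines(sample))
print(rows[0]["CHROM"], rows[0]["POS"], rows[0]["ALT"])
```

Each resulting row corresponds to one variant, which is the kind of tabular layout that a columnar format such as Parquet stores and that the SQL queries in step 3 operate on.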

Sample Data

The sample VCF data used in this demo comes from the Phase 3 release of the 1000 Genomes Project. The release comprises roughly 168 GB of VCF files, which can be downloaded from the project's FTP site.

BlueGranite Resources