PyPI connector

Set up the PyPI connector in Kaivo: authentication, configuration, the 3 BigQuery tables it syncs, and answers to common questions.

Written By Lauri Raivio

Last updated About 1 hour ago

Kaivo is a fully managed data platform that syncs your PyPI data into a Google BigQuery warehouse and keeps it up to date automatically. There is no pipeline to build and no infrastructure to run, so you can spend your time analysing your data from PyPI instead of moving it.

What is the PyPI connector

Sync your PyPI package data into BigQuery with Kaivo to track releases and download stats for your packages.

Category: Tech
Status: Generally available
Authentication: Other
Setup: Self-service

Getting started with the PyPI connector

  1. Sign up for Kaivo and create a workspace.
  2. Connect your PyPI account.
  3. Choose which tables to sync.
  4. Wait for the initial sync to finish.
  5. Query your data in BigQuery or your favourite AI or BI tool.

Authenticating PyPI

Follow the steps below to connect your PyPI account.

Configuring the PyPI connector

When you set up the connector, you provide:

Field: PyPI Package
Description: Name of the project or package, written in lowercase with hyphens. This is the name you pass to pip when installing the package.

Field: Package Version
Description: Version of the project or package. Use it to sync one particular release instead of all releases.
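
As an illustration of the two fields above, here is a minimal Python sketch that puts a package name into the lowercase, hyphenated form and builds the matching URL on PyPI's public JSON API. The normalization follows PEP 503 and the URL patterns are PyPI's documented ones, but the helper names are hypothetical and this is not necessarily how Kaivo calls PyPI internally.

```python
import re
from typing import Optional
from urllib.parse import quote


def normalize_package_name(name: str) -> str:
    """Lowercase the name and collapse runs of '-', '_' and '.' into one hyphen (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_json_url(package: str, version: Optional[str] = None) -> str:
    """Build the PyPI JSON API URL for a whole project, or for one release if a version is given."""
    project = quote(normalize_package_name(package), safe="")
    if version is None:
        # All releases of the project.
        return f"https://pypi.org/pypi/{project}/json"
    # A single release of the project.
    return f"https://pypi.org/pypi/{project}/{quote(version, safe='')}/json"


print(normalize_package_name("Django_REST.framework"))  # django-rest-framework
print(pypi_json_url("requests", "2.31.0"))  # https://pypi.org/pypi/requests/2.31.0/json
```

If the name you enter matches what this sketch returns, it should also match the name you use with pip install.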

Tables and columns synced from PyPI

Kaivo syncs 3 tables from PyPI into a dedicated dataset in your BigQuery warehouse.

How the PyPI sync works

After the first load, Kaivo keeps your BigQuery warehouse up to date for you. Where PyPI supports it, each sync pulls only new and changed records so it stays fast; otherwise it refreshes the whole table. Every record keeps its original ID, so you won't get duplicate rows.
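
To make the two sync modes concrete, here is a small Python sketch of the idea. It assumes each record has an "id" field and treats a table as a list of dictionaries. The function names are hypothetical and this is an illustration of the behaviour described above, not Kaivo's actual implementation.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def incremental_sync(existing: List[Row], changed: Iterable[Row], key: str = "id") -> List[Row]:
    """Merge new and changed records into the existing rows, matching on the record ID."""
    by_id = {row[key]: row for row in existing}
    for row in changed:
        # A known ID overwrites the old row; an unknown ID adds a new row.
        by_id[row[key]] = row
    return list(by_id.values())


def full_refresh(existing: List[Row], snapshot: Iterable[Row]) -> List[Row]:
    """Replace the table with whatever the source currently holds."""
    return list(snapshot)


table = [{"id": 1, "version": "1.0.0"}, {"id": 2, "version": "1.1.0"}]
changes = [{"id": 2, "version": "1.1.1"}, {"id": 3, "version": "2.0.0"}]
print(len(incremental_sync(table, changes)))  # 3
```

Because rows are matched on their ID, syncing the same change twice leaves the table unchanged rather than adding a duplicate.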

Frequently asked questions

How long does the initial sync take for PyPI?

It depends on how much history is in your PyPI account. Most initial syncs finish within minutes, while large accounts can take a few hours. After that, syncs only fetch new and changed records, so they're much faster.

Can I sync only some tables or columns?

Yes. You pick which tables to sync when you set up the connection and can change the selection later. Tables you don't select are never copied to your warehouse.

What happens when PyPI's schema changes?

New fields are never added automatically. You choose which fields to sync, so data you haven't selected (sensitive personal data, for example) never lands in your warehouse. When a new field appears, it becomes available for you to add. What happens to removed or renamed fields depends on a table's sync mode: full-refresh tables always match what's currently in PyPI, so dropped fields disappear, while incremental tables keep their existing columns and history, so an old field stays and newly added fields fill in over time.
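
The rules above can be sketched in a few lines of Python. The function names, mode labels, and example field names are all hypothetical, and the sketch only models which columns a table ends up with, not how Kaivo applies the change.

```python
from typing import List, Set


def columns_after_sync(mode: str, existing: List[str], source: List[str], selected: Set[str]) -> List[str]:
    """Return the table's columns after a sync, given the fields PyPI exposes and the fields you selected."""
    # Only fields that exist in the source AND were selected are ever synced.
    synced = [field for field in source if field in selected]
    if mode == "full_refresh":
        # The table mirrors the source, so dropped fields disappear.
        return synced
    if mode == "incremental":
        # Existing columns and their history stay; newly selected fields are appended.
        return existing + [field for field in synced if field not in existing]
    raise ValueError(f"unknown sync mode: {mode}")


def available_to_add(source: List[str], selected: Set[str]) -> List[str]:
    """Fields that exist in the source but are not synced until you select them."""
    return [field for field in source if field not in selected]


existing = ["name", "summary", "home_page"]
source = ["name", "summary", "license"]  # home_page was removed, license is new
selected = {"name", "summary", "home_page"}
print(columns_after_sync("full_refresh", existing, source, selected))  # ['name', 'summary']
print(columns_after_sync("incremental", existing, source, selected))  # ['name', 'summary', 'home_page']
print(available_to_add(source, selected))  # ['license']
```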

How do I handle GDPR or data deletion requests?

Your data lives in your own Kaivo-managed BigQuery warehouse, so the most direct option is to delete or anonymise specific records right in BigQuery. If you delete data in PyPI instead, full-refresh tables drop it on the next sync, while incremental tables keep it, so you would remove the row in BigQuery or ask us to run a full refresh. To remove everything, delete the PyPI connector in Kaivo and all of its synced data is deleted with it.

Common use cases for PyPI data

Use stats to track package downloads over time.
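
As a rough sketch of this use case, the Python below totals downloads per month. It assumes each stats row has a "date" string in YYYY-MM-DD form and a "downloads" count. Both column names are hypothetical, so check the actual schema of your synced table before adapting it.

```python
from collections import defaultdict
from typing import Dict, Iterable


def downloads_per_month(rows: Iterable[dict]) -> Dict[str, int]:
    """Sum downloads by calendar month, returned in chronological order."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        month = row["date"][:7]  # "2024-01-03" -> "2024-01"
        totals[month] += row["downloads"]
    return dict(sorted(totals.items()))


stats = [
    {"date": "2024-01-03", "downloads": 10},
    {"date": "2024-01-20", "downloads": 5},
    {"date": "2024-02-01", "downloads": 7},
]
print(downloads_per_month(stats))  # {'2024-01': 15, '2024-02': 7}
```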

Release history

Join project with release to document your release history.
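
The join could look something like the Python sketch below. The column names ("name", "summary", "project_name", "version", "upload_time") are assumptions made for illustration and may not match the synced schema.

```python
from typing import Dict, List


def release_history(projects: List[dict], releases: List[dict]) -> List[Dict[str, str]]:
    """Inner-join releases to their project and order them by upload time."""
    projects_by_name = {project["name"]: project for project in projects}
    history = []
    for release in releases:
        project = projects_by_name.get(release["project_name"])
        if project is None:
            continue  # skip releases whose project is not in the project table
        history.append({
            "project": project["name"],
            "summary": project["summary"],
            "version": release["version"],
            "upload_time": release["upload_time"],
        })
    # ISO 8601 timestamps in the same format sort chronologically as plain strings.
    history.sort(key=lambda row: row["upload_time"])
    return history


projects = [{"name": "demo-pkg", "summary": "A demo package"}]
releases = [
    {"project_name": "demo-pkg", "version": "1.1.0", "upload_time": "2024-03-01T00:00:00"},
    {"project_name": "demo-pkg", "version": "1.0.0", "upload_time": "2024-01-15T00:00:00"},
]
print([row["version"] for row in release_history(projects, releases)])  # ['1.0.0', '1.1.0']
```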

Adoption analysis

Bring PyPI data into BigQuery to study how adoption of your packages grows.

Use PyPI data in your AI and BI tools

Once PyPI data lands in your Kaivo-managed BigQuery warehouse, you can explore it with AI tools or any BI tool that connects to BigQuery. Here's how the most common destinations work with PyPI data.

Claude

Use Kaivo's MCP server to give Claude secure, workspace-scoped access to your data. Setup guide →

Power BI

Microsoft's BI tool with a native BigQuery connector. Supports direct query and scheduled refresh. Setup guide →

Data Studio

Free Google BI tool with native BigQuery support. One-click connection to your Kaivo warehouse; great for SMB teams on Google Workspace. Setup guide →

Tableau

The premium analytics standard, with native BigQuery integration. Setup guide →

Google Sheets

Use Connected Sheets to query BigQuery directly from a spreadsheet, with no SQL. Setup guide →

Excel

Connect via Power Query's BigQuery connector. Setup guide →

Metabase

Open-source BI tool with strong BigQuery support. Setup guide →

See our pricing page for PyPI connector pricing and plan details.

  • Adform: Sync Adform to BigQuery.
  • Amplitude: Sync Amplitude to BigQuery.
  • Auth0: Sync Auth0 to BigQuery.
  • Convex: Sync Convex to BigQuery.
  • GitHub: Sync GitHub to BigQuery.
  • GitLab: Sync GitLab to BigQuery.