• Product
  • Pricing
  • Docs
  • Using PostHog
  • Community
  • Company
  • Login
  • Docs

  • Overview
    • Quickstart with PostHog Cloud
    • Overview
      • AWS
      • Azure
      • DigitalOcean
      • Google Cloud Platform
      • Hobby
      • Other platforms
      • Instance settings
      • Environment variables
      • Securing PostHog
      • Monitoring with Grafana
      • Running behind a proxy
      • Configuring email
      • Helm chart configuration
      • Deploying ClickHouse using Altinity.Cloud
      • Configuring Slack
      • Overview
        • Overview
        • Upgrade notes
        • Overview
        • 0001-events-sample-by
        • 0002_events_sample_by
        • 0003_fill_person_distinct_id2
        • ClickHouse
          • Backup
          • Kafka Engine
          • Resize disk
          • Restore
          • Vertical scaling
          • Horizontal scaling (Sharding & replication)
        • Kafka
          • Resize disk
          • Log retention
        • PostgreSQL
          • Resize disk
          • Troubleshooting long-running migrations
        • Plugin server
        • MinIO
        • Redis
        • Zookeeper
      • Disaster recovery
    • Troubleshooting and FAQs
    • Managing hosting costs
    • EU-only hosting
    • Overview
    • Ingest live data
    • Ingest historical data
    • Identify users
    • User properties
    • Deploying a reverse proxy
    • Libraries
    • Badge
      • Snippet installation
      • Android
      • iOS
      • JavaScript
      • Flutter
      • React Native
      • Browser Extensions
      • Elixir
      • Go
      • Java
      • Node.js
      • PHP
      • Python
      • Ruby
      • Docusaurus v2
      • Gatsby
      • Google Tag Manager
      • Next.js
      • Nuxt.js
      • Retool
      • RudderStack
      • Segment
      • Sentry
      • Slack
      • Shopify
      • WordPress
      • Message formatting
      • Microsoft Teams
      • Slack
      • Discord
    • To another self-hosted instance
    • To PostHog from Amplitude
    • Between Cloud and self-hosted
    • Overview
    • Tutorial
    • Troubleshooting
    • TypeScript types
    • Developer reference
    • Overview
    • POST-only public endpoints
    • Actions
    • Annotations
    • Cohorts
    • Dashboards
    • Event definitions
    • Events
    • Experiments
    • Feature flags
    • Funnels
    • Groups
    • Groups types
    • Insights
    • Invites
    • Members
    • Persons
    • Plugin configs
    • Plugins
    • Projects
    • Property definitions
    • Session recordings
    • Trends
    • Users
    • Data model
    • Overview
    • Data model
    • Ingestion pipeline
    • ClickHouse
    • Querying data
    • Overview
    • GDPR guidance
    • HIPAA guidance
    • CCPA guidance
    • Data egress & compliance
    • Data deletion
    • Overview
    • Code of conduct
    • Recognizing contributions
  • Using PostHog

  • Table of contents
      • Dashboards
      • Funnels
      • Group Analytics
      • Insights
      • Lifecycle
      • Path analysis
      • Retention
      • Stickiness
      • Trends
      • Heatmaps
      • Session Recording
      • Correlation Analysis
      • Experimentation
      • Feature Flags
      • Actions
      • Annotations
      • Cohorts
      • Data Management
      • Events
      • Persons
      • Sessions
      • UTM segmentation
      • Team collaboration
      • Organizations & projects
      • Settings
      • SSO & SAML
      • Toolbar
      • Notifications & alerts
    • Overview
      • Amazon Kinesis Import
      • BitBucket Release Tracker
      • Braze Import
      • Event Replicator
      • GitHub Release Tracker
      • GitHub Star Sync
      • GitLab Release Tracker
      • Heartbeat
      • Ingestion Alert
      • Email Scoring
      • n8n Connector
      • Orbit Connector
      • Redshift Import
      • Segment Connector
      • Shopify Connector
      • Twitter Followers Tracker
      • Zendesk Connector
      • Airbyte Exporter
      • Amazon S3 Export
      • BigQuery Export
      • Customer.io Connector
      • Databricks Export
      • Engage Connector
      • GCP Pub/Sub Connector
      • Google Cloud Storage Export
      • Hubspot Connector
      • Intercom Connector
      • Migrator 3000
      • PagerDuty Connector
      • PostgreSQL Export
      • Redshift Export
      • RudderStack Export
      • Salesforce Connector
      • Sendgrid Connector
      • Sentry Connector
      • Snowflake Export
      • Twilio Connector
      • Variance Connector
      • Zapier Connector
      • Downsampler
      • Event Sequence Timer
      • First Time Event Tracker
      • Property Filter
      • Property Flattener
      • Schema Enforcer
      • Taxonomy Standardizer
      • Unduplicator
      • Automatic Cohort Creator
      • Currency Normalizer
      • GeoIP Enricher
      • Timestamp Parser
      • URL Normalizer
      • User Agent Populator
  • Tutorials
    • All tutorials
    • Actions
    • Apps
    • Cohorts
    • Dashboards
    • Feature flags
    • Funnels
    • Heatmaps
    • Path analysis
    • Retention
    • Session recording
    • Trends
  • Support
  • Docs

  • Overview
    • Quickstart with PostHog Cloud
    • Overview
      • AWS
      • Azure
      • DigitalOcean
      • Google Cloud Platform
      • Hobby
      • Other platforms
      • Instance settings
      • Environment variables
      • Securing PostHog
      • Monitoring with Grafana
      • Running behind a proxy
      • Configuring email
      • Helm chart configuration
      • Deploying ClickHouse using Altinity.Cloud
      • Configuring Slack
      • Overview
        • Overview
        • Upgrade notes
        • Overview
        • 0001-events-sample-by
        • 0002_events_sample_by
        • 0003_fill_person_distinct_id2
        • ClickHouse
          • Backup
          • Kafka Engine
          • Resize disk
          • Restore
          • Vertical scaling
          • Horizontal scaling (Sharding & replication)
        • Kafka
          • Resize disk
          • Log retention
        • PostgreSQL
          • Resize disk
          • Troubleshooting long-running migrations
        • Plugin server
        • MinIO
        • Redis
        • Zookeeper
      • Disaster recovery
    • Troubleshooting and FAQs
    • Managing hosting costs
    • EU-only hosting
    • Overview
    • Ingest live data
    • Ingest historical data
    • Identify users
    • User properties
    • Deploying a reverse proxy
    • Libraries
    • Badge
      • Snippet installation
      • Android
      • iOS
      • JavaScript
      • Flutter
      • React Native
      • Browser Extensions
      • Elixir
      • Go
      • Java
      • Node.js
      • PHP
      • Python
      • Ruby
      • Docusaurus v2
      • Gatsby
      • Google Tag Manager
      • Next.js
      • Nuxt.js
      • Retool
      • RudderStack
      • Segment
      • Sentry
      • Slack
      • Shopify
      • WordPress
      • Message formatting
      • Microsoft Teams
      • Slack
      • Discord
    • To another self-hosted instance
    • To PostHog from Amplitude
    • Between Cloud and self-hosted
    • Overview
    • Tutorial
    • Troubleshooting
    • TypeScript types
    • Developer reference
    • Overview
    • POST-only public endpoints
    • Actions
    • Annotations
    • Cohorts
    • Dashboards
    • Event definitions
    • Events
    • Experiments
    • Feature flags
    • Funnels
    • Groups
    • Groups types
    • Insights
    • Invites
    • Members
    • Persons
    • Plugin configs
    • Plugins
    • Projects
    • Property definitions
    • Session recordings
    • Trends
    • Users
    • Data model
    • Overview
    • Data model
    • Ingestion pipeline
    • ClickHouse
    • Querying data
    • Overview
    • GDPR guidance
    • HIPAA guidance
    • CCPA guidance
    • Data egress & compliance
    • Data deletion
    • Overview
    • Code of conduct
    • Recognizing contributions
  • Using PostHog

  • Table of contents
      • Dashboards
      • Funnels
      • Group Analytics
      • Insights
      • Lifecycle
      • Path analysis
      • Retention
      • Stickiness
      • Trends
      • Heatmaps
      • Session Recording
      • Correlation Analysis
      • Experimentation
      • Feature Flags
      • Actions
      • Annotations
      • Cohorts
      • Data Management
      • Events
      • Persons
      • Sessions
      • UTM segmentation
      • Team collaboration
      • Organizations & projects
      • Settings
      • SSO & SAML
      • Toolbar
      • Notifications & alerts
    • Overview
      • Amazon Kinesis Import
      • BitBucket Release Tracker
      • Braze Import
      • Event Replicator
      • GitHub Release Tracker
      • GitHub Star Sync
      • GitLab Release Tracker
      • Heartbeat
      • Ingestion Alert
      • Email Scoring
      • n8n Connector
      • Orbit Connector
      • Redshift Import
      • Segment Connector
      • Shopify Connector
      • Twitter Followers Tracker
      • Zendesk Connector
      • Airbyte Exporter
      • Amazon S3 Export
      • BigQuery Export
      • Customer.io Connector
      • Databricks Export
      • Engage Connector
      • GCP Pub/Sub Connector
      • Google Cloud Storage Export
      • Hubspot Connector
      • Intercom Connector
      • Migrator 3000
      • PagerDuty Connector
      • PostgreSQL Export
      • Redshift Export
      • RudderStack Export
      • Salesforce Connector
      • Sendgrid Connector
      • Sentry Connector
      • Snowflake Export
      • Twilio Connector
      • Variance Connector
      • Zapier Connector
      • Downsampler
      • Event Sequence Timer
      • First Time Event Tracker
      • Property Filter
      • Property Flattener
      • Schema Enforcer
      • Taxonomy Standardizer
      • Unduplicator
      • Automatic Cohort Creator
      • Currency Normalizer
      • GeoIP Enricher
      • Timestamp Parser
      • URL Normalizer
      • User Agent Populator
  • Tutorials
    • All tutorials
    • Actions
    • Apps
    • Cohorts
    • Dashboards
    • Feature flags
    • Funnels
    • Heatmaps
    • Path analysis
    • Retention
    • Session recording
    • Trends
  • Support

Deleting bulk data

Important: Bulk data deletion is done at your own risk. We do not yet provide a standard way to delete data in bulk, so you must be extremely careful when performing such an operation. We do not take responsibility for any loss of data as a result of this process.

To delete bulk data from your database (e.g. all events), the current suggested way is to directly interact with the PostgreSQL or ClickHouse instance and use SQL queries to delete the desired data.

  • PostgreSQL Docs
  • ClickHouse Docs

Alternatively, one of our contributors has suggested the following Python script for deleting data:

#!/usr/bin/python
import logging
import os
from datetime import datetime, timedelta
import django
django.setup()
from posthog.models import Event, ElementGroup
from django.utils import timezone
max_age_days = int(os.getenv("POSTHOG_CLEANUP_OLDER_THAN_DAYS", 30))
step_size = int(os.getenv("POSTHOG_CLEANUP_BATCH_SIZE", 1000))
dry_run = True if os.getenv("POSTHOG_CLEANUP_DRY_RUN", "False").lower() in ["true", "yes", "1"] else False
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %H:%M:%S ')
def get_events_older_than(older_than):
return Event.objects.filter(timestamp__lt=timezone.make_aware(datetime.now() - timedelta(older_than))).values_list(
'pk', flat=True)
def get_non_referenced_event_groups():
event_group_hashes = Event.objects.all().values_list('elements_hash', flat=True)
return ElementGroup.objects.exclude(hash__in=list(event_group_hashes))
def delete_items(item_type, items):
if dry_run:
logging.info("Skipping delete of items in dry run mode...")
return
item_type.objects.filter(id__in=list(items)).delete()
def delete_items_batched(item_type, items, logging_indent=6 * " "):
number_of_items = len(items)
logging.info("%sDeleting %d items of type %s using batches of %d size:", logging_indent, number_of_items,
item_type.__name__, step_size)
last_id = 0
while last_id + step_size <= number_of_items:
delete_items(item_type, items[last_id:last_id + step_size])
logging.info("%s %d%%", logging_indent, int(last_id / number_of_items * 100))
last_id += step_size
delete_items(item_type, items[last_id:])
logging.info("%s 100%%", logging_indent)
if __name__ == "__main__":
logging.info("Running cleanup of PostHog...")
start_time = datetime.now()
logging.info(" - Deleting all events older than %d days:", max_age_days)
delete_items_batched(Event, get_events_older_than(max_age_days))
logging.info(" - Deleting all elements and element groups not referenced by any event anymore:")
delete_items_batched(ElementGroup, get_non_referenced_event_groups())
logging.info("Cleanup finished, total duration: %s", datetime.now() - start_time)
Want more tips? Get them delivered twice a month.

Questions?

Was this page useful?

Contributor

  • Yakko Majuri
    Yakko Majuri

Share

1,571 views

Filed under...

  • configuration
  • Toggle content width
  • Toggle dark mode
  • About
  • Blog
  • Newsletter
  • Careers
  • Support
  • Contact sales

Product OS suite

Product overview

Analytics
  • Funnels
  • Trends
  • Paths

Pricing

Features
  • Session recording
  • Feature flags
  • Experimentation
  • Heatmaps

Customers

Platform
  • Correlation analysis
  • Collaboration
  • Apps

Community

Discussion
  • Questions?
  • Slack
  • Issues
  • Contact sales
Get involved
  • Roadmap
  • Contributors
  • Merch
  • PostHog FM
  • Marketplace

Docs

Getting started
  • PostHog Cloud
  • Self-hosted
  • Compare options
  • Tutorials
  • PostHog on GitHub
Install & integrate
  • Installation
  • Docs
  • API
  • Apps
User guides
  • Cohorts
  • Funnels
  • Sessions
  • Data
  • Events

Company

About
  • Our story
  • Team
  • Handbook
  • Investors
  • Careers
Resources
  • FAQ
  • Ask a question
  • Blog
  • Press
  • Merch
  • YouTube
© 2022 PostHog, Inc.
  • Code of conduct
  • Privacy
  • Terms