PSGalleryExplorer - Data Component

Overview

One key feature of PSGalleryExplorer is the inclusion of associated repository information for each module.

For several years, PSGalleryExplorer used a serverless PowerShell model for data collection. However, as the number of modules continued to grow, that model could no longer process the data set in a time-efficient manner.

Today, a hybrid solution built entirely in PowerShell is deployed to continually collect and update repository information for PSGalleryExplorer's use.

Deployment Stack

Design Diagram

Figure: PSGalleryExplorer hybrid SSM repository data scrape

Outline

  1. A hybrid worker, registered with AWS Systems Manager, is set up with a weekly scheduled task.
    • This task:
      • Downloads all current modules from the PSGallery
      • Identifies modules that have public repositories
      • Queries GitHub, GitLab, and Bitbucket to retrieve project information
      • Combines the data sets into one final data set
  2. An AWS Systems Manager Maintenance Window task is set up to retrieve the data set and publish it to Amazon CloudFront.
  3. Users of PSGalleryExplorer worldwide can quickly download the refreshed data set when running searches.
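
The "identify" and "combine" steps of the weekly task can be illustrated with a minimal sketch. This is not the project's code: the real collection is written in PowerShell, and Python is used here only for illustration. The function names, the `project_url` and `repository` fields, and the assumption that a repository is recognized by the host name of a module's project URL are all hypothetical.

```python
from urllib.parse import urlparse

# The three providers named in the outline. Matching on these exact host
# names is an assumption made for this sketch.
KNOWN_HOSTS = {
    "github.com": "GitHub",
    "gitlab.com": "GitLab",
    "bitbucket.org": "Bitbucket",
}


def classify_repository(project_url):
    """Return (provider, 'owner/repo') for a recognized repository URL, else None."""
    if not project_url:
        return None
    parsed = urlparse(project_url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    provider = KNOWN_HOSTS.get(host)
    if provider is None:
        return None
    # Expect a path of the form /owner/repo[/...]
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    repo = parts[1][:-4] if parts[1].endswith(".git") else parts[1]
    return provider, f"{parts[0]}/{repo}"


def combine(modules, repo_info):
    """Merge per-repository data into module records.

    `modules` is a list of dicts with a hypothetical 'project_url' field.
    `repo_info` maps (provider, 'owner/repo') to whatever was retrieved
    from that provider. Modules without a recognized repository are kept,
    with 'repository' set to None.
    """
    combined = []
    for module in modules:
        record = dict(module)  # copy so the input list is not mutated
        key = classify_repository(module.get("project_url"))
        record["repository"] = repo_info.get(key) if key else None
        combined.append(record)
    return combined
```

Calling `combine(modules, repo_info)` with the gallery module list and the results of the provider queries would yield a single list in which each module record carries its repository data, or `None` where no public repository was found.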

Additional metrics, including data cache age and cache downloads, are available on the PSGalleryExplorer Metrics page.