PPTX creation, editing, and analysis
Overview
A user may ask you to create, edit, or analyze the contents of a .pptx file. A .pptx file is essentially a ZIP archive containing XML files and other resources that you can read or edit. You have different tools and workflows available for different tasks.
Reading and analyzing content
Text extraction
If you just need to read the text contents of a presentation, you should convert the document to markdown:
# Convert document to markdown
python -m markitdown path-to-file.pptx
Raw XML access
You need raw XML access for: comments, speaker notes, slide layouts, animations, design elements, and complex formatting. For any of these features, you'll need to unpack a presentation and read its raw XML contents.
Unpacking a file
python ooxml/scripts/unpack.py <office_file> <output_dir>
Note: The unpack.py script is located at skills/pptx/ooxml/scripts/unpack.py relative to the project root. If the script doesn't exist at this path, use find . -name "unpack.py" to locate it.
Key file structures
ppt/presentation.xml - Main presentation metadata and slide references
ppt/slides/slide{N}.xml - Individual slide contents (slide1.xml, slide2.xml, etc.)
ppt/notesSlides/notesSlide{N}.xml - Speaker notes for each slide
ppt/comments/modernComment_*.xml - Comments for specific slides
ppt/slideLayouts/ - Layout templates for slides
ppt/slideMasters/ - Master slide templates
ppt/theme/ - Theme and styling information
ppt/media/ - Images and other media files
Typography and color extraction
When given an example design to emulate: Always analyze the presentation's typography and colors first using the methods below:
- Read theme file: Check
ppt/theme/theme1.xml for colors (<a:clrScheme>) and fonts (<a:fontScheme>)
- Sample slide content: Examine
ppt/slides/slide1.xml for actual font usage (<a:rPr>) and colors
- Search for patterns: Use grep to find color (
<a:solidFill>, <a:srgbClr>) and font references across all XML files
Creating a new PowerPoint presentation without a template
When creating a new PowerPoint presentation from scratch, use the html2pptx workflow to convert HTML slides to PowerPoint with accurate positioning.
Design Principles
CRITICAL: Before creating any presentation, analyze the content and choose appropriate design elements:
- Consider the subject matter: What is this presentation about? What tone, industry, or mood does it suggest?
- Check for branding: If the user mentions a company/organization, consider their brand colors and identity
- Match palette to content: Select colors that reflect the subject
- State your approach: Explain your design choices before writing code
Requirements:
- ✅ State your content-informed design approach BEFORE writing code
- ✅ Use web-safe fonts only: Arial, Helvetica, Times New Roman, Georgia, Courier New, Verdana, Tahoma, Trebuchet MS, Impact
- ✅ Create clear visual hierarchy through size, weight, and color
- ✅ Ensure readability: strong contrast, appropriately sized text, clean alignment
- ✅ Be consistent: repeat patterns, spacing, and visual language across slides
Color Palette Selection
Choosing colors creatively:
- Think beyond defaults: What colors genuinely match this specific topic? Avoid autopilot choices.
- Consider multiple angles: Topic, industry, mood, energy level, target audience, brand identity (if mentioned)
- Be adventurous: Try unexpected combinations - a healthcare presentation doesn't have to be green, finance doesn't have to be navy
- Build your palette: Pick 3-5 colors that work together (dominant colors + supporting tones + accent)
- Ensure contrast: Text must be clearly readable on backgrounds
Example color palettes (use these to spark creativity - choose one, adapt it, or create your own):
- Classic Blue: Deep navy (#1C2833), slate gray (#2E4053), silver (#AAB7B8), off-white (#F4F6F6)
- Teal & Coral: Teal (#5EA8A7), deep teal (#277884), coral (#FE4447), white (#FFFFFF)
- Bold Red: Red (#C0392B), bright red (#E74C3C), orange (#F39C12), yellow (#F1C40F), green (#2ECC71)
- Warm Blush: Mauve (#A49393), blush (#EED6D3), rose (#E8B4B8), cream (#FAF7F2)
- Burgundy Luxury: Burgundy (#5D1D2E), crimson (#951233), rust (#C15937), gold (#997929)
- Deep Purple & Emerald: Purple (#B165FB), dark blue (#181B24), emerald (#40695B), white (#FFFFFF)
- Cream & Forest Green: Cream (#FFE1C7), forest green (#40695B), white (#FCFCFC)
- Pink & Purple: Pink (#F8275B), coral (#FF574A), rose (#FF737D), purple (#3D2F68)
- Lime & Plum: Lime (#C5DE82), plum (#7C3A5F), coral (#FD8C6E), blue-gray (#98ACB5)
- Black & Gold: Gold (#BF9A4A), black (#000000), cream (#F4F6F6)
- Sage & Terracotta: Sage (#87A96B), terracotta (#E07A5F), cream (#F4F1DE), charcoal (#2C2C2C)
- Charcoal & Red: Charcoal (#292929), red (#E33737), light gray (#CCCBCB)
- Vibrant Orange: Orange (#F96D00), light gray (#F2F2F2), charcoal (#222831)
- Forest Green: Black (#191A19), green (#4E9F3D), dark green (#1E5128), white (#FFFFFF)
- Retro Rainbow: Purple (#722880), pink (#D72D51), orange (#EB5C18), amber (#F08800), gold (#DEB600)
- Vintage Earthy: Mustard (#E3B448), sage (#CBD18F), forest green (#3A6B35), cream (#F4F1DE)
- Coastal Rose: Old rose (#AD7670), beaver (#B49886), eggshell (#F3ECDC), ash gray (#BFD5BE)
- Orange & Turquoise: Light orange (#FC993E), grayish turquoise (#667C6F), white (#FCFCFC)
Visual Details Options
Geometric Patterns:
- Diagonal section dividers instead of horizontal
- Asymmetric column widths (30/70, 40/60, 25/75)
- Rotated text headers at 90° or 270°
- Circular/hexagonal frames for images
- Triangular accent shapes in corners
- Overlapping shapes for depth
Border & Frame Treatments:
- Thick single-color borders (10-20pt) on one side only
- Double-line borders with contrasting colors
- Corner brackets instead of full frames
- L-shaped borders (top+left or bottom+right)
- Underline accents beneath headers (3-5pt thick)
Typography Treatments:
- Extreme size contrast (72pt headlines vs 11pt body)
- All-caps headers with wide letter spacing
- Numbered sections in oversized display type
- Monospace (Courier New) for data/stats/technical content
- Condensed fonts (Arial Narrow) for dense information
- Outlined text for emphasis
Chart & Data Styling:
- Monochrome charts with single accent color for key data
- Horizontal bar charts instead of vertical
- Dot plots instead of bar charts
- Minimal gridlines or none at all
- Data labels directly on elements (no legends)
- Oversized numbers for key metrics
Layout Innovations:
- Full-bleed images with text overlays
- Sidebar column (20-30% width) for navigation/context
- Modular grid systems (3×3, 4×4 blocks)
- Z-pattern or F-pattern content flow
- Floating text boxes over colored shapes
- Magazine-style multi-column layouts
Background Treatments:
- Solid color blocks occupying 40-60% of slide
- Gradient fills (vertical or diagonal only)
- Split backgrounds (two colors, diagonal or vertical)
- Edge-to-edge color bands
- Negative space as a design element
Layout Tips
When creating slides with charts or tables:
- Two-column layout (PREFERRED): Use a header spanning the full width, then two columns below - text/bullets in one column and the featured content in the other. This provides better balance and makes charts/tables more readable. Use flexbox with unequal column widths (e.g., 40%/60% split) to optimize space for each content type.
- Full-slide layout: Let the featured content (chart/table) take up the entire slide for maximum impact and readability
- NEVER vertically stack: Do not place charts/tables below text in a single column - this causes poor readability and layout issues
Workflow
- MANDATORY - READ ENTIRE FILE: Read
html2pptx.md completely from start to finish. NEVER set any range limits when reading this file. Read the full file content for detailed syntax, critical formatting rules, and best practices before proceeding with presentation creation.
- Create an HTML file for each slide with proper dimensions (e.g., 720pt × 405pt for 16:9)
- Use
<p>, <h1>-<h6>, <ul>, <ol> for all text content
- Use
class="placeholder" for areas where charts/tables will be added (render with gray background for visibility)
- CRITICAL: Rasterize gradients and icons as PNG images FIRST using Sharp, then reference in HTML
- LAYOUT: For slides with charts/tables/images, use either full-slide layout or two-column layout for better readability
- Create and run a JavaScript file using the
html2pptx.js library to convert HTML slides to PowerPoint and save the presentation
Editing an existing PowerPoint presentation
When edit slides in an existing PowerPoint presentation, you need to work with the raw Office Open XML (OOXML) format. This involves unpacking the .pptx file, editing the XML content, and repacking it.
Workflow
- MANDATORY - READ ENTIRE FILE: Read
ooxml.md (~500 lines) completely from start to finish. NEVER set any range limits when reading this file. Read the full file content for detailed guidance on OOXML structure and editing workflows before any presentation editing.
- Unpack the presentation:
python ooxml/scripts/unpack.py <office_file> <output_dir>
- Edit the XML files (primarily
ppt/slides/slide{N}.xml and related files)
- CRITICAL: Validate immediately after each edit and fix any validation errors before proceeding:
python ooxml/scripts/validate.py <dir> --original <file>
- Pack the final presentation:
python ooxml/scripts/pack.py <input_directory> <office_file>
Creating a new PowerPoint presentation using a template
When you need to create a presentation that follows an existing template's design, you'll need to duplicate and re-arrange template slides before then replacing placeholder context.
Workflow
-
Extract template text AND create visual thumbnail grid:
- Extract text:
python -m markitdown template.pptx > template-content.md
- Read
template-content.md: Read the entire file to understand the contents of the template presentation. NEVER set any range limits when reading this file.
- Create thumbnail grids:
python scripts/thumbnail.py template.pptx
- See Creating Thumbnail Grids section for more details
-
Analyze template and save inventory to a file:
- Visual Analysis: Review thumbnail grid(s) to understand slide layouts, design patterns, and visual structure
- Create and save a template inventory file at
template-inventory.md containing:
# Template Inventory Analysis
**Total Slides: [count]**
**IMPORTANT: Slides are 0-indexed (first slide = 0, last slide = count-1)**
## [Category Name]
- Slide 0: [Layout code if available] - Description/purpose
- Slide 1: [Layout code] - Description/purpose
- Slide 2: [Layout code] - Description/purpose
[... EVERY slide must be listed individually with its index ...]
Creating Thumbnail Grids
To create visual thumbnail grids of PowerPoint slides for quick analysis and reference:
python scripts/thumbnail.py template.pptx [output_prefix]
Features:
- Creates:
thumbnails.jpg (or thumbnails-1.jpg, thumbnails-2.jpg, etc. for large decks)
- Default: 5 columns, max 30 slides per grid (5×6)
- Custom prefix:
python scripts/thumbnail.py template.pptx my-grid
- Note: The output prefix should include the path if you want output in a specific directory (e.g.,
workspace/my-grid)
- Adjust columns:
--cols 4 (range: 3-6, affects slides per grid)
- Grid limits: 3 cols = 12 slides/grid, 4 cols = 20, 5 cols = 30, 6 cols = 42
- Slides are zero-indexed (Slide 0, Slide 1, etc.)
Use cases:
- Template analysis: Quickly understand slide layouts and design patterns
- Content review: Visual overview of entire presentation
- Navigation reference: Find specific slides by their visual appearance
- Quality check: Verify all slides are properly formatted
Examples:
# Basic usage
python scripts/thumbnail.py presentation.pptx
# Combine options: custom name, columns
python scripts/thumbnail.py template.pptx analysis --cols 4
Converting Slides to Images
To visually analyze PowerPoint slides, convert them to images using a two-step process:
-
Convert PPTX to PDF:
soffice --headless --convert-to pdf template.pptx
-
Convert PDF pages to JPEG images:
pdftoppm -jpeg -r 150 template.pdf slide
This creates files like slide-1.jpg, slide-2.jpg, etc.
Options:
-r 150: Sets resolution to 150 DPI (adjust for quality/size balance)
-jpeg: Output JPEG format (use -png for PNG if preferred)
-f N: First page to convert (e.g., -f 2 starts from page 2)
-l N: Last page to convert (e.g., -l 5 stops at page 5)
slide: Prefix for output files
Example for specific range:
pdftoppm -jpeg -r 150 -f 2 -l 5 template.pdf slide # Converts only pages 2-5
Code Style Guidelines
IMPORTANT: When generating code for PPTX operations:
- Write concise code
- Avoid verbose variable names and redundant operations
- Avoid unnecessary print statements
Dependencies
Required dependencies (should already be installed):
- markitdown:
pip install "markitdown[pptx]" (for text extraction from presentations)
- pptxgenjs:
npm install -g pptxgenjs (for creating presentations via html2pptx)
- playwright:
npm install -g playwright (for HTML rendering in html2pptx)
- react-icons:
npm install -g react-icons react react-dom (for icons)
- sharp:
npm install -g sharp (for SVG rasterization and image processing)
- LibreOffice:
sudo apt-get install libreoffice (for PDF conversion)
- Poppler:
sudo apt-get install poppler-utils (for pdftoppm to convert PDF to images)
- defusedxml:
pip install defusedxml (for secure XML parsing)