Designing the Video Processing Backend: A Conversation With a Senior Engineer
Building SportsSync — Part 11
Why I Needed Help
Ten episodes of building SportsSync with AI tools, and I'd hit a wall I couldn't prompt my way through. The client-side demo works — you can see cycling telemetry overlaid on a YouTube video in real time. But the actual product needs to produce downloadable videos with the overlay baked in. That's server-side video processing, and I have zero backend experience.
I called Tirth Kajer, a full-stack engineer who runs a software development agency in India. We worked together at a previous company as frontend engineers, but he's since moved into full-stack work and now builds MVPs for early-stage startups. If anyone could sketch out a realistic architecture in 30 minutes, it was him.
The Processing Pipeline
The core workflow Tirth outlined is straightforward in concept:
- User submits: YouTube video URL, GPX file, sync point, and desired clip range
- Server receives: Creates a job record in the database, uploads GPX to S3, pushes a job ID to a message queue
- Worker picks up the job: Downloads the YouTube video, converts GPX to time-series data, detects the video's frame rate
- FFmpeg processes: Applies telemetry data as an overlay on each frame, re-encoding the entire video segment
- Output goes to S3: The rendered video with embedded overlays is stored, temporary files are cleaned up
- User downloads: The finished video is available for download or sharing
The key insight: this is an asynchronous operation. Video processing is compute-intensive — a two-minute clip might take several minutes to render. You can't make the user wait. Instead, you submit the job, show a "processing" status, and notify them when it's done.
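To make the pattern concrete, here's a minimal sketch of the submit-then-poll contract, with an in-memory dict standing in for the job table. All names here are illustrative, not SportsSync's actual API:

```python
import uuid

# In-memory stand-in for the jobs table described below.
JOBS: dict[str, dict] = {}

def submit_render_job(youtube_url: str, gpx_key: str, clip_range: tuple[float, float]) -> str:
    """Record the job and return immediately with an ID the client can poll."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {
        "status": "pending",   # pending -> processing -> complete / failed
        "youtube_url": youtube_url,
        "gpx_key": gpx_key,       # S3 key of the uploaded GPX file
        "clip_range": clip_range, # (start_s, end_s) of the desired clip
        "output_key": None,       # S3 key of the rendered video, set on completion
    }
    return job_id

def job_status(job_id: str) -> dict:
    """What the frontend polls while the render runs."""
    job = JOBS[job_id]
    return {"status": job["status"], "output_key": job["output_key"]}
```

The frontend calls `submit_render_job`, gets an ID back immediately, and polls `job_status` until the status flips to complete.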
The Architecture: Start Simple, Scale Later
Tirth's advice was pragmatic: don't build the production architecture on day one.
MVP approach (single EC2 instance):
- One server running everything: API, worker, database
- Docker Compose to manage services
- Jobs tracked in a database table with status flags (pending, processing, complete)
- Worker processes jobs sequentially — first in, first out
- Files stored on the EC2 disk temporarily, final output on S3
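The sequential worker can start as a simple polling loop against that status column. A sketch using SQLite as a stand-in for the real database (schema and function names are my own, not from the call):

```python
import sqlite3
import time

SCHEMA = """CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending / processing / complete / failed
    payload TEXT NOT NULL,
    created_at REAL NOT NULL
)"""

def claim_next_job(conn: sqlite3.Connection):
    """Claim the oldest pending job (first in, first out); None if the queue is empty."""
    row = conn.execute(
        "SELECT id FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The status guard in the WHERE clause keeps two workers
    # from claiming the same job.
    cur = conn.execute(
        "UPDATE jobs SET status = 'processing' WHERE id = ? AND status = 'pending'",
        (row[0],),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None

def worker_loop(conn: sqlite3.Connection) -> None:
    while True:
        job_id = claim_next_job(conn)
        if job_id is None:
            time.sleep(5)  # nothing pending; poll the table again shortly
            continue
        # ... run the download/overlay/encode pipeline for job_id, then:
        conn.execute("UPDATE jobs SET status = 'complete' WHERE id = ?", (job_id,))
        conn.commit()
```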
Production approach (when you need to scale):
- Application Load Balancer (ALB) distributing requests
- Auto Scaling Group (ASG) managing multiple EC2 instances
- SQS (Simple Queue Service) for job queuing instead of database polling
- S3 for all file storage (input and output)
- RDS for the database (instead of running it on the EC2)
- Separate worker instances from API instances
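When the queue arrives, the worker swaps its database query for SQS long polling. A hedged sketch using boto3 (the queue URL is a placeholder, and the AWS calls aren't exercised here):

```python
import json

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/render-jobs"  # placeholder

def job_message(job_id: str) -> str:
    """The queue carries only the job ID; all job details stay in the database."""
    return json.dumps({"job_id": job_id})

def enqueue(job_id: str) -> None:
    import boto3  # imported lazily so the pure helper above works without AWS
    sqs = boto3.client("sqs")
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=job_message(job_id))

def worker_loop() -> None:
    import boto3
    sqs = boto3.client("sqs")
    while True:
        # Long polling: blocks up to 20s instead of hammering the queue.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            job_id = json.loads(msg["Body"])["job_id"]
            # ... run the render pipeline for job_id, then acknowledge:
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

Deleting the message only after the render finishes means a crashed worker's job reappears on the queue once its visibility timeout expires.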
Dream architecture (post-product-market-fit):
- Kubernetes cluster with containerized workers
- Granular scaling of workers independent of API servers
- GPU instances (G4) for hardware-accelerated FFmpeg encoding
The progression makes sense: start with everything on one box, split services as you grow, containerize when you need fine-grained scaling.
The FFmpeg Challenge
The most technically complex part is the overlay rendering. FFmpeg can re-encode a video with data overlaid on each frame, but the process is specific:
- Detect the video's FPS (typically 30 or 60 for action cameras)
- Sample the GPX time-series data at the same rate — if the video is 30fps, you need 30 telemetry values per second
- Generate overlay frames — transparent images or video with the gauges rendered at each timestamp
- Composite the overlay onto the source video using FFmpeg's filter chain
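Step 2, resampling the GPX series to one value per frame, is plain linear interpolation. A sketch, assuming the GPX has already been parsed into (seconds, value) pairs:

```python
from bisect import bisect_right

def resample(points: list[tuple[float, float]], fps: int, duration: float) -> list[float]:
    """Linearly interpolate (timestamp_s, value) samples to one value per frame."""
    times = [t for t, _ in points]
    out = []
    for frame in range(int(duration * fps)):
        t = frame / fps
        i = bisect_right(times, t)
        if i == 0:
            out.append(points[0][1])      # before first sample: hold first value
        elif i == len(points):
            out.append(points[-1][1])     # after last sample: hold last value
        else:
            (t0, v0), (t1, v1) = points[i - 1], points[i]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out
```

GPX points usually arrive at 1Hz, so a 30fps video needs 29 interpolated values between each pair of real samples.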
This is where the CSS gauges from the client-side demo don't directly translate. In the browser, the overlay is HTML/CSS rendered by the browser engine. For server-side rendering, the overlay needs to be generated as image frames or a transparent video that FFmpeg can composite.
The solution Tirth suggested: render the overlay as a separate transparent video, then use FFmpeg to combine them. This keeps the overlay generation separate from the video processing, making it easier to change the gauge design without re-processing the entire video pipeline.
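Assuming the overlay is exported as a video with an alpha channel (ProRes 4444, for instance), the composite step is a single FFmpeg invocation with the `overlay` filter. A sketch that builds the command for `subprocess`:

```python
def composite_cmd(base: str, overlay: str, out: str) -> list[str]:
    """FFmpeg command that alpha-blends a transparent overlay video onto the source.
    Assumes the overlay matches the source's fps and duration."""
    return [
        "ffmpeg", "-y",
        "-i", base,     # input 0: the downloaded YouTube clip
        "-i", overlay,  # input 1: transparent gauge video
        "-filter_complex", "[0:v][1:v]overlay=0:0",  # composite at top-left
        "-c:a", "copy",  # re-encode video only; pass audio through untouched
        out,
    ]
```

The `overlay=0:0` position and codec choices are illustrative; the real pipeline would pick coordinates and an output codec to match the gauge layout and target platform.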
Cost Reality Check
We looked at AWS EC2 pricing during the call. The numbers were sobering:
- C5 instances (compute-optimized): the price varies by size, and even a mid-range c5.xlarge adds up when every render means minutes of full-CPU encoding
- G4 instances (GPU-enabled, for FFmpeg acceleration): significantly more expensive
- Storage on S3 is cheap by comparison
- The real cost driver is compute time during video re-encoding
For an MVP with a handful of users, a single C5 or even a T3 instance handles the load. The cost becomes a concern at scale — if hundreds of users are rendering videos simultaneously, the compute bill grows fast. AWS Savings Plans (30-40% discount for 1-3 year commitments) help, but only after you've validated the product.
This pricing reality shapes the product model: rendered videos should expire after 30 days (no permanent storage), and free tier limits should cap the number of renders per month.
Authentication Strategy
A practical detail that's easy to overlook: the backend API needs authentication. Since SportsSync already uses Supabase for the waitlist, the same authentication system can secure the API.
The flow: the user logs in via Supabase on the frontend, receives a JWT, and includes that token in API requests to the backend. The backend verifies the token (it's signed with the Supabase JWT secret) before processing any request. Same auth system, no additional infrastructure.
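Supabase access tokens are, by default, HS256 JWTs signed with the project's JWT secret, so the backend can check them locally without a round trip to Supabase. A stdlib-only sketch of the signature check (a production service would use a JWT library and also validate `exp` and `aud`):

```python
import base64
import hashlib
import hmac
import json

def _b64url_decode(part: str) -> bytes:
    # JWTs strip base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def verify_supabase_jwt(token: str, jwt_secret: str) -> dict:
    """Verify the HS256 signature on a JWT and return its claims, or raise."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(jwt_secret.encode(), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))
```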
Development Plan
Based on the conversation, my development path is:
- Local development first. Write the Python scripts for video download (yt-dlp), GPX parsing, and FFmpeg overlay generation. Test everything on my laptop.
- Dockerize. Package the API server, worker, and database into a Docker Compose setup. Verify it works as a unit.
- Deploy to a single EC2. Push the Docker setup to AWS. Configure S3 for file storage. Test with real YouTube videos and GPX files.
- Add the queue. Replace database polling with SQS for job management. This enables scaling workers later.
- Split services. Separate the API server from the workers. Add the load balancer.
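For step 1's download script, yt-dlp can be driven through `subprocess`; a minimal command builder, using yt-dlp's documented `-f` (format) and `-o` (output path) flags:

```python
def download_cmd(youtube_url: str, out_path: str) -> list[str]:
    """yt-dlp invocation that fetches an mp4 stream to a known local path."""
    return ["yt-dlp", "-f", "mp4", "-o", out_path, youtube_url]
```

Keeping the command builders as pure functions makes the pipeline easy to unit-test locally before any of it touches EC2.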
Steps 1-3 are the MVP. Steps 4-5 come after product-market fit. The architecture supports this progression without requiring a rewrite.
What I Learned About Asking for Help
As a frontend developer for 10+ years, asking a backend engineer to draw me an architecture diagram felt like admitting incompetence. But Tirth's 30 minutes of whiteboarding saved me weeks of trial and error.
The AI tools I've been using are incredible for implementation — give them a clear specification and they produce working code. But they can't replace the judgment of an experienced engineer who knows which AWS services to use, when to scale, and where the cost traps are. Claude can write an FFmpeg command, but it can't tell me whether to use a C5 or G4 instance based on my expected workload.
The lesson: use AI for code, use humans for architecture decisions.