I'm Adam Lewis


👋 Hi, I'm a PhD Data Scientist currently at Quansight, where I perform a wide range of data science tasks to meet our clients' needs. I've created algorithms enabling targeted advertising based on 50+ TB of geospatial data, built training material demonstrating Dask, Dask-ML, and distributed hyperparameter tuning on larger-than-memory datasets, and created an optical character recognition pipeline that reduced manual entry of business data by 70%.

🎓 Before I joined Quansight, I earned my PhD at The University of Texas at Austin, where I used Python and Java to analyze terabytes of multimodal imaging data from an industrial 3D printing process, detecting in-situ defects via aggregation, registration, and visualization.

🌟 Outside of data science I enjoy disc golf 🥏, hiking 🥾, tennis 🎾, traveling ✈️, and I'm always up for a good board game. Check out my social media for more info!

📝 Recent Posts

My Take on DeepLearning.AI's Post-Training of LLMs Course

July 29, 2025

So I just finished DeepLearning.AI’s Post-Training of LLMs course, and honestly? It was pretty much exactly what I needed—a straightforward intro to how you actually fine-tune these big language models after they’ve done their initial training.

What the Course Covers

They break it down into three main ways to do this stuff:

Supervised Fine-Tuning (SFT) is basically when you want to make big changes to how your model behaves. Want to turn a regular foundation model into something that actually follows instructions? Or maybe teach it to use tools? That’s SFT territory. The big takeaway here is that quality beats quantity every time—1,000 really good, diverse examples will crush a million mediocre ones.

Direct Preference Optimization (DPO) is kind of like showing the model examples of “do this, not that.” You give it both good and bad responses so it learns what you actually want. This works great for smaller adjustments like making it safer, better at multilingual stuff, or just following instructions better. Pro tip: start with a model that can already answer questions, then use DPO to polish it up.
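The course doesn't hand you this code, but the "do this, not that" idea maps directly onto the DPO objective from the original paper. Here's a minimal pure-Python sketch (toy log-probabilities, not real model outputs) showing how the loss rewards a policy that prefers the chosen response over the rejected one relative to a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) preference pair.

    The loss shrinks as the policy's log-probability margin for the chosen
    response (measured relative to the reference model) grows.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)): near zero when the chosen response is strongly preferred
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: a policy that already slightly prefers the chosen response...
loss_good = dpo_loss(-10.0, -12.0, -11.0, -11.0)
# ...versus one that prefers the *rejected* response instead
loss_bad = dpo_loss(-12.0, -10.0, -11.0, -11.0)
print(loss_good < loss_bad)  # True: better-aligned pairs yield lower loss
```

In practice you'd compute these log-probabilities with your policy and reference models over a dataset of (prompt, chosen, rejected) triples, but the scalar math above is the whole trick.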

Online Reinforcement Learning is where things get really interesting (and complicated). The model generates responses in real-time, gets scored by humans or other models, and then updates itself based on that feedback. Think about how ChatGPT was trained with PPO, or what DeepSeek does with GRPO.

What I Actually Liked About It

The best part? They actually tell you when to use each method instead of just throwing theory at you. You get real advice on how to curate your data, what mistakes to avoid (like when DPO gets obsessed with surface-level patterns), and how much memory each approach is going to eat up.

Plus, they handle all the setup through their Jupyter notebook thing, which is honestly a relief when you just want to learn the concepts without spending half your time fighting with dependencies.

The Not-So-Great Parts

Okay, real talk—some of the hands-on stuff felt a bit like when your older sibling lets you “play” video games but gives you the controller that’s not actually plugged in. 😄 You’re going through the motions, but you’re not really in control. Still, it gives you a decent foundation if you want to actually implement this stuff yourself later.

Also, this definitely isn’t for people who are new to LLMs. You should already get the basics of how language models work before jumping into the fine-tuning world.

For me, this course was pretty much perfect for what I needed—an intro to post-training methods without having to slog through dense academic papers. It’s short, well-organized, and gives you enough understanding to figure out which rabbit holes are actually worth exploring.

Building Polished Bespoke Solutions Fast with Vibe Coding

July 05, 2025


Sometimes the best solutions come from the most personal problems. My wife, an amateur photographer, had a classic modern problem: duplicate photos scattered across multiple Google Takeout extractions from different email accounts. She needed help organizing thousands of photos and removing duplicates without losing precious memories or paying for unnecessary cloud storage.

This is exactly the kind of problem where AI-assisted coding shines. With a newborn at home and precious little free time, I spent about two hours building a proper Python package that not only solved the immediate problem but created something maintainable and extensible. I could probably have written a quick-and-dirty script in the same amount of time, but not with anywhere near the polish that vibe coding made possible.

The result has unit tests and a README, allowing me to come back to it in the future if needed. I'd created things like this in the past, but after enough time went by without using them, it was easier to start over than to pick up where I'd left off.

What It Does

The organize-photos tool is relatively simple. It tackles two main challenges:

  1. Smart Organization: Automatically sorts JPEG images into a clean YYYY/MM/DD folder structure using EXIF metadata
  2. Duplicate Detection: Uses SHA256 hashing to identify identical files and generates a CSV report for review before deletion

The tool handles edge cases gracefully - logging errors without crashing, managing filename conflicts, and giving you control over whether to copy or move files. You can see the code here if you’d like - https://github.com/Adam-D-Lewis/organize-photos.
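Both pieces are simple enough to sketch. This isn't the package's actual code (that's in the linked repo), just the core ideas: build a `YYYY/MM/DD` folder path from an EXIF `DateTimeOriginal` string, and group files by SHA256 content hash so identical photos surface as duplicates:

```python
import hashlib
import tempfile
from collections import defaultdict
from datetime import datetime
from pathlib import Path

def dated_folder(exif_datetime: str) -> Path:
    """Turn an EXIF 'DateTimeOriginal' string into a YYYY/MM/DD folder path."""
    dt = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
    return Path(f"{dt.year:04d}") / f"{dt.month:02d}" / f"{dt.day:02d}"

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; groups of 2+ are duplicates."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

print(dated_folder("2021:07:04 15:30:00").as_posix())  # 2021/07/04

# Demo on a throwaway directory containing one duplicated "photo"
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.jpg").write_bytes(b"same pixels")
    (root / "b.jpg").write_bytes(b"same pixels")
    (root / "c.jpg").write_bytes(b"different pixels")
    dupes = find_duplicates(root)
    print(len(dupes))  # 1: a.jpg and b.jpg hash identically
```

The real tool reads the EXIF date out of each JPEG (e.g. with Pillow) rather than taking it as a string, and writes the duplicate groups to a CSV for review instead of printing them, but the hashing-and-grouping core is the same.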

Why I Love Vibe Coding

Vibe coding - that flow state where AI helps you rapidly prototype and refine solutions - allowed me to create something much better than a throwaway script, even with the time constraints of new parenthood. The key benefit isn’t speed (I could have hacked something together just as fast), but that this maximizes the value of those precious few hours of coding time. This approach gave me:

  • Proper structure: A real Python package with pyproject.toml, proper imports, and CLI interface
  • Quality foundations: Tests, error handling, and clean separation of concerns
  • Future-proof: Dependencies properly captured, code that’s readable and extensible
  • Confidence: I can modify this later without fear of breaking everything

The Result

Two hours of focused development produced a tool that’s both immediately useful and built to last. My wife got her photos organized and duplicates identified safely. More importantly, I have a solid foundation that I could expand on in the future - maybe adding support for other image formats, more sophisticated duplicate detection, or integration with cloud storage.

The real win isn’t just solving today’s problem quickly - it’s building solutions that respect your future self.


ContainDS Dashboards Integration

February 15, 2021


The image above is of the Dash Web Trader dashboard. Dashboards like it can now be easily shared in QHub OnPrem via ContainDS Dashboards.

Recently, I was able to integrate two fantastic open source projects: Quansight's QHub and ContainDS Dashboards.

What problem does QHub OnPrem solve?

QHub OnPrem allows teams to efficiently collaborate by making it easy to share files, environments, scalable infrastructure, and now dashboards in a seamless and secure manner. Under the hood, QHub OnPrem is an opinionated open source deployment of JupyterHub with some very useful complementary tools included. QHub couples environment management, infrastructure monitoring, and shared infrastructure use with ease of deployment. While QHub OnPrem is aimed at on-premise infrastructure, QHub-Cloud is the cloud equivalent, and can be deployed on any of the major cloud providers.

QHub OnPrem Architecture

What problem does ContainDS Dashboards solve?

ContainDS Dashboards is an early-stage publishing solution that lets Data Scientists quickly, easily, and securely share results with decision makers. Like QHub, it's an open source project, and when we saw ContainDS Dashboards we knew it was a natural fit for QHub. ContainDS currently supports the Plotly Dash, Panel, Bokeh, Voila, Streamlit, and R Shiny dashboarding libraries. This allows data science teams to build their apps with whatever tool is most familiar or most appropriate for a particular dashboard while maintaining a simple, unified framework for distributing the various types of dashboards.


Integration of ContainDS Dashboards into QHub OnPrem

ContainDS looked great, but it only integrated with the most standard JupyterHub deployments, such as The Littlest JupyterHub or Zero to JupyterHub with Kubernetes, which scale out user sessions via local processes or Kubernetes, respectively. QHub OnPrem, on the other hand, uses the Slurm cluster management software to spin up additional user sessions.

Integrating ContainDS Dashboards into QHub OnPrem involved extending ContainDS Dashboards to support the SlurmSpawner class in the batchspawner library. Additionally, I took care to ensure that QHub remained easily configurable, so users who didn't wish to use CDS Dashboards could still use QHub without installing it. With a combination of bash scripting, Ansible playbooks, and searching through Jinja-templated HTML, I was able to ensure that CDS Dashboards was fully integrated into QHub OnPrem when users set a simple configuration flag.