I'm an engineer interested in equitable applications of machine learning in the real world,
with 10+ years of experience with data platforms and ML architecture.
I currently work on analysis and machine learning at
Benchling.
Skills
Languages: Python, Go, Clojure, C/C++
Tools: Git, Kubernetes, Docker, Terraform, Google Cloud Platform, AWS, IAM, DuckDB, Parquet, Iceberg, Airflow
Education
Tufts University · B.S. in Cognitive & Brain Sciences, Computer Science (2011) · GPA 3.7/4.0 · Magna Cum Laude
Experience
Benchling · Software Engineer, Analysis & ML Platform (Nov 2020 – present)
- Introduced a net-new machine learning orchestration architecture through an org-wide RFC, establishing novel backend concepts while integrating with Benchling’s existing systems (AWS SageMaker and Step Functions, parameter optimization, pilot model development, multi-tenancy, security)
- Integrated AlphaFold into Benchling using the newly designed ML architecture, enabling users to run DeepMind’s open-source protein structure prediction model on their amino acid sequences
- Designed and built a stateless service running DuckDB on Parquet to back an interactive analysis feature [blog post]
- Introduced a templatization method for customer-authored analysis pipelines, enabling re-use of complex analyses on variable datasets while maintaining traceability
- Designed and built a drop-in replacement for BigQuery during a post-acquisition migration from GCP to AWS; the system handles complex SQL queries on antibody sequence datasets of 100+ million rows.
Mixpanel · Machine Learning Tech Lead (Apr 2015 – Feb 2020)
- Served as Technical Lead of the Machine Learning team, responsible for developing, deploying, maintaining, and optimizing production ML models; data pipeline design, stability, and reliability; code quality and refactoring; managing on-call rotations; and onboarding and mentoring junior engineers. Worked closely with infrastructure and SRE teams to ensure the reliability of backend services, and with product managers and designers to address user needs, providing technical context when necessary. Major projects include:
- Predict: group users by their likelihood to perform a behavior in the future (logistic regression, limited-memory BFGS optimization, distributed stratified sampling method)
- Alerts: anomaly detection of time series data (Holt-Winters ETS, Robust Anomaly Detection (RAD), Robust Principal Component Analysis (RPCA), ARIMA, Seasonal ARIMA, ETS, moving average, lagged predictors)
- Causal Impact Analysis: measure the real effect of an event on key metrics (propensity matching)
- Smart Hub: a configurable alert notification system to deliver ML alerts and insights in the UI
- Led technical design for a new internal ML platform to unify several training pipelines operating on billions of data points, enabling flexible feature selection, model parameter specification, and model performance comparison
- Managed the full-stack Machine Learning team for a year, responsible for fostering career development, delivering feedback to team members, and growing the team as hiring manager
Moovweb · Software Engineer (Feb 2013 – Apr 2015)
- Created an interface between a Go service and the libxml C library, allowing runtime selection of a dispatch table for different libxml versions and improving modularity between system components
- Designed and implemented a backend daemon and worker pool queuing system to detect and capture changes in upstream HTML and CSS of client websites
- Wrote a package to convert Google’s XDM-style Gumbo parse tree to a custom DOM-like node class
Autonomy HP · Technology Specialist (Aug 2012 – Feb 2013)
- Set up and maintained distributed networks of virtual machines for customer demonstrations
- Wrote code to automatically ingest file data into internal databases and visualize it effectively
UC Davis MIND Institute · Technical Research Assistant (Jul 2011 – Aug 2012)
- Added several core features to the lab’s MATLAB codebase for preprocessing and statistical analysis of fMRI data, including the ability to run modern whole-brain and functional connectivity analyses on older datasets
- Developed a user-thresholded Fourier filter for MRI head motion data to detect and remove events surrounding motion spikes from analysis; integrated it as a preprocessing step for all brain scan data in the lab