
Shane Zarechian
software
data
developer
UNH, Durham, New Hampshire
About
Computer Science student at the University of New Hampshire who enjoys exploring data science and engineering, ML, and fintech. Currently learning more about reinforcement learning, OpenCV, and clustering algorithms.
Experience
Data Science Intern @ HouseNovel
Durham, NH - Remote
Summary:
- Developed OCR pipelines using PaddleOCR, Tesseract, and the OpenAI API to digitize historic records; processed 85k+ pages and transformed 5M+ entries into structured data.
Responsibilities:
- Performed OCR on 85,000+ scanned historic documents using PaddleOCR, Tesseract, and the OpenAI API
- Improved transcription accuracy by 40% by fine-tuning an open-source OCR engine
- Preprocessed 100+ GB of images for OCR extraction with OpenCV, scikit-image, and regex
- Grouped structural elements into 3 categories by implementing DBSCAN clustering on documents with scikit-learn
- Converted 5M+ unformatted entries into structured JSON using regular expressions and ingested the results into PostgreSQL
Backend Developer @ UNH Center for Business Analytics & NHADC
Durham, NH - Hybrid
Summary:
- Enhanced scraping pipelines and backend architecture for NHADC using Python, multithreading, OpenAI API, and FastAPI microservices.
Responsibilities:
- Enhanced scraping and data processing speed by 50%+ by leveraging multithreading in Python
- Automated data extraction for 350+ Pydantic fields using the OpenAI API, enabling rapid analysis of raw HTML
- Refactored Streamlit app into FastAPI microservices across 16 endpoints and deployed on Render
- Implemented end-to-end Pytest suite with 100% coverage of API endpoints, ensuring stability and detecting bugs
- Contributed to backend development strategy and collaborated with a team of 4 to ensure efficient implementation
Sales Representative @ Sunrun
Portsmouth, NH - On-site
Summary:
- Established trust with clients, set appointments for energy consultations, and connected with hundreds of clients to build value in company services.
Responsibilities:
- Established trust and served as first point of contact for clients to set appointments for energy consultations
- Connected with hundreds of clients and established value in company services
- Reached monthly sales goals through effective communication
Projects
A fintech web platform enabling advanced stock screening through 1,000+ query combinations
- Utilized Django, Pandas, PostgreSQL, Bootstrap, and multithreading to create and maintain www.quantscope.io, a finance website offering advanced stock screening through 1,000+ query combinations
- Optimized PostgreSQL query performance with table joins, reducing Largest Contentful Paint (LCP) by 70%
- Processed 10+ GB of raw data from SEC and Yahoo Finance APIs into financial metrics with Pandas and Polars
- Reduced Time to First Byte to less than 200ms by deploying site with Railway and Cloudflare
- Python
- Django
- PostgreSQL
- Bootstrap
- Pandas
- Polars
- HTML5
- Railway
- Supabase
- Cloudflare
A central hub for UNH sustainability data, awarded at UNH ISE Symposium
- Led a team of 4 to develop a unified data hub for UNH faculty and students, earning 2nd place in the Infrastructure category at the UNH Interdisciplinary Science & Engineering Symposium (ISE)
- Automated generation of JSON chart specifications for 100+ sustainability datasets by utilizing Ollama LLMs
- Managed the team's GitLab repository, completing 50+ issues and ensuring accurate progress tracking
- Integrated Docker, Apache Superset, and Azure to achieve reproducibility and usability
- Apache Superset
- Metabase
- AWS
- Azure
- GitLab
- Docker
- Kubernetes
- Ollama
ETL pipelines and ML forecasting models for multi-source financial data
- Fetched and processed 50+ GB of economic and financial time series data from APIs and custom Selenium scrapers
- Orchestrated ETL pipelines with 20+ automated jobs scheduled in Dagster to update databases daily
- Optimized Sharpe ratio by running 1M+ backtest trials in under 15 seconds with NumPy and Numba
- Improved DuckDB and TimescaleDB SQL query performance by 10× by denormalizing table schemas
- Performed sentiment analysis on 100k+ news articles using transformer classification models from Hugging Face
- Tuned hyperparameters for PyTorch and XGBoost models over 5,000+ Optuna trials using news and social media data
- Reduced debugging time by 40% by implementing logging and live alerts via Discord API, SMS, and Loguru
- Python
- Dagster
- DuckDB
- TimescaleDB
- Airbyte
- NumPy
- Numba
- PyTorch
- Optuna
- scikit-learn
A 2D multiplayer tank game with health, currency, respawn, and powerup systems
- Utilized Java and Spring to develop an interface for users to control tanks and play multiplayer in a 2D game
- Implemented efficient systems for currency, health, multiplayer, respawn, and three types of power-ups
- Leveraged GitLab to manage team communication, track issues, and deliver three development epics
- Java
- Spring
- GitLab
Education
University of New Hampshire
Computer Science
Skills
- Python
- Pandas
- SQLAlchemy
- DuckDB
- BeautifulSoup
- FastAPI
- Polars
- NumPy
- PostgreSQL
- Django
- Flask
- Dagster
- Selenium
- CSS
- Pydantic
- Git
- Pytest
- HTML5
- Apache Superset
- Docker
- Ollama
- OpenAI API
- Plotly
- Anaconda
- Metabase
- Bootstrap
- Numba
- Jupyter
- Railway
- OpenCV
- PaddlePaddle
- Airbyte
- Astro
- GitLab
- PyTorch
- scikit-learn
- Azure
- Render
- JSON
- Apache Parquet
- Linux
- Bash
- PyPy
- PyPI
- Optuna
- Loguru
- Grafana
- MongoDB
- Supabase
- PyCharm
- DataGrip
- IntelliJ
- Discord API
- Java
- Kubernetes
- AWS
- Spring
- Linear
- ClickHouse
- Cloudflare