Rohan Joseph Rohan-Joseph-2002

Hi, I'm Rohan! 👋

🔬 Predoctoral Researcher, Ellison Institute of Technology Oxford, Limited
📊 BSc Statistics, University of British Columbia (UBC)
🧠 Applied Econometrics • Statistical Modelling • Data Systems

💼 LinkedIn • 📚 Research • 📄 CV • ✉️ Email

I work at the intersection of economics, data science, and research engineering, designing end-to-end systems for empirical analysis.

I focus on problems where measurement is difficult, data is imperfect, and decisions depend on credible empirical structure.

My work starts with large, messy, or fragmented data, moves through data engineering and dataset construction, and culminates in papers, structured evidence, or decision-ready outputs.

Across projects, I prioritise measurement, scalability, and empirical designs that make complex systems interpretable.

🔍 Research & Technical Interests

Healthcare Productivity, Access, and Inequality
Labour Markets, AI Diffusion, and Technology Adoption
Bayesian Forecasting and Model Uncertainty
Privacy, Disclosure, and Data Infrastructure
NLP, Entity Resolution, and Document-Level Extraction

🔄 Workflow

Raw / Unstructured Data
        ↓
Data Collection & Scraping
        ↓
Cleaning & Normalisation
        ↓
Dataset Construction
        ↓
Modelling & Analysis
        ↓
Papers • Insights • Decision Outputs
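
As a rough illustration of how the stages above chain together, here is a minimal Python sketch. It is not code from any of the projects: the function names (`collect`, `clean`, `build_dataset`, `analyse`, `run_pipeline`), the `Record` type, and the sample inputs are all hypothetical, and a token-count summary stands in for real modelling.

```python
from dataclasses import dataclass


@dataclass
class Record:
    doc_id: str
    text: str


def collect(raw_docs):
    """Collection stage: wrap raw strings in records (stands in for scraping)."""
    return [Record(doc_id=f"doc-{i}", text=t) for i, t in enumerate(raw_docs)]


def clean(records):
    """Cleaning stage: collapse whitespace, lowercase, drop empty documents."""
    cleaned = []
    for r in records:
        text = " ".join(r.text.split()).lower()
        if text:
            cleaned.append(Record(r.doc_id, text))
    return cleaned


def build_dataset(records):
    """Dataset construction: one row per document with a simple feature."""
    return [{"doc_id": r.doc_id, "n_tokens": len(r.text.split())} for r in records]


def analyse(rows):
    """Analysis stage: a summary statistic standing in for a model."""
    n = len(rows)
    mean_tokens = sum(row["n_tokens"] for row in rows) / n if n else 0.0
    return {"n_docs": n, "mean_tokens": mean_tokens}


def run_pipeline(raw_docs):
    """Run every stage in order, from raw text to a decision-ready summary."""
    return analyse(build_dataset(clean(collect(raw_docs))))


# Hypothetical sample input: one messy document, one empty, one clean.
summary = run_pipeline(["  Hello   World ", "", "One two three"])
print(summary)
```

Keeping each stage a plain function with list inputs and outputs means any stage can be run and inspected on sample data in isolation.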

⚙️ Profile

Area	Focus
`Current role`	Predoctoral Researcher in Economics, Ellison Institute of Technology Oxford, Limited
`What I build`	End-to-end data pipelines, analytical datasets, and paper-linked empirical outputs
`Methods`	Panel data, fixed effects, Bayesian state-space models, forecasting, NLP, entity resolution
`Tools`	Python, R, SQL, Spark, Stata, Git, LaTeX
`Scale`	100k+ documents • 300M+ records

⭐ Featured Work

📄 Independent Research
Public archive of research papers with linked codebases.

🤖 AI Occupation Adoption Gap Analysis
Empirical analysis of the gap between theoretical AI exposure and observed workplace adoption.

📉 Bayesian Unemployment Forecasting Analysis
State-space modelling of unemployment with a focus on structural trends and predictive uncertainty.

🕸️ Darkweb Text Analysis Pipeline
Large-scale text processing system for forum and marketplace data.

🧩 SDK Release Notes Pipeline
A workflow for collecting, parsing, and normalising SDK release notes from GitHub and vendor-hosted changelogs.

🛡️ CVE Vulnerability Exposure Pipeline
Integrated dataset combining NVD, CVE, EPSS, and KEV for vulnerability risk analysis.
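
To give a sense of what combining these sources can look like, the sketch below joins three of them on the CVE identifier. It is an assumption-laden illustration, not the project's pipeline: the function `merge_vulnerability_sources`, the field names (`cve_id`, `cvss`, `epss`, `in_kev`), and the placeholder IDs and scores are all made up for the example.

```python
def merge_vulnerability_sources(nvd_rows, epss_scores, kev_ids):
    """Left-join NVD rows with EPSS scores and a KEV membership flag.

    nvd_rows:    list of dicts with "cve_id" and an optional "cvss" score
    epss_scores: dict mapping cve_id -> exploit probability
    kev_ids:     set of cve_ids listed as known exploited
    """
    merged = []
    for row in nvd_rows:
        cve_id = row["cve_id"]
        merged.append({
            "cve_id": cve_id,
            "cvss": row.get("cvss"),
            "epss": epss_scores.get(cve_id),  # None when no score is available
            "in_kev": cve_id in kev_ids,
        })
    return merged


# Placeholder records; the IDs and numbers are invented, not real CVE data.
nvd_rows = [
    {"cve_id": "CVE-0000-0001", "cvss": 9.8},
    {"cve_id": "CVE-0000-0002", "cvss": 5.3},
]
epss_scores = {"CVE-0000-0001": 0.91}
kev_ids = {"CVE-0000-0001"}

merged = merge_vulnerability_sources(nvd_rows, epss_scores, kev_ids)
for row in merged:
    print(row)
```

A left join keeps every NVD row, so missing EPSS scores show up as explicit gaps rather than silently dropped records.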

📝 Selected Papers

🧭 Where To Start

Research / Predoc
→ Independent Research
→ AI Occupation Adoption Gap Analysis
→ Bayesian Unemployment Forecasting Analysis

Data Science / Analytics
→ CVE Vulnerability Exposure Pipeline
→ Finance Job Postings Pipeline
→ Privacy Disclosures Pipeline

Research Engineering / NLP
→ Darkweb Text Analysis Pipeline
→ SDK Release Notes Pipeline
→ Analyst Report Entity Resolution

🗂️ Reproducibility

Many projects rely on licensed or restricted datasets. Where raw data cannot be shared, I prioritise:

Transparent pipeline structure
Clear schema and transformation logic
Reproducible workflows with sample or public inputs

The goal is to make methods and reasoning inspectable, even when data is not.
