Rohan-Joseph-2002/README.md

Hi, I'm Rohan! 👋

🔬 Predoctoral Researcher, Ellison Institute of Technology Oxford, Limited
📊 BSc Statistics, University of British Columbia (UBC)
🧠 Applied Econometrics • Statistical Modelling • Data Systems

💼 LinkedIn • 📚 Research • 📄 CV • ✉️ Email


I work at the intersection of economics, data science, and research engineering, designing end-to-end systems for empirical analysis.

I focus on problems where measurement is difficult, data is imperfect, and decisions depend on credible empirical structure.

My work starts with large, messy, or fragmented data, moves through data engineering and dataset construction, and culminates in papers, structured evidence, or decision-ready outputs.

Across projects, I prioritise measurement, scalability, and empirical designs that make complex systems interpretable.

🔍 Research & Technical Interests

  • Healthcare Productivity, Access, and Inequality
  • Labour Markets, AI Diffusion, and Technology Adoption
  • Bayesian Forecasting and Model Uncertainty
  • Privacy, Disclosure, and Data Infrastructure
  • NLP, Entity Resolution, and Document-Level Extraction

🔄 Workflow

Raw / Unstructured Data
        ↓
Data Collection & Scraping
        ↓
Cleaning & Normalisation
        ↓
Dataset Construction
        ↓
Modelling & Analysis
        ↓
Papers • Insights • Decision Outputs
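
The stages above can be read as a chain of small functions, each consuming the previous stage's output. The sketch below is illustrative only: the function names, record fields, and the token-count "analysis" are assumptions made for the example, not code from any of the linked repositories.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def collect(raw_documents: Iterable[str]) -> List[Record]:
    """Data collection: wrap each raw text in a record with a stable ID."""
    return [{"doc_id": str(i), "text": text} for i, text in enumerate(raw_documents)]


def clean(records: List[Record]) -> List[Record]:
    """Cleaning and normalisation: collapse whitespace, lowercase, drop empty texts."""
    cleaned = []
    for rec in records:
        text = " ".join(rec["text"].split()).lower()
        if text:
            cleaned.append({**rec, "text": text})
    return cleaned


def build_dataset(records: List[Record]) -> List[Record]:
    """Dataset construction: derive an analysis-ready field (hypothetical: token count)."""
    return [{**rec, "n_tokens": len(rec["text"].split())} for rec in records]


def analyse(dataset: List[Record]) -> Dict[str, float]:
    """Modelling and analysis: a placeholder summary standing in for a real model."""
    counts = [row["n_tokens"] for row in dataset]
    mean_tokens = sum(counts) / len(counts) if counts else 0.0
    return {"n_docs": len(counts), "mean_tokens": mean_tokens}


def run_pipeline(raw_documents: Iterable[str]) -> Dict[str, float]:
    """Run every stage in order, from raw text to a decision-ready summary."""
    return analyse(build_dataset(clean(collect(raw_documents))))


if __name__ == "__main__":
    print(run_pipeline(["  Hello   World ", "", "One two three"]))
```

Keeping each stage a plain function with list-in, list-out behaviour makes it possible to test stages separately and to swap sample inputs for restricted ones.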

⚙️ Profile

| Area | Focus |
| --- | --- |
| Current role | Predoctoral Researcher in Economics, Ellison Institute of Technology Oxford, Limited |
| What I build | End-to-end data pipelines, analytical datasets, and paper-linked empirical outputs |
| Methods | Panel data, fixed effects, Bayesian state-space models, forecasting, NLP, entity resolution |
| Tools | Python, R, SQL, Spark, Stata, Git, LaTeX |
| Scale | 100k+ documents • 300M+ records |

⭐ Featured Work

📄 Independent Research
Public archive of research papers with linked codebases.

🤖 AI Occupation Adoption Gap Analysis
Empirical analysis of the gap between theoretical AI exposure and observed workplace adoption.

📉 Bayesian Unemployment Forecasting Analysis
State-space modelling of unemployment with a focus on structural trends and predictive uncertainty.

🕸️ Darkweb Text Analysis Pipeline
Large-scale text processing system for forum and marketplace data.

🧩 SDK Release Notes Pipeline
A workflow for collecting, parsing, and normalising SDK release notes from GitHub and vendor-hosted changelogs.

🛡️ CVE Vulnerability Exposure Pipeline
Integrated dataset combining NVD, CVE, EPSS, and KEV for vulnerability risk analysis.
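
As a rough illustration of what combining these sources can look like, the sketch below joins per-CVE records on the CVE identifier: a severity score from NVD, an exploitation probability from EPSS, and a membership flag from the KEV catalogue. The function name, field names, and sample values are hypothetical placeholders, and the actual pipeline may be structured quite differently.

```python
from typing import Any, Dict, List, Set


def merge_vulnerability_sources(
    nvd_records: List[Dict[str, Any]],
    epss_scores: Dict[str, float],
    kev_ids: Set[str],
) -> List[Dict[str, Any]]:
    """Left-join EPSS scores and KEV membership onto NVD records by CVE ID."""
    merged = []
    for rec in nvd_records:
        cve_id = rec["cve_id"]
        merged.append(
            {
                "cve_id": cve_id,
                "cvss_score": rec.get("cvss_score"),
                # None when EPSS has no score for this CVE.
                "epss": epss_scores.get(cve_id),
                # True when the CVE appears in the known-exploited list.
                "in_kev": cve_id in kev_ids,
            }
        )
    return merged


if __name__ == "__main__":
    # Made-up placeholder identifiers and scores, not real vulnerability data.
    nvd = [
        {"cve_id": "CVE-0000-0001", "cvss_score": 9.8},
        {"cve_id": "CVE-0000-0002", "cvss_score": 5.3},
    ]
    epss = {"CVE-0000-0001": 0.91}
    kev = {"CVE-0000-0001"}
    for row in merge_vulnerability_sources(nvd, epss, kev):
        print(row)
```

A left join anchored on the NVD records keeps every CVE in the output and makes missing EPSS coverage visible as an explicit `None` instead of silently dropping rows.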

📝 Selected Papers

🧭 Where To Start

Research / Predoc
→ Independent Research
→ AI Occupation Adoption Gap Analysis
→ Bayesian Unemployment Forecasting Analysis

Data Science / Analytics
→ CVE Vulnerability Exposure Pipeline
→ Finance Job Postings Pipeline
→ Privacy Disclosures Pipeline

Research Engineering / NLP
→ Darkweb Text Analysis Pipeline
→ SDK Release Notes Pipeline
→ Analyst Report Entity Resolution

🗂️ Reproducibility

Many projects rely on licensed or restricted datasets. Where raw data cannot be shared, I prioritise:

  • Transparent pipeline structure
  • Clear schema and transformation logic
  • Reproducible workflows with sample or public inputs

The goal is to make methods and reasoning inspectable, even when data is not.
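
One way to make a schema inspectable without sharing the data is to publish it as code and check sample inputs against it. The sketch below assumes a hypothetical three-column schema and a hypothetical `validate_rows` helper; neither is taken from the repositories listed here.

```python
from typing import Any, Dict, List

# Hypothetical schema: column name -> expected Python type.
EXPECTED_SCHEMA: Dict[str, type] = {
    "doc_id": str,
    "published": str,
    "n_tokens": int,
}


def validate_rows(rows: List[Dict[str, Any]], schema: Dict[str, type]) -> List[str]:
    """Return human-readable schema violations; an empty list means all rows pass."""
    errors = []
    for i, row in enumerate(rows):
        # Report columns the schema requires but the row lacks.
        for col in sorted(schema.keys() - row.keys()):
            errors.append(f"row {i}: missing column '{col}'")
        # Report columns that are present but carry the wrong type.
        for col, expected in schema.items():
            if col in row and not isinstance(row[col], expected):
                errors.append(
                    f"row {i}: column '{col}' expected {expected.__name__}, "
                    f"got {type(row[col]).__name__}"
                )
    return errors


if __name__ == "__main__":
    sample = [
        {"doc_id": "a1", "published": "2024-01-01", "n_tokens": 120},
        {"doc_id": "a2", "n_tokens": "85"},
    ]
    for problem in validate_rows(sample, EXPECTED_SCHEMA):
        print(problem)
```

Running the same check on public sample inputs and on restricted inputs gives readers some evidence that both follow the documented structure.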

Pinned

  1. independent-research-papers Public

    Research papers on AI diffusion, labour markets, and Bayesian forecasting, with links to the underlying code.

  2. ai-occupation-adoption-gap-analysis Public

    Public-data analysis of the gap between AI capability exposure and observed workplace use across occupations.

    Python

  3. bayesian-unemployment-forecasting-analysis Public

    Bayesian state-space analysis of U.S. unemployment forecasting, model comparison, and predictive uncertainty.

    R

  4. cve-vulnerability-exposure-pipeline Public

    Official-source pipeline merging NVD, CVE, EPSS, and KEV data into a compact vulnerability exposure dataset.

    Python

  5. privacy-disclosures-pipeline Public

    Pipeline for standardizing and harmonizing iOS privacy labels and Android safety disclosures into a shared analytical dataset.

    Python

  6. sdk-release-notes-pipeline Public

    Collects and standardizes SDK release notes from GitHub and vendor changelogs into a reusable dataset.

    Python