Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

6.8300 Final Project: Taming CLIP’s Captioning Bias: A COCO-Driven Analysis and Permutation Ensemble

19 minute read

Published:

Abstract: Vision-language models like CLIP struggle with multi-object scenes, often favoring prominent objects or those mentioned first in captions. Using real-world COCO images, we show that CLIP’s caption-matching accuracy drops from 91.23% to 87.45% when object order is reversed. To address this, we explore a post-hoc mitigation: a permutation ensemble that averages scores across all object orders, boosting robustness and recovering accuracy to 90.04%. Our findings reveal persistent order biases and offer a simple, effective strategy to improve CLIP’s reliability in complex scenes.
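The permutation ensemble described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `build_caption`, `permutation_ensemble_score`, and `toy_score` are hypothetical names, the caption template is an assumption, and `toy_score` is a deliberately order-biased stand-in for a real CLIP image-text similarity computed against a fixed image.

```python
from itertools import permutations
from statistics import mean
from typing import Callable, Sequence


def build_caption(objects: Sequence[str]) -> str:
    """Hypothetical caption template, e.g. 'a photo of a dog and a cat'."""
    items = [f"a {name}" for name in objects]
    if len(items) == 1:
        return f"a photo of {items[0]}"
    return "a photo of " + ", ".join(items[:-1]) + " and " + items[-1]


def permutation_ensemble_score(
    objects: Sequence[str], score_fn: Callable[[str], float]
) -> float:
    """Average score_fn over the captions built from every object ordering."""
    captions = [build_caption(order) for order in permutations(objects)]
    return mean(score_fn(caption) for caption in captions)


def toy_score(caption: str) -> float:
    """Order-biased stand-in scorer: higher when 'dog' appears earlier.

    In practice this would be replaced by a function returning the
    similarity between one fixed image and the caption.
    """
    words = caption.split()
    return 1.0 / (1 + words.index("dog"))


forward = toy_score(build_caption(["dog", "cat"]))
reverse = toy_score(build_caption(["cat", "dog"]))
ensemble = permutation_ensemble_score(["dog", "cat"], toy_score)
print(forward, reverse, ensemble)
```

Because the average runs over every ordering, the ensemble score no longer depends on the order in which the objects are listed. The number of captions grows factorially with the number of objects, so this sketch is only practical for scenes with a handful of objects.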

portfolio

publications

A Real-Time High-Precision Pedestrian Navigation Method, Device, and Related Components

Published in China National Intellectual Property Administration (CNIPA), 2024

My first granted patent (yay). It improves indoor pedestrian navigation without requiring additional on-site hardware.

Recommended citation: Ren, J. (2024). A Real-Time High-Precision Pedestrian Navigation Method, Device, and Related Components. China National Intellectual Property Administration (CNIPA). Patent Number: CN112946323B.

talks

teaching
