<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://junruren.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://junruren.com/" rel="alternate" type="text/html" /><updated>2026-05-08T14:29:02+00:00</updated><id>https://junruren.com/feed.xml</id><title type="html">Junru Ren</title><subtitle>Student at MIT</subtitle><author><name>Junru Ren 任俊儒</name><email>junru@computer.org</email></author><entry><title type="html">Jazz and Homesickness</title><link href="https://junruren.com/posts/2025/12/jazz-and-homesickness/" rel="alternate" type="text/html" title="Jazz and Homesickness" /><published>2025-12-31T00:00:00+00:00</published><updated>2025-12-31T00:00:00+00:00</updated><id>https://junruren.com/posts/2025/12/jazz-and-homesickness</id><content type="html" xml:base="https://junruren.com/posts/2025/12/jazz-and-homesickness/"><![CDATA[<p>On a cold January day in 2023 in Nashville, Tennessee, just before heading to the airport, I took a small detour to stop by Nashville Jazz Workshop. The venue was closed, but I stood outside for a moment, feeling as if a smooth piano melody were drifting through the air, quietly comforting my homesick heart. If this pairing of American jazz and my Chinese homesickness catches your attention, I invite you into a very personal story of mine, one that intertwines the life journey of an international student with a particular American jazz pianist. Happy 2026, from my home in Shenzhen, China.</p>

<hr />

<h2 id="first-tastes-of-homesickness">First Tastes of Homesickness</h2>

<p>How I decided to study abroad in the United States is a story in itself and probably warrants a separate post.</p>

<p>My study-abroad journey unfolded quite smoothly in sunny San Diego. I quickly made friends, adapted to the local lifestyle, and became active in various community activities. Outside of school, I was blessed with <a href="https://convoydistrict.com">San Diego’s easy access</a> to groceries and cuisines from back home whenever I craved them. My family was also just one video call away, and we caught up almost every weekend.</p>

<p>For the first time in my life, I felt truly independent and empowered. Life was great in San Diego.</p>

<p>I did not really feel homesick, until two things happened.</p>

<p>At the end of my freshman year, everyone living on campus had to vacate their dorm by the Sunday following finals week. I had sublet a place for the summer, and my only task was to move my belongings from my dorm room to the new apartment, a short ten-minute drive away. While I was licensed to drive, I did not own a car at the time, and being underage meant that I could only use whatever Zipcar offered on campus. I reserved a Toyota Prius for the day.</p>

<p><img src="/images/2025-12-31-jazz-and-homesickness/Inside-Move-Out-Prius.JPG" alt="Inside the Prius" /></p>
<blockquote>
  <p>Inside the Prius I rented. I am surprised that I actually took a picture back then.</p>
</blockquote>

<p>On move-out day, the residential area felt more crowded than when school was in session. Family members of local students showed up in force: parents, grandparents, and siblings arrived in minivans or pickup trucks, equipped with flatbed wagons and sturdy Home Depot boxes, ready to get the job done quickly. I had a lovely interaction with my roommates’ family, and before I knew it, their move-out was complete.</p>

<p><img src="/images/2025-12-31-jazz-and-homesickness/Dorm-Vacated.jpg" alt="Dorm room vacated" /></p>
<blockquote>
  <p>My vacated dorm room, snapped on Snapchat (ah the Snapchat years…)</p>
</blockquote>

<p>Meanwhile, I had just bought a folding hand truck. With only that and the Prius, I was in for multiple trips between my dorm and the new apartment. While driving off campus, I spotted another international-student friend who was about to move their luggage by taking a local bus. I offered them a ride.</p>

<p>By sundown, I had paused my own move and given ride after ride to friends who needed help. My Prius was used to its fullest capacity. Looking back now, I am fairly certain that the nineteen-year-old version of me could have packed and managed time more efficiently. My hauling was not finished until well past midnight. I had not eaten dinner, so I drove to a Subway and took my first bite of the night. It was at that moment, with a foot-long sub in hand at 1 a.m., that I felt the “curse” of independence: I did not have my parents in a minivan. I was on my own.</p>

<p>The other moment of missing home also came toward the end of my freshman year. I suffered a severe bout of food poisoning and was eventually taken by ambulance to the emergency room. After spending the night in the hospital by myself, I grabbed an Uber and went straight to my first lecture of the morning.</p>

<p><img src="/images/2025-12-31-jazz-and-homesickness/ER.JPG" alt="Inside UC San Diego Health ER" /></p>
<blockquote>
  <p>Inside UC San Diego Health ER, my first taste of American healthcare.</p>
</blockquote>

<blockquote>
  <p><em>Sidebar</em>: I recently listened to <a href="https://fortune.com/2025/12/05/jensen-huang-nine-year-old-janitor-before-nvidia-joe-rogan-podcast/">Jensen Huang’s story</a>, in which he described communicating with his parents through a tape cassette. The story is deeply touching, and it reminded me not to take the support I have for granted.</p>
</blockquote>

<p>I learned that vulnerable moments make us cherish the love we had previously taken for granted. Although I was initially proud of how easily I had adapted to life abroad, that sense of triumph faded as soon as the first real setback hit.</p>

<p>After my freshman year, whenever I flew home to reunite with my family, <strong>I began searching for something random yet tangible that could remind me of home</strong>.</p>

<h2 id="trans-pacific-commutes">Trans-Pacific Commutes</h2>

<p>Between my hometown of Shenzhen and my college town of San Diego, flights operated by Cathay Pacific between Hong Kong and Los Angeles became my go-to route for holidays and returns to school. After those early homesick moments, I started associating a Cathay Pacific flight with going back home (regardless of the direction of a flight).</p>

<p>On board a home-bound flight in my sophomore year, smooth jazz piano flowed through the in-flight entertainment system into my headset. Curious, I tapped through the IFE screen in front of me. The artist was Beegie Adair.</p>

<p>Elegant, melodic, sophisticated, yet approachable. That was Beegie’s musical style.</p>

<p>The album playing was <a href="https://beegieadair.com/album/338049/cocktail-party-jazz-2">“Beegie Adair &amp; Friends: Cocktail Party Jazz 2”</a>. Some might label it “lounge music” (and it certainly can be), but that first immersion carried me through a quietly sentimental journey <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<p><img src="/images/2025-12-31-jazz-and-homesickness/IFE-with-Beegie.jpeg" alt="Actual photo I took of the IFE screen with the exact album" /></p>
<blockquote>
  <p>Actual photo I took of the IFE screen with the exact album. I cannot believe I actually took this photo on that flight. (Well, I sure took it so that I could search for the album after landing.)</p>
</blockquote>

<p>By the time the wheels touched down, the same album was still rolling on my IFE.</p>

<p><strong>Random but tangible</strong>: from that point on, hearing Beegie Adair’s music became associated, in my mind, with being on board a Cathay flight, and being on board a Cathay flight became associated with going home. It is difficult to explain how my mind formed this two-stage connection, though I likely reinforced it by deliberately replaying the same album on subsequent Cathay flights, regardless of the direction of travel. The impression from that first listen was simply too strong.</p>

<h2 id="into-jazz">Into Jazz</h2>

<p>Jazz, in its broadest sense, was part of my childhood soundscape. My dad often played Kenny G in the car, which nudged me toward learning a woodwind instrument (clarinet, not Kenny’s sax, somehow), and I was trained classically. I never actively sought out more jazz, but I welcomed it whenever I encountered it.</p>

<p>That changed after I inexplicably established my “Beegie’s jazz == homesickness cure” logic. Through her recordings, I began to explore jazz standards and the Great American Songbook. I had entered a new musical world.</p>

<p>After college, I was fortunate to join a company full of musicians who loved jazz. We jammed in the office music room every Tuesday and Thursday after work. It remains my best work memory.</p>

<p>One quiet wish stayed with me: to hear Beegie perform live one day. I knew she played regularly at <a href="https://www.nashvillejazz.org">Nashville Jazz Workshop</a>, where she was also a founding board member. When my post-graduate life finally allowed for more travel after a year of work, the COVID-19 pandemic arrived, pushing my Nashville plans indefinitely into the future.</p>

<p>Through the thick of lockdowns and my own medical concerns, I eventually reached a point where I felt ready to look up Beegie’s next performance. Instead, I found a solemn line on <a href="https://www.beegieadair.com">https://www.beegieadair.com</a>:</p>

<blockquote>
  <p>January 23, 2022: It is with deep and profound sadness we share the sad news that Beegie died today surrounded by those she loved and who loved her most dearly…</p>
</blockquote>

<h2 id="a-tribute">A Tribute</h2>

<p>My first visit to Nashville finally came in January 2023. There is more to that trip than I will cover here, and I may touch on it in a future reflection. Throughout it, though, I carried a sense of regret: I could no longer close my “jazz-homesick loop” by hearing Beegie play live.</p>

<p>On my Uber ride to the airport, I added Nashville Jazz Workshop as a stop. I stood outside the building for a few seconds, took a selfie, and then returned to my airport-bound ride. I put on my headphones and pressed play on <em>Cocktail Party Jazz 2</em>.</p>

<p><img src="/images/2025-12-31-jazz-and-homesickness/Nashville-Jazz-Workshop.jpeg" alt="Selfie taken outside Nashville Jazz Workshop" /></p>
<blockquote>
  <p>Selfie taken outside Nashville Jazz Workshop</p>
</blockquote>

<hr />

<p>Today, we each consume information on the order of gigabytes per day. Countless random details pass through our lives. Yet somehow, just somehow, I formed an unexplainable set of connections among a few seemingly unrelated random things: a musician, an airline, and a feeling of homesickness. Over time, that linkage quietly amplified its own significance in my mind.</p>

<p>I appreciate you reading this far, because you have probably wanted to shout: how on earth did I come to make these connections? Well, I would ask that question of myself too. But here I am: I’ve invented and kept within me a unique embodiment of some very layered emotions and memories, beyond the economy cabin of a trans-Pacific flight carrying an anxious international student away from home.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>A small pun intended: “Sentimental Journey” is another jazz standard that Beegie also performed. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Junru Ren 任俊儒</name><email>junru@computer.org</email></author><category term="personal" /><category term="music" /><summary type="html"><![CDATA[On a cold January day in 2023 in Nashville, Tennessee, just before heading to the airport, I took a small detour to stop by Nashville Jazz Workshop. The venue was closed, but I stood outside for a moment, feeling as if a smooth piano melody were drifting through the air, quietly comforting my homesick heart. If this pairing of American jazz and my Chinese homesickness catches your attention, I invite you into a very personal story of mine, one that intertwines the life journey of an international student with a particular American jazz pianist. Happy 2026, from my home in Shenzhen, China.]]></summary></entry><entry><title type="html">What is it like to do AI research inside Nike?</title><link href="https://junruren.com/posts/2025/12/ai-research-inside-nike/" rel="alternate" type="text/html" title="What is it like to do AI research inside Nike?" /><published>2025-12-20T00:00:00+00:00</published><updated>2025-12-20T00:00:00+00:00</updated><id>https://junruren.com/posts/2025/12/AI-Research-inside-Nike</id><content type="html" xml:base="https://junruren.com/posts/2025/12/ai-research-inside-nike/"><![CDATA[<p>I’m feeling bittersweet because yesterday, December 19, 2025, was my last day at Nike. My six-month research internship culminated in three presentations across multiple time zones. While the details of the work will eventually show up in my MIT thesis next year, here I reflect on this unique journey of conducting AI research at a world-renowned brand.</p>

<p><img src="/images/2025-12-20-AI-Research-inside-Nike/Nike-Campus-Last-Day.jpeg" alt="Amazing sunset over Lake Nike" /></p>
<blockquote>
  <p>On my last day, the rainy Oregon sky cleared up just in time for a stunning sunset over Nike’s <a href="https://about.nike.com/en/newsroom/releases/nike-philip-h-knight-campus-announcement">Philip H. Knight Campus</a></p>
</blockquote>

<h2 id="the-dual-mandate">The Dual Mandate</h2>

<p><strong>You must think about complicated business needs and state-of-the-art research at the same time.</strong> At a company like Nike, the first priority is being intentional about how the work solves real problems and creates wins. For example, if I get too enamored with a valid problem in academia that’s not yet relevant to the business, it might not create immediate value. This is exactly why I’m grateful to bring both my MIT EECS and MBA lenses to the job.</p>

<p><strong>Networking is a must.</strong> In any huge company, knowledge of a complex business process is naturally distributed. The most effective way (in my experience) to connect the dots is to form collaborative partnerships with teammates who have the subject-matter expertise. Map out everything, put it into one complete document, invite feedback, and iterate.</p>

<p><strong>Literature review starts on day one.</strong> When your research takes the form of an industry internship, it’s easy to lose sight of what’s happening in academia, especially in the midst of all the networking above. I’m grateful my thesis advisors constantly reminded me to find the papers that back up what I’m seeing (or claiming). There’s no real substitute for shuttling between “professional work” and “research.” This context switching happens basically every other day.</p>

<h2 id="ai-at-nike">AI at Nike</h2>

<p>From the outside, Nike is not commonly associated with AI research. However, the company is deeply invested in leveraging AI to enhance customer experiences, optimize supply chains, and innovate product designs. During my internship, I witnessed firsthand how AI is integrated into various facets of Nike’s operations. One recent example that the public may be familiar with is the <a href="https://www.thestack.technology/nikes-cto-welcomes-nikeai-rollout-are-there-lessons-here-for-developers/">AI search feature on the Nike app</a>, which allows users to find products in natural language.</p>

<p><strong>My first taste of an enterprise AI platform.</strong> Nike is the very first large enterprise that I have ever worked at (naturally, given that my only previous full-time endeavor was at an AI startup), and I truly didn’t know much about how data is managed at scale or whether there is an established platform on which AI solutions can be developed.</p>

<p>Through this internship, I got my hands on Databricks for the first time. On the surface, Databricks is a data lakehouse platform that unifies data engineering, data science, and business analytics. But quickly I realized that it is also a powerful AI platform that allows researchers and engineers to build, deploy, and monitor machine learning models at scale. From a simple regression model to access to nearly all major large language models (LLMs) via API, Databricks provides a seamless environment for AI research and development within a large organization like Nike.</p>

<p>Need to embed some text data? Databricks has you covered with various embedding models. Done training your model? You can easily deploy it on Databricks and create an endpoint for real-time predictions. My amazement grew as I saw how Databricks quickly made a new model like GPT-5.2 available as soon as OpenAI released it. And more importantly, all of these models are accessed in a secure and compliant manner, which is crucial for enterprise applications.</p>

<p>When I was just a student/individual builder outside of big companies, it rarely occurred to me how important internal data governance is; sharing my whole life’s troubles with the public instance of ChatGPT.com seemed like the way to go. (Well, I shouldn’t have…)</p>

<p><strong>I was also blessed with a supportive environment.</strong> From my day-to-day data science teammates to the VP of the customer organization that my research serves, everyone was incredibly supportive and encouraging. They provided me with the resources, mentorship, and feedback needed to thrive in this internship. Over the past six months, I interacted with a far more diverse set of functions than I would have at a purely technical company. In AI, my technical mentors are all hands-on builders who ship models or agentic tools every day, and my business function partners are genuinely excited about applying AI to solve their real problems.</p>

<h2 id="cool-company">Cool Company</h2>

<p>If you know Nike, you know it’s a cool company. The brand is iconic, the culture is vibrant, and the people are passionate about what they do—and about sports. I warmed up with <a href="https://www.facebook.com/share/v/1H299taBo3/">Eliud Kipchoge before a 5K on campus</a>, and I’ve made sports a daily habit by frequenting three different on-site gyms. Running, cycling, swimming, repeat. I grew up with the Swoosh too, so it’s kind of magical to work behind a brand that’s instantly recognized around the world. I also joined during an important turning point as the company sharpens its focus on winning again, winning now, so it was valuable to witness a major transition unfold—and to watch how people from senior leadership to every individual contributor react and adapt.</p>

<p>If you ask me what the coolest experience during my internship was, I’d say it was the <strong>Shoe School</strong>, where I got to hand-make my own shoe from scratch! Sadly, it was just one for the left foot, because the teaching materials were ordered in batches such that each cohort gets to make either a left shoe or a right shoe. Still, it was an unforgettable experience to understand the craftsmanship and technology that go into making a Nike shoe.</p>

<p><img src="/images/2025-12-20-AI-Research-inside-Nike/Shoe-School.jpeg" alt="I made a shoe!" /></p>

<p>At Nike, there is also a <strong>unique MIT community.</strong> My research internship at Nike was part of the MIT LGO program, and this <a href="https://lgo.mit.edu/partnercompanies/nike-inc/">~15-year partnership</a> has created a strong alumni network on campus. I was lucky to have this community’s support: both for career development and for fun activities.</p>

<p>And yes, I’ll admit it: I’ve collected a dangerous number of Nike shoes (free or at huge discounts, thanks to the employee perks). But hey, I’m running comfortably in my brand new Vomero Plus, and my knees will be happier.</p>

<p><img src="/images/2025-12-20-AI-Research-inside-Nike/Shoe-Boxes.jpeg" alt="My Nike Shoe Boxes" /></p>

<hr />

<p>Well, finally, a teaser on my research: <strong>a multi-agent buyer–seller negotiation framework under limited information</strong>…</p>

<p>Enough said. <a href="/posts/2025/08/MIT-Thesis-LaTeX/">LaTeX time</a>…</p>]]></content><author><name>Junru Ren 任俊儒</name><email>junru@computer.org</email></author><category term="research" /><category term="ai" /><category term="internship" /><category term="nike" /><category term="mit" /><summary type="html"><![CDATA[I’m feeling bittersweet because yesterday, December 19, 2025, was my last day at Nike. My six-month research internship culminated in three presentations across multiple time zones. While the details of the work will eventually show up in my MIT thesis next year, here I reflect on this unique journey of conducting AI research at a world-renowned brand.]]></summary></entry><entry><title type="html">Stolen Name: How Software Erases Identity</title><link href="https://junruren.com/posts/2025/11/stolen-name/" rel="alternate" type="text/html" title="Stolen Name: How Software Erases Identity" /><published>2025-11-04T00:00:00+00:00</published><updated>2025-11-04T00:00:00+00:00</updated><id>https://junruren.com/posts/2025/11/Stolen-Name</id><content type="html" xml:base="https://junruren.com/posts/2025/11/stolen-name/"><![CDATA[<p>Names are one of the first things a system asks for, and one of the easiest ways it reveals what culture it was built for. This post looks at how seemingly harmless assumptions in software can quietly erase identity, using Chinese names as the primary case study.</p>

<p><img src="/images/2025-11-04-Stolen-Name/Cover_by_ChatGPT.png" alt="Cover image by ChatGPT" /></p>

<hr />

<p>In my junior years as a software engineer designing APIs to relay user information for voice AI, my mentor, <a href="https://www.ellipsix.net/index.html">David</a>, shared a classic post: <a href="https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/"><em>Falsehoods Programmers Believe About Names</em></a>. It was humorously eye-opening at the time. My favorite is the last one:</p>

<blockquote>
  <p><em>40. People have names.</em></p>
</blockquote>

<p>We often model “name” as a tidy schema, then watch reality break it.</p>

<p>As someone who has built user-facing systems, I notice how the design of something as small as a name field encodes an entire worldview. Details such as whitespace, commas, and capitalization become cultural choices with real human consequences.</p>

<h2 id="the-issue-when-given-names-have-spaces">The Issue: When Given Names Have Spaces</h2>

<p>Take my given name: <strong>Junru</strong>.</p>

<p>On my passport, my name is romanized as “Ren, Junru.” That works fine in most Western systems: two tokens, mapped cleanly to family and then given names. When a service greets me, I usually see “Hi Junru!” No drama.</p>

<p>In reality, my given name is formed by two characters, Jun and Ru, as is common for many Han Chinese names. Mainland documents romanize those two characters as one token (“Junru”). Outside mainland China, though, it is common to romanize with a space (<strong>“Jun Ru”</strong>) or with a hyphen (<strong>“Jun-Ru”</strong>).</p>

<p>Here’s where the breakage starts. Many systems assume a first-space split and misinterpret the space as a boundary between first and middle names, as illustrated in this simple Python snippet:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">first_name</span> <span class="o">=</span> <span class="n">full_name</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">" "</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">last_name</span>  <span class="o">=</span> <span class="n">full_name</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">" "</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</code></pre></div></div>

<p>Neat. Deterministic. And, for “Jun Ru Ren,” it yields <code class="language-plaintext highlighter-rouge">first_name = "Jun"</code> and <code class="language-plaintext highlighter-rouge">last_name = "Ren"</code>, silently truncating half of the given name. As a result, one might be greeted as “Hi Jun!” instead of “Hi Jun Ru!” When others look up this name in a directory, they might see “Jun Ren”. A small bug becomes a small erasure.</p>
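
<p>To make the failure concrete, here is a small runnable sketch that wraps the same two-line assumption in a helper and runs it over the three romanizations of my name mentioned above. The function name <code class="language-plaintext highlighter-rouge">naive_split</code> is my own, chosen purely for illustration; it is not taken from any real library.</p>

```python
def naive_split(full_name: str) -> tuple[str, str]:
    """Reproduce the 'first token / last token' assumption (hypothetical helper)."""
    tokens = full_name.split(" ")
    # Everything between the first and last token is silently dropped.
    return tokens[0], tokens[-1]


for name in ["Junru Ren", "Jun Ru Ren", "Jun-Ru Ren"]:
    first_name, last_name = naive_split(name)
    print(f"{name!r}: first_name={first_name!r}, last_name={last_name!r}")
```

<p>Only the space-separated romanization loses information: “Junru” and “Jun-Ru” survive intact, while “Jun Ru” comes back as “Jun”. The same person gets a different first name depending on which romanization their documents happen to use.</p>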

<p>This happens to friends who grew up in North America with given names romanized as two words. The downstream effects are social as much as technical: <strong>some eventually go by the truncated first token</strong>; others adopt an English name to avoid constant friction. Technology didn’t force the choice, but it nudged it.</p>

<h2 id="beyond-chinese-names-a-global-pattern">Beyond Chinese Names: A Global Pattern</h2>

<p>The point isn’t that Chinese names are uniquely tricky; it’s that a single schema can’t represent the world:</p>

<ul>
  <li>Some Indonesians and Burmese people have mononyms: no family name at all.</li>
  <li>Icelandic names are primarily patronymic/matronymic, not stable family surnames.
    <ul>
      <li>A favorite example: the Icelandic-Chinese jazz superstar <a href="https://en.wikipedia.org/wiki/Laufey_(singer)">Laufey</a>. Her full name is <strong>Laufey Lín Bing Jónsdóttir</strong> where <strong>Jónsdóttir = Jón + s + dóttir</strong>, meaning “daughter of Jón.” Her Chinese name 林冰 (Lin, Bing) is proudly included, with Lin being her mother’s family name. Interestingly, her twin sister <strong>Júnía Lín Hua Jónsdóttir</strong> appears as “Júnía Lin” in production credits.</li>
    </ul>
  </li>
  <li>Many Spanish-speaking cultures use two family names (paternal and maternal).</li>
  <li>In parts of India, name order and presence of a family name vary widely by region and language.</li>
</ul>

<p>Yet many forms still require “First Name” and “Last Name,” full stop.</p>

<p>A helpful resource for developers trying to do better is the W3C write-up, <a href="https://www.w3.org/International/questions/qa-personal-names">Personal Names Around the World</a>. The short version: accept variability, avoid premature parsing, and don’t assume structure from whitespace.</p>
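
<p>One possible way to act on that advice is sketched below: store the name exactly as the person typed it, and ask separately what they would like to be called instead of deriving it from whitespace. The class and field names (<code class="language-plaintext highlighter-rouge">PersonName</code>, <code class="language-plaintext highlighter-rouge">full_name</code>, <code class="language-plaintext highlighter-rouge">preferred_name</code>) are my own assumptions for illustration, not a schema prescribed by the W3C article.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonName:
    """A name stored as entered, never split on whitespace (illustrative sketch)."""

    full_name: str                        # exactly as the person typed it
    preferred_name: Optional[str] = None  # what the person asked to be called

    def greeting(self) -> str:
        # Fall back to the whole name rather than guessing a "first" token.
        return f"Hi {self.preferred_name or self.full_name}!"


print(PersonName("Jun Ru Ren", preferred_name="Jun Ru").greeting())
print(PersonName("Laufey").greeting())
```

<p>The trade-off is that the system can no longer sort by family name or fill in a “First Name” column automatically. If those are genuinely needed, I would rather ask the person for them explicitly than infer them.</p>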

<h2 id="appendix-how-chinese-ids-treat-names-chinese-script">Appendix: How Chinese IDs Treat Names (Chinese Script)</h2>

<p>When names are written in Chinese characters, the segmentation problem mostly disappears. Across most local ID systems, the full name appears as a single string of characters without explicit “first/last” fields.</p>

<table>
  <thead>
    <tr>
      <th>Sample</th>
      <th>Remark</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><img src="https://upload.wikimedia.org/wikipedia/commons/e/e7/The_People%27s_Republic_of_China_resident_identity_card_%28SAMPLE%29.png" alt="Sample image of a Resident Identity card in Mainland China (Source: Wikimedia)" /></td>
      <td><strong>Mainland China</strong> Resident ID Card shows one “full name” (姓名) field with characters together. In the sample photo, look for “某某某” as the placeholder for a full name. Some ethnic minority formats differ; see <a href="https://en.wikipedia.org/wiki/Resident_Identity_Card#Additional_features_in_ethnic_minority_areas">Additional Features</a>.</td>
    </tr>
    <tr>
      <td><img src="https://upload.wikimedia.org/wikipedia/commons/f/f6/ROC_mibunsho.jpg" alt="Sample image of a ROC Resident Identity Card (Source: Wikimedia)" /></td>
      <td>ID in <strong>Taiwan</strong> likewise shows one name field (姓名). Spacing between characters is typographic, not a given/family separator.</td>
    </tr>
    <tr>
      <td><img src="https://upload.wikimedia.org/wikipedia/commons/4/47/Hong_Kong_ID_card_front_side.png" alt="Sample image of a Hong Kong Permanent Identity Card (Source: Wikimedia)" /></td>
      <td><strong>Hong Kong</strong>: the Chinese line shows characters together; the English line separates with a comma between family and given names.</td>
    </tr>
    <tr>
      <td><img src="https://upload.wikimedia.org/wikipedia/commons/b/bf/MacaoID2023.jpg" alt="Sample image of a Macau Permanent Identity card (Source: Wikimedia)" /></td>
      <td><strong>Macau</strong> is a notable case where the card’s design explicitly separates family and given name fields.</td>
    </tr>
  </tbody>
</table>

<p>These examples (Macau aside) illustrate a key idea: in the native script, names are treated as a single unit; the friction arises during romanization and in systems that overfit to another schema.</p>

<h2 id="flip-the-table">Flip the Table</h2>

<p>We’ve looked at what happens when Chinese names meet systems designed around Anglo‑American conventions. Now imagine the reverse: a non‑Chinese name forced into a Chinese schema with no space‑based segmentation, or an interface that insists on family‑name‑first without a clear family name to give. It could go both ways.</p>

<p>If you have stories about what worked and what broke, I’d love to learn from them.</p>

<hr />

<h2 id="closing-thoughts">Closing Thoughts</h2>

<p>Whether in code or culture, the smallest design decisions shape how people see themselves. A name is not just data to be parsed; it’s a story, often written across languages and generations. When we build software, we are choosing which stories appear whole. Next time you see a field labeled “First Name”, pause and ask: whose first name?</p>]]></content><author><name>Junru Ren 任俊儒</name><email>junru@computer.org</email></author><category term="software" /><category term="culture" /><summary type="html"><![CDATA[Names are one of the first things a system asks for, and one of the easiest ways it reveals what culture it was built for. This post looks at how seemingly harmless assumptions in software can quietly erase identity, using Chinese names as the primary case study.]]></summary></entry><entry><title type="html">Networking Beyond Your Home Department: An MIT CS/AI Case Study</title><link href="https://junruren.com/posts/2025/10/mit-cs-ai-engagement/" rel="alternate" type="text/html" title="Networking Beyond Your Home Department: An MIT CS/AI Case Study" /><published>2025-10-14T00:00:00+00:00</published><updated>2025-10-14T00:00:00+00:00</updated><id>https://junruren.com/posts/2025/10/MIT-CS-AI-Engagement</id><content type="html" xml:base="https://junruren.com/posts/2025/10/mit-cs-ai-engagement/"><![CDATA[<p>How do you build real connections <strong>outside</strong> your home academic program or department, especially in a place like MIT? This post shares a simple playbook for doing exactly that, using my own path into the CS/AI community as a case study. The short version: find the right announcement surfaces, <strong>show up</strong> (even when you won’t understand everything), and keep the curiosity dial set to “loud.”</p>

<h2 id="case-study-setup-an-sms-identity-crisis">Case Study Set‑Up: An SM’s “Identity Crisis”</h2>

<p>When I first arrived at MIT as an <strong>EECS Master of Science (SM)</strong> and <strong>MBA dual-degree</strong> candidate, I had a brief identity crisis about where I fit.</p>

<p>Conversations with other students or faculty often went like this:</p>

<blockquote>
  <p><strong>Who’s your PI (principal investigator)?</strong><br />
I don’t have one. I’m a master’s student.<br />
<strong>Oh, so you’re an MEng student then?</strong><br />
No, I’m SM 😅.</p>
</blockquote>

<p>This happens because <strong>the majority of EECS graduate students are PhD students</strong>, followed by MIT undergrads continuing for their MEng. That makes being an EECS SM both rare and meaningful. In my case, I cleared multiple admission committees (three, in fact: <strong>EECS, Sloan MBA, and LGO</strong>) 😎.</p>

<p>With no lab affiliation, no RA-ship, and no matched thesis advisor (well, now I do 😉), I still wanted to engage with the <strong>CS and AI research communities at MIT</strong>.</p>

<p><em>I want to be in the room where it happens.</em></p>

<p><img src="https://media2.giphy.com/media/v1.Y2lkPTc5MGI3NjExdm5jYjdxZm96OGd3ajFkZG03YmY1aXhnZGlkNzU2dm9mdTVjcnRoNiZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/3CYopbqdiUv4xJf7T6/giphy.gif" alt="" /></p>

<hr />

<h2 id="the-playbook-how-to-network-outside-your-home-program">The Playbook: How to Network Outside Your Home Program</h2>

<h3 id="1-set-one-clear-objective">1. Set one clear objective</h3>

<p>I decided one of my main grad school objectives would be to <strong>attend as many research seminars, talks, and guest lectures as possible</strong>. That single choice simplified a lot of decisions later (see: calendar collisions and FOMO).</p>

<h3 id="2-find-the-room-where-it-announces">2. Find the “Room Where It <em>Announces</em>”</h3>

<p>Every community has places where information lands first: department‑wide digests, lab/center mailing lists, seminar calendars, student collectives, Slack/Discord, and newsletters.</p>

<p>Your job is to <strong>discover and subscribe</strong>. (My MIT‑specific examples are in the appendix.)</p>

<h3 id="3-go-to-the-rooms-where-it-happens">3. Go to the Rooms Where It Happens</h3>

<p>Zoom is useful, but the hallway chat, the post‑talk question, and the walk‑and‑talk to the elevator are where many connections start.</p>

<p>In my first fall semester, I <em>sometimes</em> skipped a class (yes, one with attendance) to catch a talk.</p>

<p>Make your own call, but don’t sleep on the compounding value of being in the room.</p>

<h3 id="4-give-yourself-permission-not-to-understand-everything">4. Give yourself permission <strong>not</strong> to understand everything</h3>
<p>Do I always understand the talks? Absolutely not.</p>

<p>But every time, I leave with something new: a topic to Google later, a name to follow, a feel for where the field is heading. Even if I can’t follow the proofs, I can still catch the pulse, and that alone is worth being in the room.</p>

<p>For example, in one seminar, I heard <a href="https://arxiv.org/abs/2407.06460">“Machine <strong>Unlearning</strong>”</a> for the first time! That was eye-opening, given how overwhelmingly I had been exposed to “learning”.</p>

<hr />

<p>So if you also feel “between the labs,” don’t wait for an official invitation. Just show up. Sit in. Be curious.</p>

<p>If you’ve found other ways to engage (clubs, reading groups, random Slack channels, or secret pizza seminars), I’d love to hear them.<br />
<a href="mailto:junru@computer.org">Drop me a note</a>, and maybe I’ll see you in <strong>one of those rooms where it happens</strong>.</p>

<hr />

<h2 id="appendix-mit-csai-links-examples">Appendix: MIT CS/AI Links (Examples)</h2>

<p class="notice"><strong>Disclaimer.</strong> All links below point to <strong>public pages</strong> already available via CSAIL’s TIG website or public org pages. I’m <strong>re‑summarizing</strong> them here as examples, not advocating misuse. Please be respectful of list policies and norms. If a CSAIL administrator would prefer these subscription links not be referenced here, email me and I’ll remove them promptly.</p>

<ul>
  <li><strong>CSAIL (MIT Computer Science &amp; Artificial Intelligence Laboratory)</strong>
    <ul>
      <li>CSAIL homepage: <a href="https://www.csail.mit.edu/">https://www.csail.mit.edu/</a></li>
      <li>TIG: Popular Lab Mailing Lists: <a href="https://tig.csail.mit.edu/email-communicating/popular-lab-mailing-lists/">https://tig.csail.mit.edu/email-communicating/popular-lab-mailing-lists/</a></li>
      <li>Note: some lists (e.g., <code class="language-plaintext highlighter-rouge">csail-all</code>, <code class="language-plaintext highlighter-rouge">csail-internal</code>) are restricted.</li>
    </ul>
  </li>
  <li><strong>Student &amp; Research Collectives</strong>
    <ul>
      <li>Scale ML (cross‑lab MIT AI student collective): <a href="https://scale-ml.org/">https://scale-ml.org/</a>.
        <ul>
          <li>Just last month, they hosted <a href="https://x.com/chhillee"><strong>Horace He</strong></a>, who co-authored <a href="https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/"><em>“Defeating Nondeterminism in LLM Inference”</em></a>, the very first blog post from Thinking Machines.</li>
        </ul>
      </li>
      <li>ML Tea (informal machine learning talks with tea and usually snacks 😋): <a href="https://projects.csail.mit.edu/ml-tea/">https://projects.csail.mit.edu/ml-tea/</a></li>
      <li>MIT NLP Meetings Seminar Series: <a href="https://mitnlp.notion.site/">https://mitnlp.notion.site/</a></li>
    </ul>
  </li>
  <li><strong>Context</strong>
    <ul>
      <li>MIT EECS Graduate Programs - Admission Process (why SM/MEng confusion happens):<br />
<a href="https://www.eecs.mit.edu/academics/graduate-programs/admission-process/">https://www.eecs.mit.edu/academics/graduate-programs/admission-process/</a></li>
    </ul>
  </li>
</ul>]]></content><author><name>Junru Ren 任俊儒</name><email>junru@computer.org</email></author><category term="mit" /><category term="networking" /><category term="grad-school" /><summary type="html"><![CDATA[How do you build real connections outside your home academic program or department, especially in a place like MIT? This post shares a simple playbook for doing exactly that, using my own path into the CS/AI community as a case study. The short version: find the right announcement surfaces, show up (even when you won’t understand everything), and keep the curiosity dial set to “loud.”]]></summary></entry><entry><title type="html">Cheatsheet for Your MIT Sloan Classes</title><link href="https://junruren.com/posts/2025/09/Cheatsheets/" rel="alternate" type="text/html" title="Cheatsheet for Your MIT Sloan Classes" /><published>2025-09-29T00:00:00+00:00</published><updated>2025-09-29T00:00:00+00:00</updated><id>https://junruren.com/posts/2025/09/MIT-Sloan-Cheatsheets</id><content type="html" xml:base="https://junruren.com/posts/2025/09/Cheatsheets/"><![CDATA[<p>Hi there. If you wind up on this page, I assume you’re busy studying for your upcoming MBA Core exams at MIT Sloan.</p>

<p>I was in your shoes in Fall 2024.</p>

<p>Originally, I just wanted to refamiliarize myself with LaTeX by typing up my study notes (instead of going to all the MBA parties, alas). Prior to MIT, I’d had success condensing everything into a single sheet of LaTeX, so I decided to do the same again, this time open-sourcing it for my classmates.</p>

<p>So there you have it.</p>

<p class="notice"><strong>Caution</strong>: Your exam scope may differ from mine.</p>

<h2 id="mba-core">MBA Core</h2>

<ul>
  <li>15.010 Economic Analysis for Business Decisions (Fall 2024)
    <ul>
      <li><a href="https://github.com/junruren/MIT-Classes/blob/main/15.010/15_010_Midterm_Cheatsheet.pdf">Midterm</a></li>
      <li><a href="https://github.com/junruren/MIT-Classes/blob/main/15.010/15_010_Final_Cheatsheet.pdf">Final</a></li>
    </ul>
  </li>
  <li>15.515 Financial Accounting (Fall 2024)
    <ul>
      <li><a href="https://github.com/junruren/MIT-Classes/blob/main/15.515/15_515_H1_Cheatsheet.pdf">Midterm</a> — I didn’t make this one myself, so kudos to a legendary predecessor (please reach out if you made it!)</li>
      <li><a href="https://github.com/junruren/MIT-Classes/blob/main/15.515/15_515_Final_Cheatsheet.pdf">Final</a></li>
    </ul>
  </li>
</ul>

<h2 id="mba-core-elective">MBA Core Elective</h2>

<ul>
  <li>15.401 Managerial Finance (Spring 2025)
    <ul>
      <li><a href="https://github.com/junruren/MIT-Classes/blob/main/15.401/15_401_Midterm_Cheatsheet.pdf">Midterm</a></li>
      <li><a href="https://github.com/junruren/MIT-Classes/blob/main/15.401/15_401_Final_Cheatsheet.pdf">Final</a></li>
    </ul>
  </li>
</ul>

<h2 id="lgo-core">LGO Core</h2>

<ul>
  <li>15.086 Engineering Probability (Summer 2024)
    <ul>
      <li><a href="https://github.com/junruren/MIT-Classes/blob/main/15.086/15_086_Cheatsheet.pdf">Final</a></li>
    </ul>
  </li>
  <li>15.087 Engineering Statistics and Data Science (Summer 2024)
    <ul>
      <li><a href="https://github.com/junruren/MIT-Classes/blob/main/15.087/15_087_Cheatsheet.pdf">Final</a></li>
    </ul>
  </li>
</ul>

<hr />

<h2 id="tips-on-printing">Tips on Printing</h2>

<ol>
  <li>Ideally, find a color printer.</li>
  <li>Select the “Shrink to Fit” option in your printer settings.</li>
  <li>Select the highest possible print quality.</li>
</ol>

<p><em>Good luck!</em></p>

<hr />

<p>The best way to send me a “thank you note” is to <strong>🌟 Star</strong> my <a href="https://github.com/junruren/MIT-Classes/">GitHub repository</a> that hosts all these cheatsheets 😉 so… my thanks to you!</p>]]></content><author><name>Junru Ren 任俊儒</name><email>junru@computer.org</email></author><category term="mitsloan" /><category term="mit" /><category term="class" /><summary type="html"><![CDATA[Hi there. If you wind up on this page, I assume you’re busy studying for your upcoming MBA Core exams at MIT Sloan.]]></summary></entry><entry><title type="html">Tutorial: MIT Thesis LaTeX Template with VS Code</title><link href="https://junruren.com/posts/2025/08/MIT-Thesis-LaTeX/" rel="alternate" type="text/html" title="Tutorial: MIT Thesis LaTeX Template with VS Code" /><published>2025-08-04T00:00:00+00:00</published><updated>2025-08-04T00:00:00+00:00</updated><id>https://junruren.com/posts/2025/08/Tutorial-MIT-Thesis-LaTeX</id><content type="html" xml:base="https://junruren.com/posts/2025/08/MIT-Thesis-LaTeX/"><![CDATA[<p>Let’s set up the MIT thesis LaTeX template locally!</p>

<p><img src="/images/2025-08-04-Tutorial-MIT-Thesis-LaTeX/Cover_by_ChatGPT.png" alt="Cover Photo by ChatGPT" /></p>
<blockquote>
  <p>Generated by GPT-4o. Prompt: <em>A playful illustration showing a happy MIT student proudly holding a printed MIT thesis in one hand and a laptop showing VS Code with LaTeX code in the other. In the background, icons representing LaTeX, PDF documents, and MIT’s iconic Great Dome subtly float around. The student stands triumphantly on top of a stack of neatly bound theses. The style is colorful, lighthearted, and cartoonish.</em></p>
</blockquote>

<p>MIT’s libraries require theses to be deposited electronically using a strict format. To simplify formatting, the <a href="https://web.mit.edu/thesis/tex/"><strong>MIT thesis LaTeX template</strong></a> provides a class (<code class="language-plaintext highlighter-rouge">mitthesis.cls</code>) and a set of sample files that implement these requirements.</p>

<p>This tutorial blog builds on top of my previous <a href="/posts/2025/06/LaTeX-VSCode/">“Tutorial: Use LaTeX Locally with VS Code”</a>. By following that tutorial, you should already have:</p>

<ul>
  <li><strong>Visual Studio Code with LaTeX Workshop installed.</strong> The extension provides core LaTeX features such as auto‑building to PDF, an integrated PDF viewer, SyncTeX navigation, IntelliSense, and log parsing. It automatically runs the sequence of tools needed to build your document and highlights errors in the editor.</li>
  <li><strong>TeX Live 2022 or newer.</strong> The MIT thesis class requires a recent LaTeX distribution; as of the <a href="https://ctan.org/ctan-ann/id/aYBqHvP6SlO4x7oL%40prptp">CTAN v1.22 release</a> dated January 31, 2026, formats before June 2022 are no longer supported. A full TeX Live installation includes <code class="language-plaintext highlighter-rouge">pdflatex</code>, biber and other programs needed by the template.</li>
  <li><strong>Biber for bibliography management.</strong> The template defaults to using biblatex with the biber backend. Biber is part of TeX Live and will run automatically if configured in LaTeX Workshop.</li>
</ul>
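
<p>If you are unsure whether your installed LaTeX format is recent enough, a small throwaway file can tell you. This is only a sketch, not part of the template: the file name is hypothetical, and the <code class="language-plaintext highlighter-rouge">2022/06/01</code> cutoff is my reading of the “June 2022” requirement mentioned above. <code class="language-plaintext highlighter-rouge">\NeedsTeXFormat</code> writes a warning to the log when the installed format is older than the requested date, and <code class="language-plaintext highlighter-rouge">\fmtversion</code> prints the release date of the format in use.</p>

<div class="language-tex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% check-format.tex (hypothetical file name): compile once with pdflatex.
% Warns in the log if the LaTeX format predates the requested release date.
\NeedsTeXFormat{LaTeX2e}[2022/06/01] % assumed cutoff, based on "June 2022"
\documentclass{article}
\begin{document}
% \fmtversion holds the release date of the installed LaTeX format.
Installed LaTeX format: \fmtversion
\end{document}
</code></pre></div></div>

<p>If the printed date is earlier than June 2022, updating TeX Live before building the thesis template is probably the safer path.</p>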

<h2 id="1-acquire-the-template">1. Acquire the template</h2>

<p>Download it here: <a href="https://mirrors.ctan.org/macros/latex/contrib/mitthesis.zip">https://mirrors.ctan.org/macros/latex/contrib/mitthesis.zip</a>.</p>

<p class="notice">The Comprehensive TeX Archive Network (CTAN) is a network of websites that serves as the central repository for TeX-related materials and software; it’s legit and referenced by the official <a href="https://web.mit.edu/thesis/tex/">MIT thesis LaTeX template</a> website.</p>

<p>Extract (i.e., “unzip”) the archive to a convenient location—ideally the folder where you plan to write your thesis.</p>

<p><img src="/images/2025-08-04-Tutorial-MIT-Thesis-LaTeX/mitthesis-unzipped.png" alt="Example view of the unzipped `mitthesis`" /></p>

<h3 id="understand-the-contents">Understand the contents</h3>

<p>It’s always recommended to read the included <code class="language-plaintext highlighter-rouge">README.md</code>. This file lists the archive contents, notably a class file and an <code class="language-plaintext highlighter-rouge">MIT-thesis-template</code> folder containing everything you need to start writing:</p>

<table>
  <thead>
    <tr>
      <th>File/Folder</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">mitthesis.cls</code></td>
      <td>Core LaTeX class implementing MIT formatting</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">MIT-thesis-template/MIT-Thesis.tex</code></td>
      <td>Main LaTeX file for your thesis</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">abstract.tex</code>, <code class="language-plaintext highlighter-rouge">acknowledgments.tex</code>, <code class="language-plaintext highlighter-rouge">biosketch.tex</code></td>
      <td>Files where you insert your abstract, acknowledgments and optional biosketch</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">chapter1.tex</code>, <code class="language-plaintext highlighter-rouge">chapter...</code></td>
      <td>Sample chapters to use as templates</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">\Reader{...}</code> commands / <code class="language-plaintext highlighter-rouge">committee_members.tex</code></td>
      <td>In v1.21 and newer, <code class="language-plaintext highlighter-rouge">\Reader{...}</code> commands automatically generate the thesis committee page; older packages may include a separate <code class="language-plaintext highlighter-rouge">committee_members.tex</code> sample</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">appendixa.tex</code>, <code class="language-plaintext highlighter-rouge">appendixb.tex</code></td>
      <td>Sample appendices showing code listings and long tables</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">mitthesis-sample.bib</code></td>
      <td>Sample bibliography file with many entries</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">mitthesis-style.css</code></td>
      <td>Optional CSS embedded when tagged PDF is in use</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">mydesign.tex</code></td>
      <td>Optional file where you can load packages to customise colours, margins or caption styles</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">fontsets/</code></td>
      <td>Subdirectory containing optional font definitions</td>
    </tr>
  </tbody>
</table>

<p class="notice">Recent CTAN downloads include <code class="language-plaintext highlighter-rouge">MIT-Thesis.pdf</code> in the outer <code class="language-plaintext highlighter-rouge">mitthesis</code> folder as a sample document. If your download does not include it, no big deal: we will generate it ourselves later.</p>

<p>Additionally, the <code class="language-plaintext highlighter-rouge">mitthesis-doc</code> directory contains detailed PDF documentation, and the examples directory provides sample theses showcasing different font options.</p>

<p>After extraction, keep the directory structure intact; LaTeX will look for chapter files relative to the main file. You can rename the outer folder to reflect your project’s name.</p>

<h3 id="class-file-location-update">Class file location update</h3>

<p>Update on May 4, 2026: the current <a href="https://ctan.org/tex-archive/macros/latex/contrib/mitthesis">CTAN package listing</a> shows <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> in the outer <code class="language-plaintext highlighter-rouge">mitthesis</code> folder, while <code class="language-plaintext highlighter-rouge">MIT-thesis-template</code> is the folder with the files you edit. The <a href="https://mirrors.ctan.org/macros/latex/contrib/mitthesis/mitthesis-doc/mitthesis-doc.pdf">official documentation</a> says to copy <code class="language-plaintext highlighter-rouge">MIT-thesis-template</code> onto your system; if the current <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> is already installed in TeX Live, you are all set, and if not, copy <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> into your working directory.</p>

<p>In practice, with VS Code + LaTeX Workshop, the least surprising setup is:</p>

<ol>
  <li>Open <code class="language-plaintext highlighter-rouge">MIT-thesis-template</code> as the VS Code workspace.</li>
  <li>If the first build fails, copy <code class="language-plaintext highlighter-rouge">../mitthesis.cls</code> into <code class="language-plaintext highlighter-rouge">MIT-thesis-template</code>.</li>
  <li>Rebuild.</li>
</ol>

<p class="notice">This matters because TeX Live may already have an older <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> installed. For example, a TeX Live 2025 installation can still have mitthesis v1.20, while the current CTAN zip is v1.22. Mixing v1.22 template files with a v1.20 installed class can trigger errors such as <code class="language-plaintext highlighter-rouge">Undefined control sequence \CiteNolink</code>. Copying the outer class file into <code class="language-plaintext highlighter-rouge">MIT-thesis-template</code> makes the project use the matching class version.</p>
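
<p>To check which copy of <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> a build actually picks up, one option is LaTeX’s standard <code class="language-plaintext highlighter-rouge">\listfiles</code> command. The fragment below is a hypothetical diagnostic, not something the template ships with: it assumes you temporarily add one line at the very top of <code class="language-plaintext highlighter-rouge">MIT-Thesis.tex</code>, above the existing <code class="language-plaintext highlighter-rouge">\documentclass</code> line. After a build, the <code class="language-plaintext highlighter-rouge">.log</code> file should contain a “*File List*” section showing each loaded file with its date and version, which makes a stale class easier to spot.</p>

<div class="language-tex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% Temporary diagnostic: put this above the existing \documentclass line in MIT-Thesis.tex.
% \listfiles asks LaTeX to write a "*File List*" (file name, date, version) to the .log file.
\listfiles
% ... the template's own \documentclass line and everything after it stay unchanged ...
</code></pre></div></div>

<p>Remove the line once the reported class version matches the one in the CTAN zip you downloaded.</p>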

<h2 id="2-opening-the-project-in-vs-code">2. Opening the project in VS Code</h2>

<p>Launch <strong>VS Code</strong>, then choose <strong>File → Open Folder…</strong> and select the <code class="language-plaintext highlighter-rouge">MIT-thesis-template</code> folder (not the outer <code class="language-plaintext highlighter-rouge">mitthesis</code> folder!).</p>

<p>VS Code will treat this folder as the workspace.</p>

<p>Open the <code class="language-plaintext highlighter-rouge">MIT-Thesis.tex</code> file as it is the root document.</p>

<p><img src="/images/2025-08-04-Tutorial-MIT-Thesis-LaTeX/MIT-thesis-template-VSCode.png" alt="MIT-thesis-template folder opened in VS Code" /></p>

<h2 id="3-trigger-the-build">3. Trigger the build</h2>

<p>You should see a “TEX” badge from the LaTeX Workshop extension appearing in the leftmost panel of your window.</p>

<p>Trigger a file save using standard shortcuts:</p>
<ul>
  <li>macOS: <code class="language-plaintext highlighter-rouge">command</code> + <code class="language-plaintext highlighter-rouge">S</code></li>
  <li>Windows: <code class="language-plaintext highlighter-rouge">Ctrl</code> + <code class="language-plaintext highlighter-rouge">S</code></li>
</ul>

<p>As explained in the <a href="/posts/2025/06/LaTeX-VSCode/">previous tutorial</a>, saving the file automatically triggers the LaTeX Workshop build process. You’ll notice a spinning “🔄 Build” icon in the bottom-left corner, indicating compilation in progress. This may take a minute.</p>

<p class="notice">During compilation, local setup variations can cause errors. If you encounter an issue, please let me know so I can include quick fixes here. I also recommend troubleshooting with your preferred GenAI tool—my go-to choice is GitHub Copilot.</p>

<p><img src="/images/2025-08-04-Tutorial-MIT-Thesis-LaTeX/MIT-thesis-first-compiled-VSCode.png" alt="Default MIT-Thesis compiled with resulting PDF displayed" /></p>

<p>Scroll through the generated PDF file, <code class="language-plaintext highlighter-rouge">MIT-Thesis.pdf</code>; it should closely match the <a href="http://mirrors.ctan.org/macros/latex/contrib/mitthesis/examples/font_samples/Lmodern_sample.pdf">official example PDF</a>.</p>

<p>Congratulations, your MIT Thesis LaTeX template is now ready to use.</p>

<h2 id="4-familiarize-yourself-with-the-template">4. Familiarize yourself with the template</h2>

<p>The best way to get comfortable is to explore its structure – no shortcuts here.</p>

<p><strong>Please read the <a href="http://mirrors.ctan.org/macros/latex/contrib/mitthesis/mitthesis-doc/mitthesis-doc.pdf">official documentation</a> – again, no shortcut</strong>.</p>

<p>One quick tip: skim through <code class="language-plaintext highlighter-rouge">MIT-Thesis.tex</code>. Notice the multiple <code class="language-plaintext highlighter-rouge">\input{}</code> statements pulling in separate files for different thesis sections, such as <code class="language-plaintext highlighter-rouge">abstract.tex</code> and <code class="language-plaintext highlighter-rouge">chapter1.tex</code>. Edit these files slightly, rebuild the PDF, and observe the results – action learning at its finest!</p>
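
<p>If the <code class="language-plaintext highlighter-rouge">\input{}</code> mechanism is new to you, it can help to try it in isolation first. The sketch below is a standalone toy, not an excerpt from <code class="language-plaintext highlighter-rouge">MIT-Thesis.tex</code>: it uses the standard <code class="language-plaintext highlighter-rouge">report</code> class, and the <code class="language-plaintext highlighter-rouge">demo-*.tex</code> file names are made up for illustration. The <code class="language-plaintext highlighter-rouge">filecontents</code> environments write the two small files to disk on each run, so everything fits in one file (the <code class="language-plaintext highlighter-rouge">overwrite</code> option needs a LaTeX release from late 2019 or newer).</p>

<div class="language-tex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% Standalone toy example of \input; file names here are hypothetical.
% Each filecontents block writes a small .tex file next to this one.
\begin{filecontents}[overwrite]{demo-abstract.tex}
This abstract lives in its own file.
\end{filecontents}
\begin{filecontents}[overwrite]{demo-chapter1.tex}
\chapter{Introduction}
Edit this file, rebuild, and only this chapter changes.
\end{filecontents}

\documentclass{report}
\begin{document}
\begin{abstract}
\input{demo-abstract.tex} % pulled in as if typed here
\end{abstract}
\input{demo-chapter1.tex} % same mechanism, one file per chapter
\end{document}
</code></pre></div></div>

<p>The thesis template follows the same idea at a larger scale: the root file handles setup, and each section lives in its own file.</p>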

<p>If you are a student in MIT’s <a href="https://lgo.mit.edu/">Leaders for Global Operations (LGO)</a> dual-degree program, continue to the next section. Otherwise, you’re all set.</p>

<h2 id="5-lgo-thesis-tweaks">5. LGO Thesis tweaks</h2>

<p>The official package includes an example relevant for an LGO thesis: <a href="https://mirrors.ctan.org/macros/latex/contrib/mitthesis/examples/cover_page_samples/latex_sources/One_author_two_degrees.tex"><code class="language-plaintext highlighter-rouge">One_author_two_degrees.tex</code></a>. I’ve adapted this example to simplify things.</p>

<ol>
  <li>In your <code class="language-plaintext highlighter-rouge">MIT-Thesis.tex</code>, locate:
    <div class="language-tex highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nt">\begin{document}</span>
 <span class="c">%%% edit the following commands to match your thesis %%%%%%%%%%</span>
</code></pre></div>    </div>
  </li>
  <li>Replace the content from <code class="language-plaintext highlighter-rouge">\title{...}</code> to <code class="language-plaintext highlighter-rouge">\ThesisDate{...}</code> (inclusive) with this:
    <div class="language-tex highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">\title</span><span class="p">{</span>Simplify and Accelerate: An Awesome Dual-Degree Thesis Title<span class="p">}</span>

 <span class="c">% \Author{Author full name}{Author department}[Author's first PREVIOUS degree][Author's second PREVIOUS degree][...</span>
 <span class="c">% Note that third, fourth, fifth, and sixth arguments are optional [] and may be omitted</span>

 <span class="k">\Author</span><span class="p">{</span>LGO Student Name<span class="p">}</span>
 <span class="p">{</span><span class="k">\shortstack</span><span class="na">[l]</span><span class="p">{</span>MIT Sloan School of Management and<span class="k">\\</span>
 Department of Electrical Engineering <span class="k">\&amp;</span> Computer Science<span class="p">}}</span>
 [B.S. Previous Degree, Previous College, 2018]

 <span class="c">% Use once for each degree fulfilled by thesis</span>
 <span class="c">% For two degrees from one department, leave the department argument blank for the second degree {}.</span>
 <span class="k">\Degree</span><span class="p">{</span>Master of Business Administration<span class="p">}{</span>MIT Sloan School of Management<span class="p">}</span>
 <span class="k">\Degree</span><span class="p">{</span>Master of Science in Electrical Engineering and Computer Science<span class="p">}{</span>Department of Electrical Engineering and Computer Science<span class="p">}</span>

 <span class="c">% If there is more than one supervisor, use the \Supervisor command for each.</span>
 <span class="k">\Supervisor</span><span class="p">{</span>Sloan Advisor Name<span class="p">}{</span>Professor of Operations Management<span class="p">}</span>
 <span class="k">\Supervisor</span><span class="p">{</span>Engineering Advisor Name<span class="p">}{</span>Professor of Electrical Engineering and Computer Science<span class="p">}</span>

 <span class="c">% Professor who formally accepts theses for your department (e.g., the Graduate Officer, Professor Sméagol,...)</span>
 <span class="c">% If you need to reduce vertical space, put the acceptor title in the second argument and leave the third blank {}.</span>
 <span class="k">\Acceptor</span><span class="p">{</span>Engineering Acceptor Name<span class="p">}{</span>Professor of Electrical Engineering and Computer Science<span class="p">}{</span>Graduate Officer, Department of Electrical Engineering and Computer Science<span class="p">}</span>
 <span class="k">\Acceptor</span><span class="p">{</span>Sloan Acceptor Name<span class="p">}{</span>Assistant Dean<span class="p">}{</span>MBA Program, Sloan School of Management<span class="p">}</span>

 <span class="c">% Keep the signature block at normal size unless the title page itself needs more vertical space.</span>
 <span class="k">\SignatureBlockSize</span><span class="p">{</span><span class="k">\normalsize</span><span class="p">}</span>
 <span class="k">\AuthorNameSize</span><span class="p">{</span><span class="k">\normalsize</span><span class="p">}</span>

 <span class="c">% Usage: \DegreeDate{Month}{year}</span>
 <span class="k">\DegreeDate</span><span class="p">{</span>May<span class="p">}{</span>2026<span class="p">}</span>

 <span class="c">% Date that final thesis is submitted to department</span>
 <span class="k">\ThesisDate</span><span class="p">{</span>May 8, 2026<span class="p">}</span>
</code></pre></div>    </div>
  </li>
</ol>

<p>The important part for long department names is the <code class="language-plaintext highlighter-rouge">\shortstack[l]{...\\...}</code> wrapper around the second argument to <code class="language-plaintext highlighter-rouge">\Author</code>. It forces the affiliation to appear on two left-aligned lines, avoiding awkward automatic wrapping while keeping the title-page signature block at normal size.</p>
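
<p>To preview how the two-line affiliation breaks before touching the title page, <code class="language-plaintext highlighter-rouge">\shortstack</code> can be tried in a throwaway document. This is a minimal sketch that uses the standard <code class="language-plaintext highlighter-rouge">article</code> class rather than <code class="language-plaintext highlighter-rouge">mitthesis</code>, so fonts and surrounding layout will differ from the real title page:</p>

<div class="language-tex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% Throwaway preview of \shortstack[l]; not part of the thesis template.
\documentclass{article}
\begin{document}
% [l] left-aligns the stacked lines; \\ marks the explicit break.
\noindent\shortstack[l]{MIT Sloan School of Management and\\
Department of Electrical Engineering \&amp; Computer Science}
\end{document}
</code></pre></div></div>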

<p>Per “Thesis Review and Submission Process”, <em>LGO Handbook</em> (accessed on August 3, 2025), we also need to include:</p>

<blockquote>
  <p>“IN CONJUNCTION WITH THE LEADERS FOR GLOBAL OPERATIONS PROGRAM AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY”.</p>
</blockquote>

<p>To achieve this, we need our own copy of <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> (remember it from the “outer” directory?).</p>

<ol>
  <li>Copy the <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> file to your project workspace (i.e. same folder as your <code class="language-plaintext highlighter-rouge">MIT-Thesis.tex</code>)</li>
  <li>In your <strong>copied</strong> <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> file, find the line that goes <code class="language-plaintext highlighter-rouge">at~the\par</code>. In earlier versions, it appears right after <code class="language-plaintext highlighter-rouge">\__degree_block:</code>. In v1.21 and newer, it appears right after <code class="language-plaintext highlighter-rouge">\__mitthesis_degree_block:</code>.</li>
  <li>Insert the following line between the degree-block line and <code class="language-plaintext highlighter-rouge">at~the\par</code>:
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> in~conjunction~with~the~Leaders~for~Global~Operations~program\par
</code></pre></div>    </div>
    <p>Remember to maintain the same leading indentation as the two reference lines.</p>
  </li>
  <li>On newer versions, we end up with something like:
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> \__mitthesis_degree_block:
 in~conjunction~with~the~Leaders~for~Global~Operations~program\par
 at~the\par
</code></pre></div>    </div>
    <p>On older versions, the first line may instead be <code class="language-plaintext highlighter-rouge">\__degree_block:</code>. Leading indentation is omitted here to save space.</p>
  </li>
</ol>

<p>By doing so, our LaTeX project build should pick up our copied and edited <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> file instead of the default <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> provided by the template package.</p>

<p class="notice"><strong>This is a hot fix.</strong> When the official template package is updated, pulling the latest version from CTAN won’t change anything in our thesis workspace unless we merge the new <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> into our local copy of <code class="language-plaintext highlighter-rouge">mitthesis.cls</code> while keeping the “in conjunction…” line inserted. My take: this <strong>risk is minimal</strong> since your thesis work is short-term and static after submission.</p>

<p>Rebuild your LaTeX project and you should see a cover page like this:</p>

<p class="notice"><img src="/images/2025-08-04-Tutorial-MIT-Thesis-LaTeX/LGO-Thesis-Cover-Example.jpg" alt="An example of LGO thesis cover page rendered" />
<strong>Note</strong> that the degree date should be <strong>May</strong>, at least for the year 2026. Please confirm with your department staff (whoever reviews the cover page) each year.</p>

<h3 id="update-long-department-names">Update: long department names</h3>

<p>Update on May 3, 2026: previously I used <code class="language-plaintext highlighter-rouge">\SignatureBlockSize{\footnotesize}</code> to tame the long Sloan + EECS affiliation line. A cleaner fix is to wrap the department argument itself with <code class="language-plaintext highlighter-rouge">\shortstack[l]{...\\...}</code>, which inserts the break exactly between the two departments.</p>

<p class="notice">If you find other required tweaks, please let me know!</p>

<hr />

<p>Good luck with your thesis! Good luck to me too…</p>]]></content><author><name>Junru Ren 任俊儒</name><email>junru@computer.org</email></author><category term="tutorial" /><category term="latex" /><category term="vscode" /><category term="mit" /><summary type="html"><![CDATA[Let’s set up the MIT thesis LaTeX template locally!]]></summary></entry><entry><title type="html">Tutorial: Use LaTeX Locally with VS Code</title><link href="https://junruren.com/posts/2025/06/LaTeX-VSCode/" rel="alternate" type="text/html" title="Tutorial: Use LaTeX Locally with VS Code" /><published>2025-06-29T00:00:00+00:00</published><updated>2025-06-29T00:00:00+00:00</updated><id>https://junruren.com/posts/2025/06/Tutorial-LaTeX-VSCode</id><content type="html" xml:base="https://junruren.com/posts/2025/06/LaTeX-VSCode/"><![CDATA[<p>This post is a quick guide to setting up LaTeX on your own computer, so you can write papers, resumes, or <a href="https://github.com/junruren/MIT-Classes">dazzling “class notes”</a> using LaTeX <strong>locally</strong>—without relying on internet connectivity. In other words, you won’t be tied to <a href="https://www.overleaf.com/">Overleaf</a>: you can bring your laptop anywhere and keep working on your next big ideas.</p>

<h2 id="why-not-overleaf">Why not Overleaf?</h2>

<p>First off, let me say that <a href="https://www.overleaf.com/">Overleaf</a> is awesome! It’s a rather complete product, crafted for all the heavy LaTeX writers, with many desirable features, a huge library of <a href="https://www.overleaf.com/latex/templates">templates</a>, and a great collection of <a href="https://www.overleaf.com/learn">LaTeX guides</a>.</p>

<p><img src="/images/2025-06-29-Tutorial-LaTeX-VSCode/Cover_by_ChatGPT.png" alt="Cover Photo by ChatGPT" /></p>
<blockquote>
  <p>Generated by GPT-4o. Prompt: <em>A playful illustration showing a person sitting on an airplane, trying to work on a laptop with the Overleaf logo on the screen, but looking frustrated because there’s no Wi-Fi signal. Outside the window, fluffy clouds and a blue sky. In the background, a thought bubble shows the same person happily working on their laptop (with VS Code logo) in a cozy home office, surrounded by books and coffee, with a checkmark and a glowing PDF icon. The style is colorful, lighthearted, and cartoonish.</em></p>
</blockquote>

<p>However, relying solely on a cloud-based service like Overleaf comes with a few drawbacks:</p>

<h3 id="1-server-downtime">1. Server downtime</h3>

<p>Just like when your favorite social media or streaming site occasionally refuses to load, Overleaf is subject to outages. See <a href="https://status.overleaf.com/history">Overleaf’s status page</a> for yourself.</p>

<p>One of my MIT LGO alumni friends mentioned that a few years ago, while writing their MIT thesis, an Overleaf outage definitely made for some uneasy times.</p>

<p>At the time of writing this blog, the most recent “blockbuster” incident was Overleaf being down on May 14, just before the <a href="https://x.com/DianboLiu/status/1922544766849257851">NeurIPS manuscript deadline</a>. You can see how people reacted under <a href="https://x.com/overleaf/status/1922576431130759359">Overleaf’s X post about this outage</a>.</p>

<h3 id="2-difficult-to-write-without-internet-eg-onboard-a-flight">2. Difficult to write without internet (e.g., onboard a flight)</h3>

<p>Ideas come and go. Who doesn’t like the idea of being able to jot them down and immediately see the LaTeX render? Many people seem to be more productive in the air, but are we going to pay for that in-flight Wi-Fi (assuming Wi-Fi is even available onboard)?</p>

<p>This is where a local setup shines. While Overleaf is truly a crucial product in academia and beyond (and I still like it!), I also want to be immune to these risks. Here’s how you can de-risk by having a local setup.</p>

<hr />

<p class="notice"><strong>Disclaimer:</strong> I am a heavy macOS user, so for the rest of this guide, I’ll do my best to mention what might be different on Windows. (If you use Unix, you <em>may</em> not need this guide at all!)</p>

<h2 id="tldr">TL;DR</h2>

<p>My local setup is: VS Code + the “LaTeX Workshop” extension for VS Code + TeX Live.</p>

<p>This blog is aggressively simplified from the much more thorough <a href="https://github.com/James-Yu/LaTeX-Workshop/wiki/Install">installation guide of the LaTeX Workshop extension</a>. I envisioned this blog to be very lightweight and introductory (i.e., not loaded with info that would please “power users”), but I’ll iterate and try to find the right balance between too basic and too hardcore. Your feedback is greatly appreciated!</p>

<h2 id="1-install-tex-live">1. Install TeX Live</h2>

<p><a href="https://www.tug.org/texlive/">TeX Live</a> is a LaTeX distribution compatible with and recommended by the LaTeX Workshop extension.</p>

<ul>
  <li>macOS: <a href="https://www.tug.org/mactex/mactex-download.html">https://www.tug.org/mactex/mactex-download.html</a></li>
  <li>Windows: <a href="https://www.tug.org/texlive/windows.html#install">https://www.tug.org/texlive/windows.html#install</a></li>
</ul>

<h2 id="2-install-vs-code">2. Install VS Code</h2>

<p>Download it here: <a href="https://code.visualstudio.com/download">https://code.visualstudio.com/download</a></p>

<h2 id="3-install-the-latex-workshop-extension-for-vs-code">3. Install the LaTeX Workshop Extension for VS Code</h2>

<p>Open VS Code and look up the “LaTeX Workshop” extension in the Extension Marketplace:</p>

<ol>
  <li>Bring up the Extensions view by clicking on the Extensions icon in the Activity Bar on the side of VS Code.</li>
  <li>In the search bar, type “LaTeX Workshop” and look for the one authored by James Yu.</li>
  <li>Click “Install”.</li>
</ol>

<p><img src="/images/2025-06-29-Tutorial-LaTeX-VSCode/LaTeXWorkshop-Extension-VSCode-Screenshot.png" alt="Screenshot of LaTeX Workshop shown in VS Code's Extension Marketplace" /></p>

<h2 id="4-test-your-first-locally-built-pdf">4. Test: Your first locally built PDF</h2>

<p>Let’s try it out! In VS Code, create a new file. I named mine <code class="language-plaintext highlighter-rouge">hello.tex</code> and typed in the following content (<a href="https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes#Writing_your_first_piece_of_LaTeX">source: Overleaf guide</a>—see, I told you Overleaf is awesome):</p>

<div class="language-tex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">\documentclass</span><span class="p">{</span>article<span class="p">}</span>
<span class="nt">\begin{document}</span>
First document. This is a simple example, with no 
extra parameters or packages included.
<span class="nt">\end{document}</span>
</code></pre></div></div>

<p><img src="/images/2025-06-29-Tutorial-LaTeX-VSCode/VSCode-First-LaTeX-Before-Build.png" alt="Screenshot of a simple tex file created on VS Code before saving" /></p>

<p>Now, save your file in VS Code:</p>
<ul>
  <li>macOS: <code class="language-plaintext highlighter-rouge">command</code> + <code class="language-plaintext highlighter-rouge">S</code></li>
  <li>Windows: <code class="language-plaintext highlighter-rouge">Ctrl</code> + <code class="language-plaintext highlighter-rouge">S</code></li>
</ul>

<p>By default, the LaTeX Workshop extension has a convenient feature turned on: whenever you save or even change the <code class="language-plaintext highlighter-rouge">.tex</code> file, compilation of your LaTeX project is triggered. If nothing seems to happen, try clicking the green triangle icon at the top right corner.</p>

<p class="notice">Like anything in the developer world, the LaTeX Workshop extension comes with many customization options. For example, the auto build trigger is available in the extension’s settings. <img src="/images/2025-06-29-Tutorial-LaTeX-VSCode/LaTeXWorkshop-AutoBuild.png" alt="Screenshot of LaTeX Workshop Setting on Auto Build" /> For simplicity, I won’t go into detail here.</p>

<p>You should now see something similar to the following screenshot:</p>

<p><img src="/images/2025-06-29-Tutorial-LaTeX-VSCode/VSCode-First-LaTeX-After-Build.png" alt="Screenshot of a simple tex file created on VS Code after saving" /></p>

<p>In the same folder as your <code class="language-plaintext highlighter-rouge">hello.tex</code>, you’ll see:</p>

<ul>
  <li>a <code class="language-plaintext highlighter-rouge">hello.pdf</code> (what we really care about), and</li>
  <li>a bunch of auxiliary files; ignore them for now (again, I won’t go into detail here).</li>
</ul>

<p>You can click on <code class="language-plaintext highlighter-rouge">hello.pdf</code> and expect to see exactly what our simple <code class="language-plaintext highlighter-rouge">.tex</code> file should render into.</p>

<p>A very convenient feature is to show the PDF file side-by-side with your <code class="language-plaintext highlighter-rouge">.tex</code> file. Find the green triangle Build button at the top right corner; right next to it is another icon with a little magnifier over two rectangles—click it.</p>

<p><img src="/images/2025-06-29-Tutorial-LaTeX-VSCode/LaTeX-Compiled-PDF-Example.png" alt="Screenshot of the simple tex file with its compiled PDF opened side-by-side" /></p>

<p>Congratulations! You now have a “barebones” setup for working on LaTeX documents locally. Convince yourself by disconnecting your computer from the internet, making some random edits to your <code class="language-plaintext highlighter-rouge">.tex</code> file, and saving your changes again.</p>

<h3 id="troubleshooting">Troubleshooting</h3>

<p>I initially came across <code class="language-plaintext highlighter-rouge">LaTeX fatal error on PID undefined. Error: spawn latexmk ENOENT</code>. My solution was simply to restart VS Code.</p>
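<p>An <code class="language-plaintext highlighter-rouge">ENOENT</code> on <code class="language-plaintext highlighter-rouge">spawn</code> generally means the executable could not be found. A likely explanation (an assumption, not something verified here) is that VS Code was launched before the TeX Live installer updated <code class="language-plaintext highlighter-rouge">PATH</code>. The small Python sketch below checks whether <code class="language-plaintext highlighter-rouge">latexmk</code> is visible on <code class="language-plaintext highlighter-rouge">PATH</code>; the helper name <code class="language-plaintext highlighter-rouge">find_latexmk</code> is made up for illustration.</p>

```python
import shutil


def find_latexmk():
    """Return the full path to latexmk if it is on PATH, else None."""
    return shutil.which("latexmk")


path = find_latexmk()
if path is None:
    print("latexmk not found on PATH; check the TeX Live install, then restart VS Code")
else:
    print(f"latexmk found at {path}")
```

<p>Run it from a fresh terminal: if the terminal finds <code class="language-plaintext highlighter-rouge">latexmk</code> but VS Code still reports the error, restarting VS Code is the first thing to try.</p>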

<p>Please let me know if you encounter any other issues!</p>

<hr />

<h2 id="coming-soon">Coming soon…</h2>

<p>I plan to write a mini-series of blogs on LaTeX, specifically in the context of writing a thesis at MIT. In the future, I plan to cover:</p>

<ul>
  <li>How to use <code class="language-plaintext highlighter-rouge">git</code> to track your progress and back up with GitHub</li>
  <li><a href="/posts/2025/08/MIT-Thesis-LaTeX/">How to use the MIT thesis template with this local setup</a></li>
</ul>

<p>Meanwhile, please let me know if you have any suggestions for this tutorial or future topics.</p>]]></content><author><name>Junru Ren 任俊儒</name><email>junru@computer.org</email></author><category term="tutorial" /><category term="latex" /><category term="vscode" /><summary type="html"><![CDATA[This post is a quick guide to setting up LaTeX on your own computer, so you can write papers, resumes, or dazzling “class notes” using LaTeX locally—without relying on internet connectivity. In other words, you won’t be tied to Overleaf: you can bring your laptop anywhere and keep working on your next big ideas.]]></summary></entry><entry><title type="html">6.8300 Final Project: Taming CLIP’s Captioning Bias: A COCO-Driven Analysis and Permutation Ensemble</title><link href="https://junruren.com/posts/2025/05/6.8300-final/" rel="alternate" type="text/html" title="6.8300 Final Project: Taming CLIP’s Captioning Bias: A COCO-Driven Analysis and Permutation Ensemble" /><published>2025-05-13T00:00:00+00:00</published><updated>2025-05-13T00:00:00+00:00</updated><id>https://junruren.com/posts/2025/05/6.8300-Final-Project-Blog</id><content type="html" xml:base="https://junruren.com/posts/2025/05/6.8300-final/"><![CDATA[<p><strong>Abstract</strong>: <em>Vision-language models like CLIP struggle with multi-object scenes, often favoring prominent objects or those mentioned first in captions. Using real-world COCO images, we show that CLIP’s caption-matching accuracy drops from 91.23% to 87.45% when object order is reversed. To address this, we explore a post-hoc mitigation: a permutation ensemble that averages scores across all object orders, boosting robustness and recovering accuracy to 90.04%. Our findings reveal persistent order biases and offer a simple, effective strategy to improve CLIP’s reliability in complex scenes.</em></p>

<p><img src="/images/2025-05-13-6.8300-Final-Project-Blog/Cover_by_ChatGPT.png" alt="Cover Photo by ChatGPT" /></p>

<blockquote>
  <p>Generated by GPT-4o. Prompt: <em>Please create a widescreen cover image illustration for my blog post. Please illustrate a “humanized” CLIP model being unsure about which caption best describes the attached image. The captions are: “a pizza and a dog and a dining table”, “a dining table and a dog and a pizza”, “a pizza and a dog and a cell phone”. Please include the provided image along with three captions in the generated image!</em></p>
</blockquote>

<hr />

<h2 id="introduction">Introduction</h2>

<p>Accurate image captioning, a key challenge at the intersection of computer vision and natural language processing, is fundamental for enabling machines to “see” and describe visual content. This capability underpins diverse applications, from accessibility tools that interpret images for visually impaired users to sophisticated content-based image retrieval and the scene understanding required by autonomous systems. Real-world photographs, the primary input for many computer vision tasks, typically depict complex scenes with multiple objects of varying sizes, categories, and semantic importance. These multi-object scenarios present a significant hurdle for <strong>vision-language models</strong>, which must accurately identify, localize (implicitly or explicitly), and then articulate all relevant entities in a coherent textual description. Datasets like Microsoft COCO (<a href="https://arxiv.org/abs/1405.0312">Lin et al., 2014</a>) are crucial benchmarks in computer vision as they exemplify this complexity by providing richly annotated, multi-object scenes. Understanding how advanced models such as CLIP (<a href="https://arxiv.org/abs/2103.00020">Radford et al., 2021</a>), which learns joint embeddings from visual and textual data, handle these intricate visual interactions is therefore critical for improving the robustness, fairness, and overall performance of computer vision systems that aim to bridge the gap between pixels and semantics.</p>

<h2 id="literature-review">Literature Review</h2>

<h3 id="biases-in-multi-object-visionlanguage-models">Biases in Multi-Object Vision–Language Models</h3>

<p>Recent research has highlighted that vision–language models, especially CLIP (<a href="https://arxiv.org/abs/2103.00020">Radford et al., 2021</a>), exhibit notable biases when processing images containing multiple objects. These biases manifest primarily through the models’ tendencies to prioritize larger, visually dominant objects and objects mentioned earlier in captions (<a href="https://arxiv.org/abs/2502.19828">Abbasi et al., 2025</a>). Specifically, Abbasi et al. constructed controlled benchmarks (SimCO, CompCO) to analyze how varying object size and caption order affect CLIP’s image–text matching accuracy. Their results demonstrated that CLIP disproportionately focuses on larger objects visually, and textually prioritizes the first-mentioned object, significantly impacting multi-object captioning tasks.</p>

<p>Kamath et al. (2023) similarly explored CLIP’s limitations, revealing that its text encoder bottlenecks compositional information, causing failures in accurately representing object relationships, attribute–object bindings, and counts (<a href="https://arxiv.org/abs/2210.01936">Kamath et al., 2023</a>). For instance, CLIP struggled to differentiate semantically nuanced phrases like “a red square and a blue circle” from “a blue square and a red circle,” indicating weak compositional grounding.</p>

<p>Further studies on quantity biases reinforce these findings. Zhang et al. (2024) found that CLIP embeddings inadequately capture numerical object counts, leading to downstream errors such as incorrect object numbers in image generation (<a href="https://arxiv.org/abs/2310.01845">Zhang et al., 2024</a>). These biases collectively suggest that CLIP’s current embedding strategies offer limited fidelity for multi-object representation and grounding, emphasizing dominant objects while neglecting detailed compositional relationships.</p>

<h3 id="evaluation-methods-and-datasets">Evaluation Methods and Datasets</h3>

<p>To systematically study these biases, researchers have developed specialized datasets and analytical methods. <a href="https://huggingface.co/datasets/clip-oscope/simco-comco">Abbasi et al.’s (2025) SimCO and CompCO datasets</a> provide controlled synthetic and real-world image–caption pairs to precisely manipulate object prominence and mention order, making explicit the biases inherent in CLIP’s visual and textual encoders.</p>

<p>Other benchmarks like <strong>Winoground</strong> (Thrush et al., 2022) and <strong>CREPE</strong> (Yuksekgonul et al., 2023) specifically target relational and compositional understanding, exposing how small lexical or structural changes (e.g., swapping object positions in captions) significantly disrupt CLIP’s accuracy. Complementing these datasets, Chen et al. (2023) introduced <strong>gScoreCAM</strong>, a visualization technique that generates attention heatmaps highlighting CLIP’s focal regions within images (<a href="https://arxiv.org/abs/2302.05592">Chen et al., 2023</a>). Through such visualizations, researchers confirmed that CLIP frequently fixates on the most visually prominent object, further validating previous analytical findings and providing intuitive diagnostics for compositional failures.</p>

<h3 id="mitigation-techniques">Mitigation Techniques</h3>

<p>To address these compositional biases, researchers have proposed both architectural enhancements and novel training strategies. One notable approach is the integration of explicit object detection modules. For instance, <strong>MDETR</strong> (<a href="https://arxiv.org/abs/2104.12763">Kamath et al., 2021</a>) and <strong>GLIP</strong> (<a href="https://arxiv.org/abs/2112.03857">Li et al., 2022</a>) incorporate transformers trained to explicitly localize objects based on textual queries, thus promoting stronger grounding and compositional reasoning capabilities. These models substantially outperform traditional CLIP in tasks requiring precise multi-object correspondence.</p>

<p>Alternatively, Assouel et al. (2024) introduced <strong>OC-CLIP</strong>, an object-centric extension of CLIP designed specifically to enhance multi-object grounding. OC-CLIP utilizes slot-based object representations and graph-based scene parsing to achieve explicit bindings between image regions and caption components, significantly improving performance on compositional retrieval tasks (<a href="https://arxiv.org/abs/2311.11093">Assouel et al., 2024</a>).</p>

<p>Training-centric solutions have also been explored, including contrastive fine-tuning with carefully constructed hard negatives, and augmenting datasets with synthetic compositional examples. While effective in improving benchmark scores, these data-driven approaches address the symptoms of CLIP’s biases rather than the fundamental architectural limitations, underscoring the importance of combining architectural and data-centric solutions for robust multi-object vision–language modeling.</p>

<h3 id="our-take">Our take</h3>

<p>In the scope of a class term project, we’d like to probe the issues and ideate a potential post-hoc fix.</p>

<hr />

<h2 id="dataset---microsoft-coco">Dataset - Microsoft COCO</h2>

<p>For our analysis, we rely on the <a href="https://cocodataset.org/">Microsoft Common Objects in Context (COCO) dataset</a>—a widely used benchmark in computer vision research. COCO offers richly annotated images that capture complex everyday scenes, typically containing multiple objects of diverse categories, scales, and spatial relationships. This makes it well-suited for studying how vision-language models handle multi-object representations.</p>

<p>Each image in COCO includes multiple human-written captions and object-level annotations, including precise bounding boxes, object categories, and segmentation masks.</p>

<p>The dataset provides structured annotation fields such as:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">image_id</code>: Unique identifier for each image</li>
  <li><code class="language-plaintext highlighter-rouge">bbox</code>: A bounding box in <code class="language-plaintext highlighter-rouge">[x, y, width, height]</code> format</li>
  <li><code class="language-plaintext highlighter-rouge">category_id</code>: Object class index (mapped to names via the category list)</li>
</ul>

<h3 id="defining-object-prominence">Defining Object Prominence</h3>

<p>In our study, we define the <em>prominence</em> of an object in an image as the ratio between the area of the object’s annotated bounding box and the total area of the image. This simple yet effective geometric measure quantifies an object’s spatial dominance within the visual scene. Prominence serves as a proxy for visual salience, under the assumption that larger objects are more likely to be visually or semantically prioritized by both human observers and vision-language models.</p>

<p>Formally, for an object with annotation <code class="language-plaintext highlighter-rouge">[x, y, width, height]</code> in the COCO dataset, and an image of dimensions <code class="language-plaintext highlighter-rouge">image_width</code> and <code class="language-plaintext highlighter-rouge">image_height</code>, prominence is calculated as:</p>

\[\begin{aligned}
\text{Prominence} = \frac{\text{width} \times \text{height}}{\text{image\_width} \times \text{image\_height}}
\end{aligned}\]

<p>Granted, an annotated bounding box area is not equivalent to the actual object’s size because not every object in an image appears in a perfect rectangle. While the COCO dataset does provide more granular segmentation annotations, we use the bounding box area as a simple proxy for the actual prominence of an object in a given image.</p>

<p>By quantifying prominence in this way, we can analyze whether models like CLIP are biased toward larger objects in multi-object scenes, especially when generating or scoring captions that describe such images.</p>
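<p>As a minimal sketch of the formula above (the function name and the example numbers are illustrative, not taken from the project code), prominence can be computed directly from a COCO-style <code class="language-plaintext highlighter-rouge">bbox</code> and the image dimensions:</p>

```python
def prominence(bbox, image_width, image_height):
    """Fraction of the image area covered by a COCO-style bounding box.

    bbox is [x, y, width, height] in pixels, as in COCO annotations.
    """
    _, _, box_width, box_height = bbox
    return (box_width * box_height) / (image_width * image_height)


# Hypothetical 640x480 image with a 320x240 box: one quarter of the frame.
print(prominence([100, 50, 320, 240], 640, 480))  # 0.25
```

<p>The box position (<code class="language-plaintext highlighter-rouge">x</code>, <code class="language-plaintext highlighter-rouge">y</code>) does not enter the measure; only the box area relative to the image area matters.</p>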

<h3 id="coco-categories">COCO Categories</h3>

<p><img src="/images/2025-05-13-6.8300-Final-Project-Blog/COCO_Categories.png" alt="COCO Categories Visualized" /></p>

<p>There are 80 annotated categories in the COCO <code class="language-plaintext highlighter-rouge">train2017</code> split. The diagram illustrates the hierarchical grouping of these object classes (shown in green) under broader super-categories (shown in blue), such as <code class="language-plaintext highlighter-rouge">animal</code>, <code class="language-plaintext highlighter-rouge">vehicle</code>, <code class="language-plaintext highlighter-rouge">kitchen</code>, and <code class="language-plaintext highlighter-rouge">electronic</code>. Each edge connects an object category to its corresponding super-category as defined by the COCO dataset’s metadata. This visualization highlights the diversity and complexity of the dataset.</p>

<h3 id="data-selection">Data Selection</h3>

<p>In this project, we selected <strong>502 images</strong> from the <strong>2017 training set</strong> (<code class="language-plaintext highlighter-rouge">train2017</code>).</p>

<p>Although the COCO <code class="language-plaintext highlighter-rouge">train2017</code> split contains 118K images, we had to select a subset based on the following criteria:</p>

<ol>
  <li><strong>Sufficient annotated objects per image:</strong> Our study focuses on multi-object scenarios, so images require a minimum number of objects. We required each image to have <strong>at least 3 annotated objects</strong>.</li>
  <li><strong>Sufficient prominence per object:</strong> After a brief exploration of the COCO dataset, we noticed some annotated objects can be very small. This rule aims to exclude instances where, for example, a tiny corner of a microwave is annotated as “microwave.” We required each annotated object to have <strong>at least 10% prominence</strong>.</li>
  <li><strong>Unique object categories per image:</strong> For example, constructing captions for images with multiple instances of the same object category (e.g., two “dogs”) would require disambiguation, which was outside the scope of this study. Therefore, we required all annotated objects in an image to belong to unique categories.</li>
</ol>

<table>
  <thead>
    <tr>
      <th>Filter Criterion</th>
      <th>Cumulative Images Remaining</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Initial <code class="language-plaintext highlighter-rouge">train2017</code> set</td>
      <td>118K</td>
    </tr>
    <tr>
      <td>&gt;= 3 objects per image</td>
      <td>81,982</td>
    </tr>
    <tr>
      <td>Each object &gt;= 10% prominence</td>
      <td>1,012</td>
    </tr>
    <tr>
      <td>All objects in unique categories</td>
      <td>502</td>
    </tr>
  </tbody>
</table>
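<p>The three filters can be sketched in plain Python over COCO-style dictionaries. This is an illustration under stated assumptions rather than the project’s actual code: the function name <code class="language-plaintext highlighter-rouge">select_images</code> is made up, and the prominence rule is read as “every annotated object in the image must reach 10%”.</p>

```python
from collections import defaultdict

MIN_OBJECTS = 3
MIN_PROMINENCE = 0.10


def select_images(images, annotations):
    """Return sorted ids of images that pass all three filters.

    images: dicts with "id", "width", "height" (COCO "images" entries).
    annotations: dicts with "image_id", "bbox", "category_id".
    """
    sizes = {img["id"]: (img["width"], img["height"]) for img in images}
    per_image = defaultdict(list)
    for ann in annotations:
        per_image[ann["image_id"]].append(ann)

    selected = []
    for image_id, anns in per_image.items():
        width, height = sizes[image_id]
        # Filter 1: at least three annotated objects.
        enough_objects = len(anns) >= MIN_OBJECTS
        # Filter 2 (assumed reading): every object covers at least 10% of the image.
        all_prominent = all(
            (a["bbox"][2] * a["bbox"][3]) / (width * height) >= MIN_PROMINENCE
            for a in anns
        )
        # Filter 3: no two objects share a category.
        categories = [a["category_id"] for a in anns]
        all_unique = len(set(categories)) == len(categories)
        if enough_objects and all_prominent and all_unique:
            selected.append(image_id)
    return sorted(selected)
```

<p>With the real dataset, <code class="language-plaintext highlighter-rouge">images</code> and <code class="language-plaintext highlighter-rouge">annotations</code> would come from the corresponding lists in the COCO instances annotation JSON.</p>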

<h2 id="captioning-method">Captioning Method</h2>

<p>For each of the 502 selected images, we employed the following captioning process:</p>

<ol>
  <li>Identify the <strong>top 3</strong> most prominent objects. E.g., <code class="language-plaintext highlighter-rouge">['airplane', 'truck', 'person']</code>. The order of these objects was varied in subsequent experiments.</li>
  <li>Generate a text caption by prefixing each object name with the appropriate English indefinite article (“a” or “an”) and conjoining them with “and”. E.g., <code class="language-plaintext highlighter-rouge">"an airplane and a truck and a person"</code>.</li>
</ol>
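<p>A minimal sketch of step 2, assuming a simple first-letter rule for choosing between “a” and “an” (the helper names are illustrative; the rule is enough for COCO category names such as “airplane” or “oven”):</p>

```python
def indefinite_article(noun):
    """Pick "a" or "an" from the first letter (a simple vowel heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def make_caption(objects):
    """Join object names as "a X and a Y and a Z", keeping the given order."""
    return " and ".join(f"{indefinite_article(name)} {name}" for name in objects)


print(make_caption(["airplane", "truck", "person"]))
# an airplane and a truck and a person
```

<p>Because the function preserves the input order, the same helper produces both largest-first and smallest-first captions by reversing the list.</p>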

<h2 id="experiments">Experiments</h2>

<p>Inspired by the controlled multi-object bias study of Abbasi et al. (2025), we evaluated whether the same prominence and order biases emerge when testing CLIP on <em>real-world</em>, high-resolution images from the COCO <code class="language-plaintext highlighter-rouge">train2017</code> split rather than on synthetic benchmarks like SimCO and CompCO. Whereas Abbasi et al. precisely manipulated object size and mention order in crafted scenes, we selected 502 COCO images containing at least three objects (each occupying ≥10% of the frame) and generated paired captions that either vary the order of the top-3 objects or replace one of them. This real-data pipeline, which defines “prominence” as the ratio of bounding-box area to image area and introduces a “random third object” for incorrect caption variants, allows us to probe CLIP’s biases in complex, natural settings using a simpler, accuracy-based metric.</p>

<p>Our results show a drop in matching accuracy from 91.23% (largest-first correct vs. random-third incorrect) to 87.45% when the correct caption is presented in smallest-first order, closely mirroring the performance degradation that Abbasi et al. (2025) observed when swapping object mention order in synthetic scenes. Although the absolute magnitude of the drop is slightly attenuated (likely due to COCO’s richer visual context), this concordance suggests that CLIP’s order bias persists beyond controlled datasets and into natural, multi-object environments.</p>

<h3 id="1-correct-vs-incorrect-captions-with-largest-object-first">1. Correct vs. Incorrect Captions with Largest Object First</h3>

<p>We studied whether a less prominent object being misrepresented in a caption could mislead CLIP when scoring captions for the same image.</p>

<p>For each image, we constructed two captions:</p>

<ul>
  <li>A <strong>correct caption</strong> that mentions the top 3 most prominent objects in descending order of prominence (largest first).</li>
  <li>An <strong>incorrect caption</strong> that mentions the two most prominent objects correctly (in descending order of prominence) but replaces the third object with a randomly chosen category (ensuring it is unique from the other two) from the COCO dataset.</li>
</ul>
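<p>The caption pair can be sketched as follows. Two assumptions are worth flagging: the replacement category is drawn from categories not among the image’s top 3 (so the altered caption is guaranteed to be wrong), and the helper names and the tiny category list are illustrative stand-ins.</p>

```python
import random


def caption(objects):
    """Render names as "a X and a Y and a Z" with a simple a/an heuristic."""
    return " and ".join(
        ("an " if name[0].lower() in "aeiou" else "a ") + name for name in objects
    )


def caption_pair(top3, all_categories, rng):
    """Return (correct, incorrect) captions for objects sorted largest first."""
    # Assumption: exclude every object of the correct caption from the
    # candidates, so the replacement always introduces an error.
    candidates = [c for c in all_categories if c not in top3]
    distractor = rng.choice(candidates)
    return caption(top3), caption(top3[:2] + [distractor])


categories = ["airplane", "truck", "person", "chair", "dog"]  # tiny stand-in list
correct, incorrect = caption_pair(
    ["airplane", "truck", "person"], categories, random.Random(0)
)
print(correct)
print(incorrect)
```

<p>Passing an explicit <code class="language-plaintext highlighter-rouge">random.Random</code> instance keeps the incorrect captions reproducible, which matters because the second experiment reuses them.</p>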

<p><img src="/images/2025-05-13-6.8300-Final-Project-Blog/000000150410.jpg" alt="COCO image 150410" /></p>

<p>For example, image <code class="language-plaintext highlighter-rouge">150410</code> (shown above) has the following three most prominent objects:</p>

<table>
  <thead>
    <tr>
      <th>Object</th>
      <th>Prominence</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">airplane</code></td>
      <td><code class="language-plaintext highlighter-rouge">0.252096262075527</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">truck</code></td>
      <td><code class="language-plaintext highlighter-rouge">0.2164387212748829</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">person</code></td>
      <td><code class="language-plaintext highlighter-rouge">0.10906725373243556</code></td>
    </tr>
  </tbody>
</table>

<p>We produced the following two captions:</p>

<ul>
  <li><strong>Correct</strong>: <em>“an airplane and a truck and a person”</em></li>
  <li><strong>Incorrect</strong>: <em>“an airplane and a truck and <strong>a chair</strong>”</em></li>
</ul>

<p>With all 502 images captioned with a pair of correct and incorrect captions, we asked CLIP to score each image against its two captions.</p>
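<p>One common way to turn CLIP’s image-text similarities into the per-caption probabilities reported below is a softmax over the two scaled similarities. The sketch assumes that approach, cosine similarities as inputs, and a logit scale of 100; the function names and the example similarity values are hypothetical.</p>

```python
import math


def caption_probabilities(similarities, logit_scale=100.0):
    """Softmax over scaled image-text similarities, one per candidate caption."""
    logits = [logit_scale * s for s in similarities]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def prefers_correct(similarities):
    """True if the first caption (the correct one) gets the higher probability."""
    probs = caption_probabilities(similarities)
    return probs[0] > probs[1]


# Made-up similarities for (correct, incorrect) captions of one image.
print(caption_probabilities([0.30, 0.29]))
print(prefers_correct([0.30, 0.29]))
```

<p>Accuracy is then the fraction of images for which <code class="language-plaintext highlighter-rouge">prefers_correct</code> is true.</p>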

<h4 id="result">Result</h4>

<p>CLIP preferred the correct caption for 458 out of 502 images—a 91.23% accuracy.</p>

<p>Here is an example failure:</p>

<p><img src="/images/2025-05-13-6.8300-Final-Project-Blog/000000069668.jpg" alt="COCO image 69668" /></p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Caption</th>
      <th>CLIP Assigned Probability</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Correct</td>
      <td><em>“an oven and a person and a microwave”</em></td>
      <td>27.20%</td>
    </tr>
    <tr>
      <td>Incorrect</td>
      <td><em>“an oven and a person and <strong>a snowboard</strong>”</em></td>
      <td>72.80%</td>
    </tr>
  </tbody>
</table>

<p>This experiment suggests that while CLIP demonstrates a strong ability to identify correct multi-object captions (achieving 91.23% accuracy), its performance can be notably affected by inaccuracies related to less prominent objects. Instances where CLIP preferred an incorrect caption (where only the third most prominent object was altered) suggest that the model may not consistently ground or verify all listed objects with equal rigor. This implies that CLIP’s scoring mechanism may assign greater weight to dominant objects or that its compositional understanding is less robust for objects lower in the visual hierarchy.</p>

<h3 id="2-reordered-correct-vs-incorrect-captions">2. Reordered Correct vs. Incorrect Captions</h3>

<p>Building on the previous experiment, we studied whether CLIP’s caption scoring accuracy would further decrease if the correct caption listed objects from smallest to largest prominence.</p>

<p>For each image, we still constructed two captions:</p>

<ul>
  <li>A <strong>correct caption</strong> that mentions the top 3 most prominent objects in ascending order of prominence (smallest first).</li>
  <li>An <strong>incorrect caption</strong>: The incorrect caption (with the randomly swapped third object) from the previous experiment was reused.</li>
</ul>

<p>Still using the aforementioned image <code class="language-plaintext highlighter-rouge">150410</code> for example, the two captions were:</p>

<ul>
  <li><strong>Correct</strong>: <em>“a person and a truck and an airplane”</em> (objects are mentioned in reverse order of prominence compared to the previous experiment’s correct caption)</li>
  <li><strong>Incorrect</strong>: <em>“an airplane and a truck and <strong>a chair</strong>”</em></li>
</ul>

<h4 id="result-1">Result</h4>

<p>CLIP preferred the correct caption for 439 out of 502 images—an accuracy of 87.45%, lower than the previous experiment.</p>

<p>Although the number of images for which CLIP preferred the correct caption dropped by 19 (from 458 in the first experiment to 439 in this one), the change in performance is more intricate than a simple drop:</p>

<ul>
  <li>5 images whose <strong>incorrect captions</strong> were preferred by CLIP in the previous experiment: CLIP now preferred the correct caption.</li>
  <li>24 images whose correct captions were preferred by CLIP in the previous experiment: CLIP now preferred the <strong>incorrect caption</strong>.</li>
</ul>
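<p>The bookkeeping behind these numbers is a quick sanity check, using only the counts reported above:</p>

```python
total_images = 502
correct_before = 458        # experiment 1 (largest-first correct caption)
flipped_to_correct = 5      # wrong before, right now
flipped_to_incorrect = 24   # right before, wrong now

correct_after = correct_before + flipped_to_correct - flipped_to_incorrect
print(correct_after)                           # 439
print(f"{correct_after / total_images:.2%}")   # 87.45%
```

<p>In other words, the net drop of 19 images hides 29 images whose outcome changed when the correct caption was reordered.</p>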

<p>Image <code class="language-plaintext highlighter-rouge">273083</code> exemplifies one of these 24 cases where performance worsened:</p>

<p><img src="/images/2025-05-13-6.8300-Final-Project-Blog/000000273083.jpg" alt="COCO image 273083" /></p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Caption</th>
      <th>CLIP Assigned Probability</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td> </td>
      <td>Previous experiment (Largest First Correct)</td>
      <td> </td>
    </tr>
    <tr>
      <td>Correct</td>
      <td><em>“a pizza and a dog and a dining table”</em></td>
      <td><strong>53.12%</strong></td>
    </tr>
    <tr>
      <td>Incorrect</td>
      <td><em>“a pizza and a dog and <strong>a cell phone</strong>”</em></td>
      <td>46.88%</td>
    </tr>
    <tr>
      <td> </td>
      <td>This experiment (Smallest First Correct)</td>
      <td> </td>
    </tr>
    <tr>
      <td>Correct</td>
      <td><em>“a dining table and a dog and a pizza”</em></td>
      <td>23.66%</td>
    </tr>
    <tr>
      <td>Incorrect</td>
      <td><em>“a pizza and a dog and <strong>a cell phone</strong>”</em></td>
      <td><strong>76.37%</strong></td>
    </tr>
  </tbody>
</table>

<p>This experiment reveals that the order in which objects are mentioned in a caption significantly impacts CLIP’s scoring, especially when less prominent objects are listed first. The accuracy decrease from 91.23% to 87.45% suggests CLIP is less confident or accurate when a correct caption’s textual sequence inverts the visual prominence hierarchy (i.e., smallest object mentioned first).</p>

<p>The misclassification of 24 images (previously scored correctly) when the correct caption was reordered indicates that CLIP might rely on an alignment between early-mentioned objects in the text and the most visually dominant objects in the image. When this alignment is disrupted (as in “smallest-first” correct captions), even if all objects are factually present, CLIP is more likely to prefer an incorrect caption that begins by mentioning the most prominent objects, even if it contains a subsequent error. This highlights a potential vulnerability: CLIP’s image-text alignment may be disproportionately influenced by a caption’s initial elements, potentially overshadowing a complete assessment of all described objects, especially when textual order mismatches visual salience.</p>

<h2 id="mitigation-caption-permutation-ensemble">Mitigation: Caption Permutation Ensemble</h2>

<p>To reduce CLIP’s sensitivity to the order in which objects are mentioned, we propose ensembling the text embeddings of all permutations of the 3 objects featured in a caption rather than relying on a single caption ordering. For an image \(I\) with top-3 objects \(\{o_1, o_2, o_3\}\), the process is:</p>

<ol>
  <li>Form the set of all \(M = 3! = 6\) permutations \(P = \{\pi_1,\ldots,\pi_M\}\), where each \(\pi_m\) is an ordered tuple of the three objects.</li>
  <li>For each permutation \(\pi_m\), generate a caption \(c_m\) using the same template as in the previous experiments.</li>
  <li>
    <p>Compute its text embedding and normalize:
\(\begin{aligned}
t_m \;=\;\mathrm{CLIP}_{\mathrm{text}}(c_m),\quad
\tilde t_m \;=\;\frac{t_m}{\|t_m\|_2}.
\end{aligned}\)</p>
  </li>
  <li>Average these \(M\) normalized embeddings to produce an order-invariant text representation:
\(\begin{aligned}
\bar t \;=\;\frac{1}{M}\sum_{m=1}^{M}\tilde t_m,
\qquad
\hat t \;=\;\frac{\bar t}{\|\bar t\|_2}.
\end{aligned}\)</li>
  <li>Encode and normalize the image once:
\(\begin{aligned}
v \;=\;\mathrm{CLIP}_{\mathrm{img}}(I),\quad
\hat v \;=\;\frac{v}{\|v\|_2}.
\end{aligned}\)</li>
  <li>Compute the final similarity score as the cosine similarity between \(\hat v\) and \(\hat t\):
\(\begin{aligned}
s_{\mathrm{ens}} \;=\;\langle \hat v,\;\hat t\rangle.
\end{aligned}\)</li>
</ol>
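<p>The steps above can be sketched as follows. This is a minimal, dependency-free sketch rather than the code used for the experiments. <code class="language-plaintext highlighter-rouge">toy_encode_text</code> is a hypothetical, order-sensitive stand-in for CLIP’s text encoder, <code class="language-plaintext highlighter-rouge">make_caption</code> assumes the “a X and a Y and a Z” template seen in the example captions, and all function names are illustrative. To use a real model, pass as <code class="language-plaintext highlighter-rouge">encode_text</code> any function that maps a caption string to its CLIP text embedding.</p>

```python
import itertools
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def make_caption(objects):
    """Join objects as 'a X and a Y and a Z' (assumed caption template)."""
    def with_article(name):
        return ("an " if name[0].lower() in "aeiou" else "a ") + name
    return " and ".join(with_article(o) for o in objects)


def ensemble_text_embedding(objects, encode_text):
    """Steps 1-4: average the normalized embeddings of all permutations."""
    captions = [make_caption(p) for p in itertools.permutations(objects)]
    normalized = [l2_normalize(encode_text(c)) for c in captions]
    dim = len(normalized[0])
    count = len(normalized)
    mean = [sum(e[i] for e in normalized) / count for i in range(dim)]
    return l2_normalize(mean)


def ensemble_score(image_embedding, objects, encode_text):
    """Steps 5-6: cosine similarity of the image and the ensemble embedding."""
    v_hat = l2_normalize(image_embedding)
    t_hat = ensemble_text_embedding(objects, encode_text)
    return sum(a * b for a, b in zip(v_hat, t_hat))


def toy_encode_text(caption, dim=8):
    """Hypothetical stand-in for CLIP's text encoder (order-sensitive)."""
    vec = [0.0] * dim
    for position, word in enumerate(caption.split()):
        bucket = sum(ord(ch) for ch in word) % dim
        vec[bucket] += 1.0 / (position + 1)  # earlier words weigh more
    return vec


# Usage: the ensemble embedding should not depend on the input order.
objects = ("pizza", "dog", "dining table")
forward = ensemble_text_embedding(objects, toy_encode_text)
backward = ensemble_text_embedding(objects[::-1], toy_encode_text)
max_gap = max(abs(a - b) for a, b in zip(forward, backward))
print("largest difference between orderings:", max_gap)
```

<p>Because both calls enumerate the same six captions, the two ensemble embeddings agree up to floating-point rounding, even though the toy encoder itself gives different embeddings for the two single orderings. The same function is called once for the correct object set and once for the incorrect one.</p>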

<p>We want to underscore that the permutation ensemble is applied to both the correct set of objects and the incorrect set of objects. This makes it a general method: in the real world, we do not know in advance whether every object in a provided list actually appears in a given image.</p>

<p>By comparing the ensemble similarity score for the correct set of objects with a similarly computed ensemble score for an incorrect set of objects (where the third object is swapped in every permutation, as in previous experiments), we expect this approach to smooth out ordering noise and yield a more robust decision boundary.</p>
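<p>One way to read the ensemble score, offered as a short derivation from the definitions above rather than an additional experimental result: because \(\hat v\) is fixed and the inner product is linear, the score is the mean per-permutation cosine similarity, rescaled by the norm of the averaged embedding:
\(\begin{aligned}
s_{\mathrm{ens}} \;=\;\langle \hat v,\;\hat t\rangle \;=\;\frac{\langle \hat v,\;\bar t\rangle}{\|\bar t\|_2} \;=\;\frac{1}{M\,\|\bar t\|_2}\sum_{m=1}^{M}\langle \hat v,\;\tilde t_m\rangle.
\end{aligned}\)
Since each \(\tilde t_m\) has unit norm, the triangle inequality gives \(\|\bar t\|_2 \le 1\), with equality only when all \(M\) embeddings coincide. No single ordering can therefore dominate the score. Note that the rescaling factor depends on the object set, so comparing two ensemble scores is close to, but not exactly the same as, comparing two averaged similarity scores.</p>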

<h3 id="result-2">Result</h3>

<table>
  <thead>
    <tr>
      <th>Experiment</th>
      <th>Accuracy</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Experiment 1 (without mitigation)</td>
      <td>458 out of 502 (91.23%)</td>
    </tr>
    <tr>
      <td>With mitigation</td>
      <td><strong>452 out of 502 (90.04%)</strong></td>
    </tr>
    <tr>
      <td>Experiment 2 (without mitigation)</td>
      <td>439 out of 502 (87.45%)</td>
    </tr>
  </tbody>
</table>

<p>The permutation ensemble mitigation yields 90.04% accuracy (452/502), which sits between our original largest-first baseline (91.23%) and the reversed-order worst case (87.45%). Although ensembling all six permutations of each object set does not quite match the peak performance of the best single ordering, it substantially mitigates the drop seen with adverse orderings, recovering 2.59 percentage points (from 87.45% to 90.04%) compared to the smallest-first scenario. This demonstrates that averaging over permutations effectively smooths CLIP’s sensitivity to object mention order, trading a small amount of peak accuracy for markedly improved robustness against ordering noise.</p>
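<p>As a quick arithmetic check on the recovery figure, the snippet below recomputes it from the image counts in the table. The variable names are illustrative, and the counts are net totals: the 13 images regained by the ensemble are not necessarily among the ones that flipped in the smallest-first experiment.</p>

```python
TOTAL_IMAGES = 502
correct = {"largest_first": 458, "smallest_first": 439, "ensemble": 452}

# Net images regained by the ensemble relative to the smallest-first ordering.
recovered_images = correct["ensemble"] - correct["smallest_first"]
# Net images lost when moving from largest-first to smallest-first.
dropped_images = correct["largest_first"] - correct["smallest_first"]

recovered_points = 100 * recovered_images / TOTAL_IMAGES
share_of_drop = recovered_images / dropped_images

print(f"{recovered_images} of {dropped_images} images regained")
print(f"{recovered_points:.2f} percentage points")
print(f"{share_of_drop:.0%} of the drop")
```

<p>In net terms, the ensemble closes 13 of the 19-image gap between the two single orderings, which is where the 2.59 percentage points come from.</p>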

<hr />

<h2 id="conclusion">Conclusion</h2>

<p>Our investigation into CLIP’s handling of multi-object scenes using real-world COCO images confirms that the model, while generally proficient, exhibits clear sensitivities to both the accuracy of object mentions and their textual order. We found that:</p>

<ol>
  <li><strong>CLIP is vulnerable to inaccuracies concerning less prominent objects.</strong> While achieving a high accuracy (91.23%) when the most prominent objects were correctly listed first, errors in identifying the third object could still mislead the model, suggesting a potential hierarchy in how CLIP grounds objects.</li>
  <li><strong>Object mention order significantly impacts CLIP’s judgment.</strong> Reversing the caption order to list the smallest prominent object first (Experiment 2) led to a notable drop in accuracy to 87.45%. This demonstrates an “order bias,” where CLIP appears to favor captions that align with a “largest-first” heuristic, even if an alternative ordering is equally correct.</li>
  <li><strong>Permutation ensembling offers a viable mitigation strategy.</strong> By averaging the text embeddings of all possible permutations of each object set, we achieved an accuracy of 90.04%. This approach successfully smoothed out the negative impact of unfavorable orderings, substantially recovering performance from the worst-case scenario (87.45%) and providing a more robust, order-agnostic representation. While this came at the cost of a slight dip from the optimal single-order accuracy, the gain in robustness is significant.</li>
</ol>

<p>These findings underscore the importance of considering object prominence and mention order when evaluating or deploying vision-language models like CLIP in complex, multi-object environments. While CLIP’s zero-shot capabilities are powerful, its internal biases can lead to performance variations that might be critical in real-world applications. The permutation ensemble method offers a practical step towards more reliable multi-object caption scoring, trading a small amount of peak performance for improved consistency.</p>

<p>Future work could explore a much larger dataset and more sophisticated ensembling techniques, investigate the architectural underpinnings of these biases within CLIP, or develop training strategies that inherently reduce such sensitivities, paving the way for even more robust and equitable vision-language understanding.</p>

<hr />

<h2 id="acknowledgement">Acknowledgement</h2>

<p>We want to <strong>credit</strong> and thank <a href="https://web.mit.edu/phillipi/">Professor Phillip Isola</a> for <strong>suggesting the permutation ensembling post-hoc approach</strong> during an insightful discussion about the project. It was truly an enlightening moment!</p>

<p>Special thanks to the <a href="https://www.scenerepresentations.org/courses/2025/spring/advances-in-cv/">Spring 2025 6.8300 teaching team</a>:</p>
<ul>
  <li><a href="https://www.vincentsitzmann.com/">Professor Vincent Sitzmann</a> for redesigning the course to be more relevant and for consistently bringing amazing energy to each lecture.</li>
  <li>Our wonderful TAs, <a href="https://vivekg.dev/">Vivek Gopalakrishnan</a>, <a href="https://yukaryote.github.io/">Isabella Yu</a>, and <a href="http://www.a14z.blog/">Adriano Hernandez</a>, for providing invaluable feedback on this project and being open to engaging conversations about academic life.</li>
</ul>]]></content><author><name>Junru Ren 任俊儒</name><email>junru@computer.org</email></author><category term="class" /><category term="projects" /><category term="computer vision" /><category term="mit" /><summary type="html"><![CDATA[Abstract: Vision-language models like CLIP struggle with multi-object scenes, often favoring prominent objects or those mentioned first in captions. Using real-world COCO images, we show that CLIP’s caption-matching accuracy drops from 91.23% to 87.45% when object order is reversed. To address this, we explore a post-hoc mitigation: a permutation ensemble that averages scores across all object orders, boosting robustness and recovering accuracy to 90.04%. Our findings reveal persistent order biases and offer a simple, effective strategy to improve CLIP’s reliability in complex scenes.]]></summary></entry></feed>