AAAI-26 Recap

This semester, my coauthor Tate and I presented our CASI project at AAAI-26. It's funny to think about how it started: when we first joined the project fellowship, we were both freshmen with little research experience. The early weeks were mostly us reading papers we found interesting, trying to figure out what any of it meant, and thinking about what we might want to work on. Those sessions helped more than I expected. Reading a bunch of different papers and talking about them made the idea of "making a project" feel less abstract.

The transition into group brainstorming was a little rocky. We were trying to merge our interests, but neither of us had a strong sense of direction yet. I'd always assumed research meant coming up with something totally new, so realizing that it was normal to build on existing work made things feel more doable.

We started with a broad idea around jailbreaking. Toward the end of spring and into the summer, we found some chain-of-thought papers that gave us something more concrete to latch onto, and we met with Ida a few times to sort out planning and go over early updates. From there, things slowly took shape. We tried a few experiments, stitching together ideas from papers we'd read and adding our own as we went. Some of it worked and a lot didn't, but the chain-of-thought angle kept giving us interesting signals. If anything, the dead ends showed us what wasn't promising, and they raised questions of their own: Why didn't this work? What made the error rates higher than in our previous tries? Why does Google Colab hate me?

By the time we were putting together the AAAI poster, it felt less like one singular project or experiment and more like the natural endpoint of a long, slightly chaotic journey. We originally thought we were looking solely for a successful jailbreak, but along the way we found patterns that unexpectedly became a really interesting part of the project. Despite the chaos of it all, we ended up with some pretty cool results, including a successful jailbreaking framework and shifts in how often seemingly unrelated words appeared in model responses, even in responses where the model was ultimately jailbroken.

The conference itself was really fun. It was great meeting people from around the world and learning from others, whether at talks or in casual conversations next to the posters. Plus, Singapore was really cool to explore, and the organizers even took us to a zoo for one of the opening ceremonies.

The main thing I took away was not to be afraid to build things and try ideas out. Even if they don't work, and even if you don't fully know what you're doing, asking why something failed ends up being useful. A "failed" attempt forces you to figure out what's going on, and that usually gives you a direction to move in, which can then lead you somewhere unexpectedly interesting. I learned a lot through this project and am super grateful to Tate, Ida, and the CASI team for making this possible.
