OSMO: Open-vocabulary Self-eMOtion Tracking
OSIRIS is an egocentric multimodal LLM for continuous, open-vocabulary human-state tracking from smart glasses.
I enjoy making things. Here is a selection of projects that I have worked on over the years.
OSKAR is a self-supervised multimodal foundation model that learns in the latent space by predicting masked multimodal features.
MaskCLR improves the robustness of transformer-based action recognition methods against noisy and incomplete skeletons.
S-JEPA is an instantiation of JEPA for self-supervised skeletal action recognition.
We take a step towards computer-aided waste detection and present the first in-the-wild industrial-grade waste detection and segmentation dataset, ZeroWaste.