HackMIT China 2026 Submission

Intelligent Robotic Sorting

An AI-powered robotic sorting system for bottles and cups. It combines YOLO11 perception, RGB-D depth estimation, coordinate transformation, seek-and-follow arm control, and dexterous grasping into a full pick-move-release prototype.

Intelligent Sorting Core
System behavior

Detect, follow, grasp, place.

The system combines perception and manipulation into a focused workflow for bottle and cup sorting.

Active Sensor
RGB-D
Intel RealSense D435i
Manipulator
7-DOF
Robot arm + dexterous hand
Project snapshots

Real build moments behind the prototype.

These photos show the project as it was actually built: hardware assembly, control development, and close team coordination around the robot.

Hardware Assembly

Installing the dexterous hand

This moment marks the transition from arm-only motion to a system capable of physical grasp execution.

Team members installing the dexterous hand on the robotic arm.
Control Code

Software and control integration

This coding session shows the software side of the project, where perception outputs were turned into robot commands and interface logic.

Team member writing control software for the robotic sorting system.
Team Workflow

Shared problem-solving

This collaboration snapshot reflects the day-to-day work needed to debug hardware, tune perception, and prepare the final demo.

Team members collaborating beside the robot and laptop during the project.
Why this matters

The problem is not only seeing waste. It is acting on it safely.

Plastic bottles and cups are everywhere in recycling streams, but separating them still depends on repetitive and often unpleasant human work. Intelligent Robotic Sorting reframes the project around a sharper goal: showing that a compact robot can credibly identify, localize, grasp, and sort common plastic waste.

Messy inputs

Real-world scenes are inconsistent.

Lighting, clutter, reflective plastic, and object deformation make bottle and cup sorting harder than a clean tabletop vision demo.

Perception gap

Detection is not grasping.

A bounding box only matters if it becomes a trustworthy 3D target and then a stable robot command in the correct frame.

Safety

Motion must be engineered, not improvised.

Grasp force, hand synchronization, and motion safety have to be tuned carefully, or the whole pipeline quickly becomes fragile.

Core loop

SEE. UNDERSTAND. LOCATE. ACT.

The robotic loop moves from RGB-D sensing to detection, localization, and physical sorting.

SEE

Capture the scene.

Capture RGB-D visual data from the D435i so the robot has both appearance and depth cues before it moves.

Input: RGB-D stream
UNDERSTAND

Detect and classify.

Use an improved YOLO11 model to identify bottles and cups with real-time labels and candidate grasp targets.

Output: class + bbox
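The detection step can be sketched as a small post-processing function. This is a minimal sketch, not the team's actual code: the detection tuple format (class name, confidence, xyxy bounding box) and the 0.5 confidence threshold are assumptions layered on top of YOLO11-style outputs.

```python
# Hypothetical sketch: turn YOLO11-style detections into one grasp candidate.
# The (class, confidence, xyxy bbox) tuple format and the threshold are
# illustrative assumptions, not the project's actual data structures.

TARGET_CLASSES = {"bottle", "cup"}

def pick_grasp_candidate(detections, min_conf=0.5):
    """Return (class, conf, (u, v)) for the most confident bottle/cup, or None."""
    best = None
    for cls, conf, (x1, y1, x2, y2) in detections:
        if cls in TARGET_CLASSES and conf >= min_conf:
            if best is None or conf > best[1]:
                # Use the bbox center as the candidate grasp pixel.
                best = (cls, conf, ((x1 + x2) / 2.0, (y1 + y2) / 2.0))
    return best

# Example: a low-confidence cup is filtered out, the bottle wins.
dets = [("cup", 0.31, (10, 10, 50, 80)), ("bottle", 0.88, (100, 40, 140, 160))]
print(pick_grasp_candidate(dets))  # ('bottle', 0.88, (120.0, 100.0))
```

The same filter also naturally rejects frames with no valid target, which keeps the downstream motion stage from chasing noise.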
LOCATE

Transform coordinates.

Convert visual detections into physical robot coordinates so the arm can align itself above the object reliably.

Output: camera/base XYZ

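The pixel-to-robot chain described above can be sketched with a pinhole back-projection followed by a homogeneous transform. The intrinsics (fx, fy, cx, cy) and the base-from-camera extrinsic matrix below are illustrative placeholders, not the team's calibrated values.

```python
import numpy as np

# Hypothetical sketch of the pixel -> robot-base chain. The intrinsics and the
# 4x4 extrinsic are placeholder values, not the project's real calibration.

FX, FY, CX, CY = 615.0, 615.0, 320.0, 240.0  # assumed D435i-like intrinsics

def deproject(u, v, depth_m):
    """Back-project a pixel plus depth (meters) into camera-frame XYZ (homogeneous)."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return np.array([x, y, depth_m, 1.0])

# Assumed extrinsic: camera mounted 0.5 m above the robot base, no rotation.
T_base_cam = np.eye(4)
T_base_cam[2, 3] = 0.5

def pixel_to_base(u, v, depth_m):
    """Map a detected pixel + depth into robot-base XYZ."""
    return (T_base_cam @ deproject(u, v, depth_m))[:3]

# A detection at the principal point, 0.4 m from the camera:
print(pixel_to_base(320.0, 240.0, 0.4))  # [0.  0.  0.9]
```

In practice the extrinsic comes from a fixed-camera calibration routine, which is why the project treats calibration as a first-class phase rather than an afterthought.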
ACT

Execute safe motion.

Run seek-and-follow movement, adapt hand approach, then complete a categorized pick-move-release cycle.

Output: safe robot command
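One cycle of the seek-and-follow behavior can be sketched as a clamped proportional step toward a hover point above the target. The gain, hover offset, and per-cycle speed limit are illustrative tuning values, not the project's parameters.

```python
# Hypothetical sketch of one seek-and-follow control step. Gain, offset, and
# step limit are illustrative tuning values, not the team's real parameters.

HOVER_OFFSET = 0.12   # assumed: stay 12 cm above the object while following
GAIN = 1.5            # proportional gain on position error
MAX_STEP = 0.05       # clamp per-cycle motion (meters) for safety

def follow_step(ee_pos, target_pos):
    """Return a clamped XYZ increment that tracks the target with a Z hover offset."""
    goal = (target_pos[0], target_pos[1], target_pos[2] + HOVER_OFFSET)
    step = []
    for ee, g in zip(ee_pos, goal):
        delta = GAIN * (g - ee)
        step.append(max(-MAX_STEP, min(MAX_STEP, delta)))  # safety clamp
    return tuple(step)

# Arm at (0.3, 0.0, 0.5), bottle localized at (0.35, 0.1, 0.2):
print(follow_step((0.3, 0.0, 0.5), (0.35, 0.1, 0.2)))  # (0.05, 0.05, -0.05)
```

Clamping each axis independently keeps any single bad depth reading from producing a large sudden motion, which matches the project's safety-first framing.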
What we built

An integrated system for perception, localization, and robotic sorting.

The prototype combines real-time detection, coordinate transformation, arm alignment, and dexterous pick-and-place behavior into one coherent system.

Vision
RGB-D

Real-time bottle and cup detection

RealSense capture and improved YOLO11 inference give the system live object identity and position cues.

Calibration
XYZ

Coordinate transformation

2D pixels become 3D robot-space targets, which is the bridge between computer vision and physical action.

Control
Safe

Seek-and-follow arm control

The arm tracks the target position continuously and maintains an intended vertical offset as the object height changes.

Grasping
Force

Dexterous grasp and placement

A dexterous hand with force-aware control grips the object and places it into the proper category bin.
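Force-aware grasping can be sketched as closing in small increments until measured contact force reaches a threshold. The force model, target force, and step size below are simulated stand-ins for the real hand's sensor feedback, not the project's actual control code.

```python
# Hypothetical sketch of force-aware grasp closure. The thresholds and the
# simulated force model are stand-ins for the real hand's sensor feedback.

FORCE_TARGET = 2.0   # assumed: stop closing once contact force reaches this (N)
STEP = 0.01          # assumed finger closure increment per cycle

def close_until_contact(read_force, max_steps=100):
    """Tighten in small steps until the measured force reaches the target."""
    closure = 0.0
    for _ in range(max_steps):
        if read_force(closure) >= FORCE_TARGET:
            return closure, True   # contact achieved at this closure
        closure += STEP
    return closure, False          # gave up without reaching target force

# Simulated deformable bottle: no force until 0.3 closure, then 20 N per unit.
fake_force = lambda c: max(0.0, (c - 0.3) * 20.0)
print(close_until_contact(fake_force))
```

Stopping on a force threshold rather than a fixed closure distance is what lets the same grip handle both rigid bottles and easily deformed cups.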

System overview

System architecture at a glance.

From sensor input to robot action, each layer contributes to the full sorting pipeline.

Sensor

RealSense

RGB-D acquisition

Vision

YOLO

Detection + labels

Depth

XYZ

3D point extraction

Calibration

T base-camera

Coordinate transform

Motion

Planner

Safe task commands

Interface

Web console

Status + control

Build journey

How the prototype was built and validated.

The build progressed from sensor bring-up to calibration, grasp tuning, and demo integration.

Phase 01

Camera and model bring-up

Stream live RGB-D data and confirm that YOLO11 can detect bottles and cups consistently in the real workspace.

Phase 02

Coordinate transformation and targeting

Translate visual coordinates into robot-space targets and make sure the arm can move above the detected object accurately.

Phase 03

Hardware synchronization and grasp tuning

Integrate the dexterous hand, calibrate force, and reduce grasp failures caused by depth noise or object deformation.

Phase 04

Website and operator presentation layer

Turn the full stack into something legible for judges by exposing the system story, outputs, and demo flow clearly.

Demo preview

Robot Demo Preview

Featured footage from the prototype shows target detection, arm following, dexterous grasping, and categorized placement.

Engineering decisions

Three choices that define the project tone.

Decision 01

Validate before scaling

Staged validation matters because safe iteration is one of the strongest signs of serious robotics engineering.

Decision 02

Fixed camera calibration first

Coordinate transformation is essential because accurate robot motion depends on stable alignment between camera and arm.

Decision 03

Web interface as a demo layer

The website presents system behavior, technical context, and demo evidence for judges and collaborators.

Safety and constraints

Safety is a visible feature, not a footnote.

Safety is built into the prototype through workspace limits, motion constraints, and supervised operation.

Motion

Z-axis limit

Approach limits help the arm stay above the object until localization and following have stabilized.
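The workspace and Z-axis limits described in this section can be sketched as a guard that clamps every commanded position into a known-safe envelope. The bounds below are illustrative, not the robot's actual safe region.

```python
# Hypothetical sketch of the workspace/Z-limit guard. The bounds are
# illustrative placeholders, not the robot's real safe envelope.

WORKSPACE = {"x": (0.15, 0.55), "y": (-0.30, 0.30), "z": (0.08, 0.50)}

def guard(cmd_xyz):
    """Clamp a commanded XYZ into the safe region; report whether it was altered."""
    clamped = tuple(
        max(lo, min(hi, v)) for v, (lo, hi) in zip(cmd_xyz, WORKSPACE.values())
    )
    return clamped, clamped != tuple(cmd_xyz)

# A visually plausible target below the Z floor gets lifted to the limit:
print(guard((0.40, 0.00, 0.02)))  # ((0.4, 0.0, 0.08), True)
```

Running every motion command through such a guard is one concrete way "visually plausible but physically unsafe commands" are kept out of live motion.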

Workspace

Constrained operating region

Known-safe regions and designated bins stop visually plausible but physically unsafe commands from becoming live motion.

Operation

Human-supervised runs

Human supervision remains part of the loop because grasp force, connectivity, and hardware timing still matter in live demos.

Roadmap discipline

Incremental validation

The project reads better when validated work and planned upgrades are separated explicitly instead of blended together.

Next steps

What comes after the first public demo.

Roadmap

Real grasp integration

Keep improving grasp reliability and repeatability as the hardware and control policy stabilize.

Roadmap

Better grasp pose estimation

Push beyond simple center-point targeting toward richer geometry and more precise approach planning.

Roadmap

More recyclable classes

Expand beyond bottles and cups to cans, cartons, and additional recyclable waste categories.

Roadmap

Continuous closed-loop sorting

Keep moving toward edge-device deployment, longer autonomous cycles, and broader physical intelligence capabilities.

Team

The people behind the integration work.

Systems Integration

Luna Zhang

Documentation, interface integration, and communication reliability

  • Managed documentation and presentation materials for the hackathon submission.
  • Coordinated workflow across the project.
  • Handled the software-hardware interface and resolved robot communication bugs.
Vision and Manipulation

Tang Fuqing

Model integration, robot coordinates, and arm-hand integration

  • Integrated YOLO11 into the live perception pipeline.
  • Built the coordinate transformation chain from pixels to robot-space positioning.
  • Programmed the robotic arm and managed integration with the dexterous hand.
Advisors

Zhen Fan and Yi Shao

Guiding instructors who supported the project direction, system thinking, and technical development during the build.