Our project aims to develop a text-driven 3D human generation system. Users will be able to describe their desired character using natural language, and the system will automatically generate a realistic and detailed 3D model in response.
Problem: Creating high-quality 3D human models is a time-consuming and skill-intensive process. Traditional modeling software requires technical expertise, making it inaccessible for many users and hindering rapid iteration.
Importance: Realistic 3D humans are essential in various industries. Gaming, animation, virtual/augmented reality, and even personalized medicine rely on these models. A faster, more intuitive model creation process would streamline development and broaden accessibility.
Challenge: The complexity of the human form poses a major challenge. Capturing diverse body types, clothing, and subtle details requires both precision and artistic skill. Moreover, 3D reconstruction is often computationally expensive, which slows iteration and raises a barrier to deploying such an application.
Proposed Solution: We will develop a text-driven 3D human generation system that leverages the recent 3D Gaussian splatting technique for accelerated rendering. The system will use natural language processing (NLP) to interpret user descriptions and generate corresponding 3D models, significantly simplifying the creation process.
Baseline Plan
Core System:
Develop a text-driven 3D human generation system that can:
Interpret basic text descriptions (e.g., gender, body build, clothing style)
Generate a corresponding simplified 3D human mesh.
Integrate the Gaussian splatting method for representation and rendering.
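To make the integration step concrete, here is a toy sketch of the splatting idea: isotropic 3D Gaussians are projected through a pinhole camera and alpha-blended back to front into an image. This is a simplified illustration only (a fixed screen-space sigma, no anisotropic covariance projection or tile-based rasterization as in the full method); all names and parameters are illustrative.

```python
import numpy as np

def splat(means, colors, opacities, f=100.0, H=64, W=64, sigma_px=2.0):
    """Toy splatting: project isotropic 3D Gaussians with a pinhole camera
    and alpha-blend them back-to-front into an H x W RGB image."""
    img = np.zeros((H, W, 3))
    ys, xs = np.mgrid[0:H, 0:W]
    order = np.argsort(-means[:, 2])            # farthest first (painter's order)
    for i in order:
        x, y, z = means[i]
        u = f * x / z + W / 2                   # pinhole projection to pixel coords
        v = f * y / z + H / 2
        g = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * sigma_px ** 2))
        a = (opacities[i] * g)[..., None]       # per-pixel alpha of this splat
        img = a * colors[i] + (1 - a) * img     # "over" compositing onto image
    return np.clip(img, 0.0, 1.0)

# Two Gaussians: a near red one and a farther blue one.
means = np.array([[0.0, 0.0, 2.0], [0.2, 0.1, 3.0]])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
opacities = np.array([0.8, 0.6])
image = splat(means, colors, opacities)
```

The real pipeline additionally projects each Gaussian's 3D covariance into screen space and rasterizes in depth-sorted tiles, but the compositing logic above is the core of why rendering is fast.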
Metrics:
Qualitative: User surveys evaluating the intuitiveness of text-based control and visual realism of base models.
Computational: Measure rendering time improvements due to Gaussian splatting vs. traditional techniques.
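The rendering-time comparison above could be run with a simple timing harness such as the sketch below; the two renderer callables are placeholders for the real Gaussian-splatting and traditional pipelines, which are not implemented here.

```python
import time

def benchmark(render_fn, n_frames=20, warmup=2):
    """Return mean seconds per frame for a rendering callable."""
    for _ in range(warmup):                 # discard warm-up frames (caches, JIT)
        render_fn()
    start = time.perf_counter()
    for _ in range(n_frames):
        render_fn()
    return (time.perf_counter() - start) / n_frames

# Placeholder renderers standing in for the real pipelines.
def splat_render():
    time.sleep(0.001)

def mesh_render():
    time.sleep(0.002)

splat_t = benchmark(splat_render)
mesh_t = benchmark(mesh_render)
print(f"Gaussian splatting: {splat_t * 1000:.2f} ms/frame")
print(f"Traditional:        {mesh_t * 1000:.2f} ms/frame")
print(f"Speedup:            {mesh_t / splat_t:.2f}x")
```

Reporting mean frame time with warm-up frames discarded avoids one-off initialization costs skewing the comparison.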
Questions to Answer:
Can text descriptions effectively drive the generation of basic 3D human forms?
Does Gaussian splatting offer a significant speed advantage in this context?
Aspirational Plan
Enhanced Detail: Extend the system to handle more complex text descriptions (e.g., facial features, hairstyles, finer clothing details).
Dynamic Posing: Implement the ability to pose the generated human models based on text instructions.
Brainstorming: Explore further observations and innovations that arise during implementation.
Questions to Answer:
To what extent can we enhance model detail through text commands while maintaining real-time responsiveness?
What are the trade-offs between visual realism and computational efficiency as we add complexity?
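Text-driven posing would most likely build on a skinned parametric body (SMPL-style), where pose parameters drive bone transforms and vertices follow via linear blend skinning. A minimal sketch of the skinning step, with all names and the two-bone example purely illustrative:

```python
import numpy as np

def lbs(verts, weights, transforms):
    """Linear blend skinning: each vertex moves by a weighted blend of
    per-bone 4x4 transforms. verts: (V, 3), weights: (V, B), transforms: (B, 4, 4)."""
    homo = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)  # (V, 4)
    blended = np.einsum('vb,bij->vij', weights, transforms)           # (V, 4, 4)
    posed = np.einsum('vij,vj->vi', blended, homo)                    # (V, 4)
    return posed[:, :3]

# Two vertices, two bones: bone 1 rotates 90 degrees about the z-axis.
verts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0]])   # each vertex bound to one bone
rot_z = np.eye(4)
rot_z[:3, :3] = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
transforms = np.stack([np.eye(4), rot_z])
posed = lbs(verts, weights, transforms)
```

A text instruction would then map to pose parameters (bone rotations) rather than directly to vertices, which keeps posing fast enough for interactive use.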
Week 1
Survey related resources and attempt to reproduce existing work.
Week 2
Analyse the replication results, identifying strengths, weaknesses, and points for improvement.
Week 3
Improve on the reproduced work and brainstorm our own innovative ideas.
Week 4
Refine and summarise our work, and polish the remaining details.
Papers:
Hardware resources:
A laptop with an RTX 3060.
A laptop with an RTX 3070.
A server with three 12 GB GPUs.