Final Project Proposal

(We haven't decided which direction to go, so we present three ideas here. Any suggestions would be appreciated!)

Exploring possible applications of ML methods in graphics generation (going from drawings, pictures, or text to a graphical output) or designing rule-based simulations.

CS 284A Computer Graphics & Imaging, Spring 2022

Team 72: Kaleab Belete (284A), Xinwei Zhuang (284A), Tristan Streichenberge (184), Gregoria Millensifer (184)

Webpage: https://cal-cs184-student.github.io/sp22-project-webpages-xinwei-zhuang/proj_fin/index.html

Idea 1: Transform drawings into scenes

Problem description

While some disciplines have abundant data, data in the architectural field is highly limited. We hypothesize that a network can generate a satisfying scene from a single photo input. The aim is to build a game engine that generates scenes from drawings, using machine learning methods such as NeRF (Mildenhall et al., 2020) and deep volumetric prediction (Delanoy et al., 2018).

Deliverables

We will provide an interactive tool for inputting a 2D sketch and generating either a 2D scene or a 3D model.
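
As a rough illustration of the approach, the sketch below shows two core ingredients of NeRF as we understand the paper: the positional encoding of input coordinates and alpha compositing of samples along a camera ray. All function and variable names are our own placeholders, not the reference implementation.

```python
# A minimal sketch of two NeRF ingredients (Mildenhall et al., 2020):
# positional encoding of a 3D point, and alpha compositing per-sample
# (density, color) predictions into a single pixel color. Names and
# shapes are illustrative placeholders, not the official codebase.
import numpy as np

def positional_encoding(x: np.ndarray, num_freqs: int = 10) -> np.ndarray:
    """Map coordinates x to [x, sin(2^k pi x), cos(2^k pi x)] features."""
    feats = [x]
    for k in range(num_freqs):
        feats.append(np.sin((2.0 ** k) * np.pi * x))
        feats.append(np.cos((2.0 ** k) * np.pi * x))
    return np.concatenate(feats, axis=-1)

def composite_ray(densities: np.ndarray, colors: np.ndarray,
                  deltas: np.ndarray) -> np.ndarray:
    """Volume-render one ray: densities (N,), colors (N, 3), deltas (N,)."""
    alphas = 1.0 - np.exp(-densities * deltas)        # opacity of each sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * trans                          # contribution per sample
    return (weights[:, None] * colors).sum(axis=0)    # final RGB
```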

Evaluation metric

The evaluation will compare the generated scene against the actual scene or model.
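
For example, rendered views could be scored against ground-truth photos with PSNR, the image-similarity metric reported in the NeRF paper. The helper below is a minimal sketch assuming both images are float RGB arrays scaled to [0, 1].

```python
# PSNR between a rendered view and the ground-truth photo; higher is better.
# Assumes both inputs are equal-sized float arrays in [0, 1].
import numpy as np

def psnr(rendered: np.ndarray, ground_truth: np.ndarray) -> float:
    mse = np.mean((rendered - ground_truth) ** 2)
    if mse == 0.0:
        return float("inf")   # identical images
    return -10.0 * np.log10(mse)
```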

Goals

(1) What we plan to deliver: given a single input image, output a 3D model.
(2) What we hope to deliver: given multiple images from different perspectives, generate a higher-quality model.

Schedule

Week 1: Data preparation: scrape building model data from OpenStreetMap and, for each building, take snapshots from different perspectives (see the sketch after this schedule).
Week 2: Using the NeRF framework, adjust parameters and train the model.
Week 3: Debug and finalize the model. Evaluation: compare the generated scenes against real-world documentation.
Week 4: Clean up the final output and build the final deliverables (report, code, and presentation).
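
For the Week 1 scraping step, one possible starting point is the public Overpass API for OpenStreetMap. The bounding box (roughly the UC Berkeley campus) and the way results are handled below are illustrative assumptions.

```python
# A hedged sketch of fetching building footprints from OpenStreetMap via the
# public Overpass API. The bounding box and printout are placeholders.
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"
# Overpass bounding boxes are (south, west, north, east).
query = """
[out:json][timeout:60];
way["building"](37.868, -122.262, 37.876, -122.252);
out geom;
"""

response = requests.post(OVERPASS_URL, data={"data": query})
response.raise_for_status()
buildings = response.json()["elements"]
print(f"fetched {len(buildings)} building footprints")
for way in buildings[:3]:
    footprint = [(pt["lat"], pt["lon"]) for pt in way["geometry"]]
    print(way["id"], len(footprint), "vertices")
```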

References

Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R. NeRF: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision 2020 (pp. 405-421). Springer, Cham.
Delanoy J, Aubry M, Isola P, Efros AA, Bousseau A. 3D sketching using multi-view deep volumetric prediction. Proceedings of the ACM on Computer Graphics and Interactive Techniques. 2018;1(1):1-22.
Isola P, Zhu JY, Zhou T, Efros AA. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017 (pp. 1125-1134). https://phillipi.github.io/pix2pix/

Idea 2: Applications of NLP to Graphics Manipulation

Problem description

This idea attempts to generate contextually relevant graphics from input text, using NLP to process the text and then mapping the output to a set of scenes. Generating scenes without much technical knowledge is an interesting problem with many possible applications, in areas ranging from advertising to film and video games. The challenges include the accuracy of the text processing, generating appropriate scenes, the time it may take to work with a complex model, and adequately mapping the text to the appropriate images within a limited set. Since this will likely rely on deep neural networks, training may take time, and the quality of the output will depend on the quality of the data. This is on top of having to build or generate a set of scenes that can adequately represent a collection of input texts.
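
As a first baseline for the text-to-scene mapping, assuming we have a small library of labeled scenes, we could embed both the input text and each scene description and pick the nearest neighbor. The bag-of-words similarity and scene names below are placeholder assumptions; a real system would swap in learned text embeddings.

```python
# A minimal nearest-neighbor baseline: map input text to the scene whose
# (invented, placeholder) description it most resembles. A real pipeline
# would replace bag-of-words vectors with learned embeddings.
from collections import Counter
import math
import re

SCENES = {
    "beach_sunset": "sun setting over the ocean with waves and sand",
    "city_street": "busy street with cars, buildings, and people walking",
    "forest_morning": "trees and fog in a quiet forest at sunrise",
}

def bow(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def pick_scene(text: str) -> str:
    """Return the key of the scene closest to the input text."""
    query = bow(text)
    return max(SCENES, key=lambda k: cosine(query, bow(SCENES[k])))

print(pick_scene("people walking down a busy city street"))  # city_street
```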

Goals

The end goal is a simple interactive program that lets the user input text and get a contextually relevant output in a graphics window. One possible application is a program that produces an animated reaction based on the content of the text: a live facial or full-body animation, played as the text is processed, that depicts a given character's reaction. Another is using a textual description of a scene to generate a graphical output. For training, we plan to look into labeled image datasets (ranging from academic datasets to tools like Google Street View or Facebook's labeled datasets) on top of language models like GPT. The measure of accuracy is fairly straightforward: as we introduce new text, we have a labeled expected output for it, so we can measure the total distance between the expected and actual outputs on our test set.
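
Continuing that measure, the test loop could be sketched as follows, here simplified to exact-match accuracy over an invented placeholder test set; a graded distance over scene embeddings would follow the same shape. The `mapper` argument could be the hypothetical pick_scene baseline above or any learned text-to-scene model.

```python
# Sketch of the proposed accuracy measure: how often does the mapper pick
# the labeled expected scene? TEST_SET is an invented placeholder.
from typing import Callable, List, Tuple

TEST_SET: List[Tuple[str, str]] = [
    ("waves rolling onto the sand at dusk", "beach_sunset"),
    ("cars honking between tall buildings", "city_street"),
]

def accuracy(mapper: Callable[[str], str],
             test_set: List[Tuple[str, str]]) -> float:
    hits = sum(1 for text, expected in test_set if mapper(text) == expected)
    return hits / len(test_set)

# e.g. accuracy(pick_scene, TEST_SET)
```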

Deliverables

(1) What we plan to deliver: the text processing setup, the graphics generation setup (including the set of scenes), and a final interactive program that can take input text and generate a matching graphical scene or animation (on top of the other required documents). The accuracy here is likely to be spotty, but within a core set of input texts it should perform relatively well.
(2) What we hope to deliver, if time permits: a robust text-to-graphics pipeline (with the possibility of basic editing) that works well for a good representative subset of everyday words.

Schedule

Week 1: Work out the basic text processing pipeline while beginning to build a set of scenes to map to the context of the text.
Weeks 2 & 3: Troubleshoot and begin integrating the text processing with the graphical output. Add features (better processing, more scenes, improved performance, etc.) if there is extra time.
Week 4: Clean up the final output and build the final deliverables (report, code, and presentation).

References

Example image description dataset: https://arxiv.org/abs/1605.00459
GPT-2: https://github.com/openai/gpt-2

Idea 3: Agent-based crowd simulation

Problem description

[Figure: expected results (ref. "Directing Crowd Simulations Using Navigation Fields")]
We want to build a simulation of circulation in complex building settings. Such a simulation is crucial when architects choose among candidate circulation plans, especially for large events: failures in circulation design cause problems ranging from confusing wayfinding to stampedes. We will use a particle simulation that incorporates interactions between people (such as turning preferences at a crossroad, or not staying too far from or too close to the majority of the crowd) to simulate circulation within a building, and use the simulation to evaluate the plan.
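
To make the agent model concrete, the sketch below shows one plausible per-timestep update that combines goal seeking toward an exit with a repulsion term for personal space. All parameter values are illustrative guesses, not tuned constants, and a Unity version would express the same update in C#.

```python
# One plausible agent update: steer toward the exit, push away from
# neighbors closer than `personal_space`. Parameters are rough guesses.
import numpy as np

def step(positions: np.ndarray, exit_pos: np.ndarray,
         speed: float = 1.4, personal_space: float = 0.5,
         dt: float = 0.1) -> np.ndarray:
    """Advance all agents one timestep; positions is an (N, 2) array."""
    to_exit = exit_pos - positions
    goal_dir = to_exit / (np.linalg.norm(to_exit, axis=1, keepdims=True) + 1e-9)

    diff = positions[:, None, :] - positions[None, :, :]   # (N, N, 2)
    dist = np.linalg.norm(diff, axis=-1) + 1e-9            # (N, N)
    too_close = (dist < personal_space) & (dist > 1e-6)    # exclude self
    repulse = (diff / dist[..., None] * too_close[..., None]).sum(axis=1)

    return positions + (speed * goal_dir + repulse) * dt

# Example: 100 agents in a 10 m x 10 m room heading to an exit at (10, 5).
rng = np.random.default_rng(0)
agents = rng.uniform(0, 10, size=(100, 2))
for _ in range(200):
    agents = step(agents, np.array([10.0, 5.0]))
```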

Deliverables

We will provide a demo of the simulation that shows how people behave in a building with a complex plan; variables such as the number of people can be changed to see how the crowd behaves in different scenarios. The deliverable will be an interactive program, possibly built in Unity.

Evaluation metric

For the evaluation, a potential metric would be finding a real-world scenario (such as people entering a stadium) and comparing it with the generated simulation.

Goals

(1) What we plan to deliver: our hypothesis is that the simulation can inform the design process and help choose a circulation plan that does not confuse people. We will provide a simulation that is realistic enough to inform design decisions.
(2) What we hope to deliver: more complex population settings that incorporate more scenarios (age, sex, ...).

Schedule

Week 1: Prepare the 3D model we are going to use and set it up in Unity.
Week 2: Write the swarm algorithm with simple interactions.
Week 3: Add more complex interactions and parameterized inputs (crowd density, preferences, personalities, ...), emergency circumstances (e.g., a door suddenly closing), and/or more exciting unrealistic scenes such as the one Ren showed in lecture.
Week 4: Evaluation: compare the simulation against real-world documentation. Introduce more building plans and provide design insights for each plan. Write up the final report.

References

https://repository.upenn.edu/cgi/viewcontent.cgi?article=1223&context=hms
https://github.com/crowddynamics/crowddynamics
The swarm topic Ren mentioned in lecture.

Resources

Laptop and Google Colab; potentially Unity if we pursue Idea 3.