.gitattributes CHANGED
@@ -33,5 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
- custom_rasterizer-0.1-cp310-cp310-linux_x86_64.whl filter=lfs diff=lfs merge=lfs -text
37
- demo.png filter=lfs diff=lfs merge=lfs -text
 
LICENSE DELETED
@@ -1,80 +0,0 @@
1
- TENCENT HUNYUAN 3D 2.0 COMMUNITY LICENSE AGREEMENT
2
- Tencent Hunyuan 3D 2.0 Release Date: January 21, 2025
3
- THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
4
- By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Tencent Hunyuan 3D 2.0 Works, including via any Hosted Service, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
5
- 1. DEFINITIONS.
6
- a. “Acceptable Use Policy” shall mean the policy made available by Tencent as set forth in the Exhibit A.
7
- b. “Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of Tencent Hunyuan 3D 2.0 Works or any portion or element thereof set forth herein.
8
- c. “Documentation” shall mean the specifications, manuals and documentation for Tencent Hunyuan 3D 2.0 made publicly available by Tencent.
9
- d. “Hosted Service” shall mean a hosted service offered via an application programming interface (API), web access, or any other electronic or remote means.
10
- e. “Licensee,” “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Tencent Hunyuan 3D 2.0 Works for any purpose and in any field of use.
11
- f. “Materials” shall mean, collectively, Tencent’s proprietary Tencent Hunyuan 3D 2.0 and Documentation (and any portion thereof) as made available by Tencent under this Agreement.
12
- g. “Model Derivatives” shall mean all: (i) modifications to Tencent Hunyuan 3D 2.0 or any Model Derivative of Tencent Hunyuan 3D 2.0; (ii) works based on Tencent Hunyuan 3D 2.0 or any Model Derivative of Tencent Hunyuan 3D 2.0; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Tencent Hunyuan 3D 2.0 or any Model Derivative of Tencent Hunyuan 3D 2.0, to that model in order to cause that model to perform similarly to Tencent Hunyuan 3D 2.0 or a Model Derivative of Tencent Hunyuan 3D 2.0, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs by Tencent Hunyuan 3D 2.0 or a Model Derivative of Tencent Hunyuan 3D 2.0 for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives.
13
- h. “Output” shall mean the information and/or content output of Tencent Hunyuan 3D 2.0 or a Model Derivative that results from operating or otherwise using Tencent Hunyuan 3D 2.0 or a Model Derivative, including via a Hosted Service.
14
- i. “Tencent,” “We” or “Us” shall mean THL A29 Limited.
15
- j. “Tencent Hunyuan 3D 2.0” shall mean the 3D generation models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us at https://github.com/Tencent/Hunyuan3D-2.
16
- k. “Tencent Hunyuan 3D 2.0 Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof.
17
- l. “Territory” shall mean the worldwide territory, excluding the territory of the European Union, United Kingdom and South Korea.
18
- m. “Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You.
19
- n. “including” shall mean including but not limited to.
20
- 2. GRANT OF RIGHTS.
21
- We grant You, for the Territory only, a non-exclusive, non-transferable and royalty-free limited license under Tencent’s intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy.
22
- 3. DISTRIBUTION.
23
- You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Tencent Hunyuan 3D 2.0 Works, exclusively in the Territory, provided that You meet all of the following conditions:
24
- a. You must provide all such Third Party recipients of the Tencent Hunyuan 3D 2.0 Works or products or services using them a copy of this Agreement;
25
- b. You must cause any modified files to carry prominent notices stating that You changed the files;
26
- c. You are encouraged to: (i) publish at least one technology introduction blogpost or one public statement expressing Your experience of using the Tencent Hunyuan 3D 2.0 Works; and (ii) mark the products or services developed by using the Tencent Hunyuan 3D 2.0 Works to indicate that the product/service is “Powered by Tencent Hunyuan”; and
27
- d. All distributions to Third Parties (other than through a Hosted Service) must be accompanied by a “Notice” text file that contains the following notice: “Tencent Hunyuan 3D 2.0 is licensed under the Tencent Hunyuan 3D 2.0 Community License Agreement, Copyright © 2025 Tencent. All Rights Reserved. The trademark rights of “Tencent Hunyuan” are owned by Tencent or its affiliate.”
28
- You may add Your own copyright statement to Your modifications and, except as set forth in this Section and in Section 5, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement (including as regards the Territory). If You receive Tencent Hunyuan 3D 2.0 Works from a Licensee as part of an integrated end user product, then this Section 3 of this Agreement will not apply to You.
29
- 4. ADDITIONAL COMMERCIAL TERMS.
30
- If, on the Tencent Hunyuan 3D 2.0 version release date, the monthly active users of all products or services made available by or for Licensee is greater than 1 million monthly active users in the preceding calendar month, You must request a license from Tencent, which Tencent may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until Tencent otherwise expressly grants You such rights.
31
- Subject to Tencent's written approval, you may request a license for the use of Tencent Hunyuan 3D 2.0 by submitting the following information to [email protected]:
32
- a. Your company’s name and associated business sector that plans to use Tencent Hunyuan 3D 2.0.
33
- b. Your intended use case and the purpose of using Tencent Hunyuan 3D 2.0.
34
- 5. RULES OF USE.
35
- a. Your use of the Tencent Hunyuan 3D 2.0 Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Tencent Hunyuan 3D 2.0 Works, which is hereby incorporated by reference into this Agreement. You must include the use restrictions referenced in these Sections 5(a) and 5(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Tencent Hunyuan 3D 2.0 Works and You must provide notice to subsequent users to whom You distribute that Tencent Hunyuan 3D 2.0 Works are subject to the use restrictions in these Sections 5(a) and 5(b).
36
- b. You must not use the Tencent Hunyuan 3D 2.0 Works or any Output or results of the Tencent Hunyuan 3D 2.0 Works to improve any other AI model (other than Tencent Hunyuan 3D 2.0 or Model Derivatives thereof).
37
- c. You must not use, reproduce, modify, distribute, or display the Tencent Hunyuan 3D 2.0 Works, Output or results of the Tencent Hunyuan 3D 2.0 Works outside the Territory. Any such use outside the Territory is unlicensed and unauthorized under this Agreement.
38
- 6. INTELLECTUAL PROPERTY.
39
- a. Subject to Tencent’s ownership of Tencent Hunyuan 3D 2.0 Works made by or for Tencent and intellectual property rights therein, conditioned upon Your compliance with the terms and conditions of this Agreement, as between You and Tencent, You will be the owner of any derivative works and modifications of the Materials and any Model Derivatives that are made by or for You.
40
- b. No trademark licenses are granted under this Agreement, and in connection with the Tencent Hunyuan 3D 2.0 Works, Licensee may not use any name or mark owned by or associated with Tencent or any of its affiliates, except as required for reasonable and customary use in describing and distributing the Tencent Hunyuan 3D 2.0 Works. Tencent hereby grants You a license to use “Tencent Hunyuan” (the “Mark”) in the Territory solely as required to comply with the provisions of Section 3(c), provided that You comply with any applicable laws related to trademark protection. All goodwill arising out of Your use of the Mark will inure to the benefit of Tencent.
41
- c. If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed. You will defend, indemnify and hold harmless Us from and against any claim by any Third Party arising out of or related to Your or the Third Party’s use or distribution of the Tencent Hunyuan 3D 2.0 Works.
42
- d. Tencent claims no rights in Outputs You generate. You and Your users are solely responsible for Outputs and their subsequent uses.
43
- 7. DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY.
44
- a. We are not obligated to support, update, provide training for, or develop any further version of the Tencent Hunyuan 3D 2.0 Works or to grant any license thereto.
45
- b. UNLESS AND ONLY TO THE EXTENT REQUIRED BY APPLICABLE LAW, THE TENCENT HUNYUAN 3D 2.0 WORKS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OF ANY KIND INCLUDING ANY WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, COURSE OF DEALING, USAGE OF TRADE, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE TENCENT HUNYUAN 3D 2.0 WORKS OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR OR A THIRD PARTY’S USE OR DISTRIBUTION OF ANY OF THE TENCENT HUNYUAN 3D 2.0 WORKS OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT.
46
- c. TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL TENCENT OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, FOR ANY DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO ANY OF THE TENCENT HUNYUAN 3D 2.0 WORKS OR OUTPUTS, EVEN IF TENCENT OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
47
- 8. SURVIVAL AND TERMINATION.
48
- a. The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
49
- b. We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Tencent Hunyuan 3D 2.0 Works. Sections 6(a), 6(c), 7 and 9 shall survive the termination of this Agreement.
50
- 9. GOVERNING LAW AND JURISDICTION.
51
- a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of the Hong Kong Special Administrative Region of the People’s Republic of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
52
- b. Exclusive jurisdiction and venue for any dispute arising out of or relating to this Agreement will be a court of competent jurisdiction in the Hong Kong Special Administrative Region of the People’s Republic of China, and Tencent and Licensee consent to the exclusive jurisdiction of such court with respect to any such dispute.
53
-
54
- EXHIBIT A
55
- ACCEPTABLE USE POLICY
56
-
57
- Tencent reserves the right to update this Acceptable Use Policy from time to time.
58
- Last modified: November 5, 2024
59
-
60
- Tencent endeavors to promote safe and fair use of its tools and features, including Tencent Hunyuan 3D 2.0. You agree not to use Tencent Hunyuan 3D 2.0 or Model Derivatives:
61
- 1. Outside the Territory;
62
- 2. In any way that violates any applicable national, federal, state, local, international or any other law or regulation;
63
- 3. To harm Yourself or others;
64
- 4. To repurpose or distribute output from Tencent Hunyuan 3D 2.0 or any Model Derivatives to harm Yourself or others;
65
- 5. To override or circumvent the safety guardrails and safeguards We have put in place;
66
- 6. For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
67
- 7. To generate or disseminate verifiably false information and/or content with the purpose of harming others or influencing elections;
68
- 8. To generate or facilitate false online engagement, including fake reviews and other means of fake online engagement;
69
- 9. To intentionally defame, disparage or otherwise harass others;
70
- 10. To generate and/or disseminate malware (including ransomware) or any other content to be used for the purpose of harming electronic systems;
71
- 11. To generate or disseminate personal identifiable information with the purpose of harming others;
72
- 12. To generate or disseminate information (including images, code, posts, articles), and place the information in any public context (including –through the use of bot generated tweets), without expressly and conspicuously identifying that the information and/or content is machine generated;
73
- 13. To impersonate another individual without consent, authorization, or legal right;
74
- 14. To make high-stakes automated decisions in domains that affect an individual’s safety, rights or wellbeing (e.g., law enforcement, migration, medicine/health, management of critical infrastructure, safety components of products, essential services, credit, employment, housing, education, social scoring, or insurance);
75
- 15. In a manner that violates or disrespects the social ethics and moral standards of other countries or regions;
76
- 16. To perform, facilitate, threaten, incite, plan, promote or encourage violent extremism or terrorism;
77
- 17. For any use intended to discriminate against or harm individuals or groups based on protected characteristics or categories, online or offline social behavior or known or predicted personal or personality characteristics;
78
- 18. To intentionally exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
79
- 19. For military purposes;
80
- 20. To engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or other professional practices.
 
NOTICE DELETED
@@ -1,214 +0,0 @@
1
- Usage and Legal Notices:
2
-
3
- Tencent is pleased to support the open source community by making Hunyuan 3D 2.0 available.
4
-
5
- Copyright (C) 2025 THL A29 Limited, a Tencent company. All rights reserved. The below software and/or models in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) THL A29 Limited.
6
-
7
- Hunyuan 3D 2.0 is licensed under the TENCENT HUNYUAN 3D 2.0 COMMUNITY LICENSE AGREEMENT except for the third-party components listed below, which are licensed under different terms. Hunyuan 3D 2.0 does not impose any additional limitations beyond what is outlined in the respective licenses of these third-party components. Users must comply with all terms and conditions of the original licenses of these third-party components and must ensure that the usage of the third-party components adheres to all relevant laws and regulations.
8
-
9
- For avoidance of doubts, Hunyuan 3D 2.0 means inference-enabling code, parameters, and weights of this Model only, which are made publicly available by Tencent in accordance with TENCENT HUNYUAN 3D 2.0 COMMUNITY LICENSE AGREEMENT.
10
-
11
-
12
- Other dependencies and licenses:
13
-
14
-
15
- Open Source Model Licensed under the MIT and CreativeML Open RAIL++-M License:
16
- --------------------------------------------------------------------
17
- 1. Stable Diffusion
18
- Copyright (c) 2022 Stability AI
19
-
20
-
21
- Terms of the MIT and CreativeML Open RAIL++-M License:
22
- --------------------------------------------------------------------
23
- Permission is hereby granted, free of charge, to any person obtaining a copy
24
- of this software and associated documentation files (the "Software"), to deal
25
- in the Software without restriction, including without limitation the rights
26
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
27
- copies of the Software, and to permit persons to whom the Software is
28
- furnished to do so, subject to the following conditions:
29
-
30
- The above copyright notice and this permission notice shall be included in all
31
- copies or substantial portions of the Software.
32
-
33
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
34
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
35
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
36
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
37
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
38
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
39
- SOFTWARE.
40
-
41
-
42
- CreativeML Open RAIL++-M License
43
- dated November 24, 2022
44
-
45
- Section I: PREAMBLE
46
-
47
- Multimodal generative models are being widely adopted and used, and have the potential to transform the way artists, among other individuals, conceive and benefit from AI or ML technologies as a tool for content creation.
48
-
49
- Notwithstanding the current and potential benefits that these artifacts can bring to society at large, there are also concerns about potential misuses of them, either due to their technical limitations or ethical considerations.
50
-
51
- In short, this license strives for both the open and responsible downstream use of the accompanying model. When it comes to the open character, we took inspiration from open source permissive licenses regarding the grant of IP rights. Referring to the downstream responsible use, we added use-based restrictions not permitting the use of the Model in very specific scenarios, in order for the licensor to be able to enforce the license in case potential misuses of the Model may occur. At the same time, we strive to promote open and responsible research on generative models for art and content generation.
52
-
53
- Even though downstream derivative versions of the model could be released under different licensing terms, the latter will always have to include - at minimum - the same use-based restrictions as the ones in the original license (this license). We believe in the intersection between open and responsible AI development; thus, this License aims to strike a balance between both in order to enable responsible open-science in the field of AI.
54
-
55
- This License governs the use of the model (and its derivatives) and is informed by the model card associated with the model.
56
-
57
- NOW THEREFORE, You and Licensor agree as follows:
58
-
59
- 1. Definitions
60
-
61
- - "License" means the terms and conditions for use, reproduction, and Distribution as defined in this document.
62
- - "Data" means a collection of information and/or content extracted from the dataset used with the Model, including to train, pretrain, or otherwise evaluate the Model. The Data is not licensed under this License.
63
- - "Output" means the results of operating a Model as embodied in informational content resulting therefrom.
64
- - "Model" means any accompanying machine-learning based assemblies (including checkpoints), consisting of learnt weights, parameters (including optimizer states), corresponding to the model architecture as embodied in the Complementary Material, that have been trained or tuned, in whole or in part on the Data, using the Complementary Material.
65
- - "Derivatives of the Model" means all modifications to the Model, works based on the Model, or any other model which is created or initialized by transfer of patterns of the weights, parameters, activations or output of the Model, to the other model, in order to cause the other model to perform similarly to the Model, including - but not limited to - distillation methods entailing the use of intermediate data representations or methods based on the generation of synthetic data by the Model for training the other model.
66
- - "Complementary Material" means the accompanying source code and scripts used to define, run, load, benchmark or evaluate the Model, and used to prepare data for training or evaluation, if any. This includes any accompanying documentation, tutorials, examples, etc, if any.
67
- - "Distribution" means any transmission, reproduction, publication or other sharing of the Model or Derivatives of the Model to a third party, including providing the Model as a hosted service made available by electronic or other remote means - e.g. API-based or web access.
68
- - "Licensor" means the copyright owner or entity authorized by the copyright owner that is granting the License, including the persons or entities that may have rights in the Model and/or distributing the Model.
69
- - "You" (or "Your") means an individual or Legal Entity exercising permissions granted by this License and/or making use of the Model for whichever purpose and in any field of use, including usage of the Model in an end-use application - e.g. chatbot, translator, image generator.
70
- - "Third Parties" means individuals or legal entities that are not under common control with Licensor or You.
71
- - "Contribution" means any work of authorship, including the original version of the Model and any modifications or additions to that Model or Derivatives of the Model thereof, that is intentionally submitted to Licensor for inclusion in the Model by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Model, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
72
- - "Contributor" means Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Model.
73
-
74
- Section II: INTELLECTUAL PROPERTY RIGHTS
75
-
76
- Both copyright and patent grants apply to the Model, Derivatives of the Model and Complementary Material. The Model and Derivatives of the Model are subject to additional terms as described in Section III.
77
-
78
- 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.
79
- 3. Grant of Patent License. Subject to the terms and conditions of this License and where and as applicable, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this paragraph) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Model and the Complementary Material, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Model to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model and/or Complementary Material or a Contribution incorporated within the Model and/or Complementary Material constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for the Model and/or Work shall terminate as of the date such litigation is asserted or filed.
80
-
81
- Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION
82
-
83
- 4. Distribution and Redistribution. You may host for Third Party remote access purposes (e.g. software-as-a-service), reproduce and distribute copies of the Model or Derivatives of the Model thereof in any medium, with or without modifications, provided that You meet the following conditions:
84
- Use-based restrictions as referenced in paragraph 5 MUST be included as an enforceable provision by You in any type of legal agreement (e.g. a license) governing the use and/or distribution of the Model or Derivatives of the Model, and You shall give notice to subsequent users You Distribute to, that the Model or Derivatives of the Model are subject to paragraph 5. This provision does not apply to the use of Complementary Material.
85
- You must give any Third Party recipients of the Model or Derivatives of the Model a copy of this License;
86
- You must cause any modified files to carry prominent notices stating that You changed the files;
87
- You must retain all copyright, patent, trademark, and attribution notices excluding those notices that do not pertain to any part of the Model, Derivatives of the Model.
88
- You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions - respecting paragraph 4.a. - for use, reproduction, or Distribution of Your modifications, or for any such Derivatives of the Model as a whole, provided Your use, reproduction, and Distribution of the Model otherwise complies with the conditions stated in this License.
89
- 5. Use-based restrictions. The restrictions set forth in Attachment A are considered Use-based restrictions. Therefore You cannot use the Model and the Derivatives of the Model for the specified restricted uses. You may use the Model subject to this License, including only for lawful purposes and in accordance with the License. Use may include creating any content with, finetuning, updating, running, training, evaluating and/or reparametrizing the Model. You shall require all of Your users who use the Model or a Derivative of the Model to comply with the terms of this paragraph (paragraph 5).
90
- 6. The Output You Generate. Except as set forth herein, Licensor claims no rights in the Output You generate using the Model. You are accountable for the Output you generate and its subsequent uses. No use of the output can contravene any provision as stated in the License.
91
-
92
- Section IV: OTHER PROVISIONS
93
-
94
- 7. Updates and Runtime Restrictions. To the maximum extent permitted by law, Licensor reserves the right to restrict (remotely or otherwise) usage of the Model in violation of this License.
95
- 8. Trademarks and related. Nothing in this License permits You to make use of Licensors’ trademarks, trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between the parties; and any rights not expressly granted herein are reserved by the Licensors.
96
- 9. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Model and the Complementary Material (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Model, Derivatives of the Model, and the Complementary Material and assume any risks associated with Your exercise of permissions under this License.
97
- 10. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Model and the Complementary Material (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
98
- 11. Accepting Warranty or Additional Liability. While redistributing the Model, Derivatives of the Model and the Complementary Material thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
99
- 12. If any provision of this License is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.
100
-
101
- END OF TERMS AND CONDITIONS
102
-
103
-
104
-
105
-
106
- Attachment A
107
-
108
- Use Restrictions
109
-
110
- You agree not to use the Model or Derivatives of the Model:
111
-
112
- - In any way that violates any applicable national, federal, state, local or international law or regulation;
113
- - For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
114
- - To generate or disseminate verifiably false information and/or content with the purpose of harming others;
115
- - To generate or disseminate personal identifiable information that can be used to harm an individual;
116
- - To defame, disparage or otherwise harass others;
117
- - For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
118
- - For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
119
- - To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
120
- - For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;
121
- - To provide medical advice and medical results interpretation;
122
- - To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).
123
-
124
-
125
-
126
- Open Source Model Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT and Other Licenses of the Third-Party Components therein:
127
- --------------------------------------------------------------------
128
- 1. HunyuanDiT
129
- Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
130
-
131
-
132
- Terms of the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT:
133
- --------------------------------------------------------------------
134
- TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT
135
- Tencent Hunyuan Release Date: 2024/5/14
136
- By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Tencent Hunyuan Works, including via any Hosted Service, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
137
- 1. DEFINITIONS.
138
- a. “Acceptable Use Policy” shall mean the policy made available by Tencent as set forth in the Exhibit A.
139
- b. “Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of the Hunyuan Works or any portion or element thereof set forth herein.
140
- c. “Documentation” shall mean the specifications, manuals and documentation for Tencent Hunyuan made publicly available by Tencent.
141
- d. “Hosted Service” shall mean a hosted service offered via an application programming interface (API), web access, or any other electronic or remote means.
142
- e. “Licensee,” “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Tencent Hunyuan Works for any purpose and in any field of use.
143
- f. “Materials” shall mean, collectively, Tencent’s proprietary Tencent Hunyuan and Documentation (and any portion thereof) as made available by Tencent under this Agreement.
144
- g. “Model Derivatives” shall mean all: (i) modifications to Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; (ii) works based on Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Tencent Hunyuan or any Model Derivative of Tencent Hunyuan, to that model in order to cause that model to perform similarly to Tencent Hunyuan or a Model Derivative of Tencent Hunyuan, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs by Tencent Hunyuan or a Model Derivative of Tencent Hunyuan for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives.
145
- h. “Output” shall mean the information and/or content output of Tencent Hunyuan or a Model Derivative that results from operating or otherwise using Tencent Hunyuan or a Model Derivative, including via a Hosted Service.
146
- i. “Tencent,” “We” or “Us” shall mean THL A29 Limited.
147
- j. “Tencent Hunyuan” shall mean the large language models, image/video/audio/3D generation models, and multimodal large language models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us at https://huggingface.co/Tencent-Hunyuan/HunyuanDiT and https://github.com/Tencent/HunyuanDiT .
148
- k. “Tencent Hunyuan Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof.
149
- l. “Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You.
150
- m. “including” shall mean including but not limited to.
151
- 2. GRANT OF RIGHTS.
152
- We grant You a non-exclusive, worldwide, non-transferable and royalty-free limited license under Tencent’s intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy.
153
- 3. DISTRIBUTION.
154
- You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Tencent Hunyuan Works, provided that You meet all of the following conditions:
155
- a. You must provide all such Third Party recipients of the Tencent Hunyuan Works or products or services using them a copy of this Agreement;
156
- b. You must cause any modified files to carry prominent notices stating that You changed the files;
157
- c. You are encouraged to: (i) publish at least one technology introduction blogpost or one public statement expressing Your experience of using the Tencent Hunyuan Works; and (ii) mark the products or services developed by using the Tencent Hunyuan Works to indicate that the product/service is “Powered by Tencent Hunyuan”; and
158
- d. All distributions to Third Parties (other than through a Hosted Service) must be accompanied by a “Notice” text file that contains the following notice: “Tencent Hunyuan is licensed under the Tencent Hunyuan Community License Agreement, Copyright © 2024 Tencent. All Rights Reserved. The trademark rights of “Tencent Hunyuan” are owned by Tencent or its affiliate.”
159
- You may add Your own copyright statement to Your modifications and, except as set forth in this Section and in Section 5, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement. If You receive Tencent Hunyuan Works from a Licensee as part of an integrated end user product, then this Section 3 of this Agreement will not apply to You.
160
- 4. ADDITIONAL COMMERCIAL TERMS.
161
- If, on the Tencent Hunyuan version release date, the monthly active users of all products or services made available by or for Licensee is greater than 100 million monthly active users in the preceding calendar month, You must request a license from Tencent, which Tencent may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until Tencent otherwise expressly grants You such rights.
162
- 5. RULES OF USE.
163
- a. Your use of the Tencent Hunyuan Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Tencent Hunyuan Works, which is hereby incorporated by reference into this Agreement. You must include the use restrictions referenced in these Sections 5(a) and 5(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Tencent Hunyuan Works and You must provide notice to subsequent users to whom You distribute that Tencent Hunyuan Works are subject to the use restrictions in these Sections 5(a) and 5(b).
164
- b. You must not use the Tencent Hunyuan Works or any Output or results of the Tencent Hunyuan Works to improve any other large language model (other than Tencent Hunyuan or Model Derivatives thereof).
165
- 6. INTELLECTUAL PROPERTY.
166
- a. Subject to Tencent’s ownership of Tencent Hunyuan Works made by or for Tencent and intellectual property rights therein, conditioned upon Your compliance with the terms and conditions of this Agreement, as between You and Tencent, You will be the owner of any derivative works and modifications of the Materials and any Model Derivatives that are made by or for You.
167
- b. No trademark licenses are granted under this Agreement, and in connection with the Tencent Hunyuan Works, Licensee may not use any name or mark owned by or associated with Tencent or any of its affiliates, except as required for reasonable and customary use in describing and distributing the Tencent Hunyuan Works. Tencent hereby grants You a license to use “Tencent Hunyuan” (the “Mark”) solely as required to comply with the provisions of Section 3(c), provided that You comply with any applicable laws related to trademark protection. All goodwill arising out of Your use of the Mark will inure to the benefit of Tencent.
168
- c. If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed. You will defend, indemnify and hold harmless Us from and against any claim by any Third Party arising out of or related to Your or the Third Party’s use or distribution of the Tencent Hunyuan Works.
169
- d. Tencent claims no rights in Outputs You generate. You and Your users are solely responsible for Outputs and their subsequent uses.
170
- 7. DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY.
171
- a. We are not obligated to support, update, provide training for, or develop any further version of the Tencent Hunyuan Works or to grant any license thereto.
172
- b. UNLESS AND ONLY TO THE EXTENT REQUIRED BY APPLICABLE LAW, THE TENCENT HUNYUAN WORKS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OF ANY KIND INCLUDING ANY WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, COURSE OF DEALING, USAGE OF TRADE, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR OR A THIRD PARTY’S USE OR DISTRIBUTION OF ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT.
173
- c. TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL TENCENT OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, FOR ANY DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS, EVEN IF TENCENT OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
174
- 8. SURVIVAL AND TERMINATION.
175
- a. The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
176
- b. We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Tencent Hunyuan Works. Sections 6(a), 6(c), 7 and 9 shall survive the termination of this Agreement.
177
- 9. GOVERNING LAW AND JURISDICTION.
178
- a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of the Hong Kong Special Administrative Region of the People’s Republic of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
179
- b. Exclusive jurisdiction and venue for any dispute arising out of or relating to this Agreement will be a court of competent jurisdiction in the Hong Kong Special Administrative Region of the People’s Republic of China, and Tencent and Licensee consent to the exclusive jurisdiction of such court with respect to any such dispute.
180
-
181
-
182
- EXHIBIT A
183
- ACCEPTABLE USE POLICY
184
-
185
- Tencent reserves the right to update this Acceptable Use Policy from time to time.
186
- Last modified: 2024/5/14
187
-
188
- Tencent endeavors to promote safe and fair use of its tools and features, including Tencent Hunyuan. You agree not to use Tencent Hunyuan or Model Derivatives:
189
- 1. In any way that violates any applicable national, federal, state, local, international or any other law or regulation;
190
- 2. To harm Yourself or others;
191
- 3. To repurpose or distribute output from Tencent Hunyuan or any Model Derivatives to harm Yourself or others;
192
- 4. To override or circumvent the safety guardrails and safeguards We have put in place;
193
- 5. For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
194
- 6. To generate or disseminate verifiably false information and/or content with the purpose of harming others or influencing elections;
195
- 7. To generate or facilitate false online engagement, including fake reviews and other means of fake online engagement;
196
- 8. To intentionally defame, disparage or otherwise harass others;
197
- 9. To generate and/or disseminate malware (including ransomware) or any other content to be used for the purpose of harming electronic systems;
198
- 10. To generate or disseminate personal identifiable information with the purpose of harming others;
199
- 11. To generate or disseminate information (including images, code, posts, articles), and place the information in any public context (including –through the use of bot generated tweets), without expressly and conspicuously identifying that the information and/or content is machine generated;
200
- 12. To impersonate another individual without consent, authorization, or legal right;
201
- 13. To make high-stakes automated decisions in domains that affect an individual’s safety, rights or wellbeing (e.g., law enforcement, migration, medicine/health, management of critical infrastructure, safety components of products, essential services, credit, employment, housing, education, social scoring, or insurance);
202
- 14. In a manner that violates or disrespects the social ethics and moral standards of other countries or regions;
203
- 15. To perform, facilitate, threaten, incite, plan, promote or encourage violent extremism or terrorism;
204
- 16. For any use intended to discriminate against or harm individuals or groups based on protected characteristics or categories, online or offline social behavior or known or predicted personal or personality characteristics;
205
- 17. To intentionally exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
206
- 18. For military purposes;
207
- 19. To engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or other professional practices.
208
-
209
- For the license of other third party components, please refer to the following URL:
210
- https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/blob/main/Notice
211
-
212
- --------------------------------------------------------------------
213
-
214
- This Model also incorporates insights from Flux's neural network architectures (https://github.com/black-forest-labs/flux?tab=readme-ov-file). Credits are given to the original authors.
 
README.md CHANGED
@@ -1,230 +1,14 @@
1
  ---
2
- title: Hunyuan3D-2.0
3
- emoji: 🌍
4
- colorFrom: purple
5
- colorTo: red
6
  sdk: gradio
7
- sdk_version: 4.44.0
8
-
9
- app_file: gradio_app.py
10
  pinned: false
11
- short_description: Text-to-3D and Image-to-3D Generation
12
- models:
13
- - tencent/Hunyuan3D-2
14
  ---
15
 
16
-
17
- [中文阅读 / Read in Chinese](README_zh_cn.md)
18
- [日本語で読む / Read in Japanese](README_ja_jp.md)
19
-
20
- <p align="center">
21
- <img src="./assets/images/teaser.jpg">
22
-
23
-
24
- </p>
25
-
26
- <div align="center">
27
- <a href=https://3d.hunyuan.tencent.com target="_blank"><img src=https://img.shields.io/badge/Official%20Site-black.svg?logo=homepage height=22px></a>
28
- <a href=https://huggingface.co/spaces/tencent/Hunyuan3D-2 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Demo-276cb4.svg height=22px></a>
29
- <a href=https://huggingface.co/tencent/Hunyuan3D-2 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
30
- <a href=https://3d-models.hunyuan.tencent.com/ target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
31
- <a href=https://discord.gg/GuaWYwzKbX target="_blank"><img src= https://img.shields.io/badge/Discord-white.svg?logo=discord height=22px></a>
32
- <a href=https://github.com/Tencent/Hunyuan3D-2/blob/main/assets/report/Tencent_Hunyuan3D_2_0.pdf target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
33
- </div>
34
-
35
-
36
- [//]: # ( <a href=# target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>)
37
-
38
- [//]: # ( <a href=# target="_blank"><img src= https://img.shields.io/badge/Colab-8f2628.svg?logo=googlecolab height=22px></a>)
39
-
40
- [//]: # ( <a href="#"><img alt="PyPI - Downloads" src="https://img.shields.io/pypi/v/mulankit?logo=pypi" height=22px></a>)
41
-
42
- > Join our **[Wechat](#find-us)** and **[Discord](#find-us)** groups to discuss and get help from us.
43
-
44
-
45
- <p align="center">
46
- “ Living out everyone’s imagination on creating and manipulating 3D assets.”
47
- </p>
48
-
49
- ## 🔥 News
50
-
51
- - Jan 21, 2025: 💬 Enjoy exciting 3D generation on our website [Hunyuan3D Studio](https://3d.hunyuan.tencent.com)!
52
- - Jan 21, 2025: 💬 Release inference code and pretrained models
53
- of [Hunyuan3D 2.0](https://huggingface.co/tencent/Hunyuan3D-2).
54
- - Jan 21, 2025: 💬 Release Hunyuan3D 2.0. Please give it a try
55
- via [huggingface space](https://huggingface.co/spaces/tencent/Hunyuan3D-2)
56
- our [official site](https://3d.hunyuan.tencent.com)!
57
-
58
- ## **Abstract**
59
-
60
- We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets.
61
- This system includes two foundation components: a large-scale shape generation model - Hunyuan3D-DiT, and a large-scale
62
- texture synthesis model - Hunyuan3D-Paint.
63
- The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly
64
- aligns with a given condition image, laying a solid foundation for downstream applications.
65
- The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant
66
- texture maps for either generated or hand-crafted meshes.
67
- Furthermore, we build Hunyuan3D-Studio - a versatile, user-friendly production platform that simplifies the re-creation
68
- process of 3D assets. It allows both professional and amateur users to manipulate or even animate their meshes
69
- efficiently.
70
- We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models,
71
- including the open-source models and closed-source models in geometry details, condition alignment, texture quality, and
72
- e.t.c.
73
-
74
-
75
-
76
- <p align="center">
77
- <img src="assets/images/system.jpg">
78
- </p>
79
-
80
- ## ☯️ **Hunyuan3D 2.0**
81
-
82
- ### Architecture
83
-
84
- Hunyuan3D 2.0 features a two-stage generation pipeline, starting with the creation of a bare mesh, followed by the
85
- synthesis of a texture map for that mesh. This strategy is effective for decoupling the difficulties of shape and
86
- texture generation and also provides flexibility for texturing either generated or handcrafted meshes.
87
-
88
- <p align="left">
89
- <img src="assets/images/arch.jpg">
90
- </p>
91
-
92
- ### Performance
93
-
94
- We have evaluated Hunyuan3D 2.0 with other open-source as well as close-source 3d-generation methods.
95
- The numerical results indicate that Hunyuan3D 2.0 surpasses all baselines in the quality of generated textured 3D assets
96
- and the condition following ability.
97
-
98
- | Model | CMMD(⬇) | FID_CLIP(⬇) | FID(⬇) | CLIP-score(⬆) |
99
- |-------------------------|-----------|-------------|-------------|---------------|
100
- | Top Open-source Model1 | 3.591 | 54.639 | 289.287 | 0.787 |
101
- | Top Close-source Model1 | 3.600 | 55.866 | 305.922 | 0.779 |
102
- | Top Close-source Model2 | 3.368 | 49.744 | 294.628 | 0.806 |
103
- | Top Close-source Model3 | 3.218 | 51.574 | 295.691 | 0.799 |
104
- | Hunyuan3D 2.0 | **3.193** | **49.165** | **282.429** | **0.809** |
105
-
106
- Generation results of Hunyuan3D 2.0:
107
- <p align="left">
108
- <img src="assets/images/e2e-1.gif" height=250>
109
- <img src="assets/images/e2e-2.gif" height=250>
110
- </p>
111
-
112
- ### Pretrained Models
113
-
114
- | Model | Date | Huggingface |
115
- |----------------------|------------|--------------------------------------------------------|
116
- | Hunyuan3D-DiT-v2-0 | 2025-01-21 | [Download](https://huggingface.co/tencent/Hunyuan3D-2) |
117
- | Hunyuan3D-Paint-v2-0 | 2025-01-21 | [Download](https://huggingface.co/tencent/Hunyuan3D-2) |
118
-
119
- ## 🤗 Get Started with Hunyuan3D 2.0
120
-
121
- You may follow the next steps to use Hunyuan3D 2.0 via code or the Gradio App.
122
-
123
- ### Install Requirements
124
-
125
- Please install Pytorch via the [official](https://pytorch.org/) site. Then install the other requirements via
126
-
127
- ```bash
128
- pip install -r requirements.txt
129
- # for texture
130
- cd hy3dgen/texgen/custom_rasterizer
131
- python3 setup.py install
132
- cd hy3dgen/texgen/differentiable_renderer
133
- bash compile_mesh_painter.sh
134
- ```
135
-
136
- ### API Usage
137
-
138
- We designed a diffusers-like API to use our shape generation model - Hunyuan3D-DiT and texture synthesis model -
139
- Hunyuan3D-Paint.
140
-
141
- You can access **Hunyuan3D-DiT** via:
142
-
143
- ```python
144
- from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
145
-
146
- pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2')
147
- mesh = pipeline(image='assets/demo.png')[0]
148
- ```
149
-
150
- The output mesh is a [trimesh object](https://trimesh.org/trimesh.html), which you can save to a glb/obj (or other
151
- format) file.
152
-
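For example, a minimal sketch (the output filename is just illustrative; trimesh infers the format from the file extension):

```python
# Export the generated mesh; the extension selects the format.
mesh.export('demo_mesh.glb')  # or, e.g., 'demo_mesh.obj'
```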
153
- For **Hunyuan3D-Paint**, do the following:
154
-
155
- ```python
156
- from hy3dgen.texgen import Hunyuan3DPaintPipeline
157
- from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
158
-
159
- # let's generate a mesh first
160
- pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2')
161
- mesh = pipeline(image='assets/demo.png')[0]
162
-
163
- pipeline = Hunyuan3DPaintPipeline.from_pretrained('tencent/Hunyuan3D-2')
164
- mesh = pipeline(mesh, image='assets/demo.png')
165
- ```
166
-
167
- Please visit [minimal_demo.py](minimal_demo.py) for more advanced usage, such as **text to 3D** and **texture generation
168
- for handcrafted meshes** (a rough sketch of the latter follows below).
169
-
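Below is a minimal sketch of texturing a handcrafted mesh, assuming the Paint pipeline accepts a mesh loaded from disk just like the generated mesh above (the mesh path is hypothetical; minimal_demo.py remains the authoritative reference):

```python
import trimesh

from hy3dgen.texgen import Hunyuan3DPaintPipeline

# Load your own handcrafted mesh (any format trimesh can read).
mesh = trimesh.load('my_handcrafted_mesh.obj', force='mesh')

# Apply the texture synthesis model, conditioned on a reference image.
pipeline = Hunyuan3DPaintPipeline.from_pretrained('tencent/Hunyuan3D-2')
textured = pipeline(mesh, image='assets/demo.png')

# Assuming the returned mesh is also a trimesh-compatible object, export it.
textured.export('my_handcrafted_mesh_textured.glb')
```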
170
- ### Gradio App
171
-
172
- You can also host a [Gradio](https://www.gradio.app/) app on your own computer via:
173
-
174
- ```bash
175
- python3 gradio_app.py
176
- ```
177
-
178
- If you don't want to host it yourself, don't forget to visit [Hunyuan3D](https://3d.hunyuan.tencent.com) for quick use.
179
-
180
- ## 📑 Open-Source Plan
181
-
182
- - [x] Inference Code
183
- - [x] Model Checkpoints
184
- - [x] Technical Report
185
- - [ ] ComfyUI
186
- - [ ] TensorRT Version
187
-
188
- ## 🔗 BibTeX
189
-
190
- If you find this repository helpful, please cite our reports:
191
-
192
- ```bibtex
193
- @misc{hunyuan3d22025tencent,
194
- title={Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation},
195
- author={Tencent Hunyuan3D Team},
196
- year={2025},
197
- }
198
-
199
- @misc{yang2024tencent,
200
- title={Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
201
- year={2024},
202
- author={Tencent Hunyuan3D Team},
203
- eprint={2411.02293},
204
- archivePrefix={arXiv},
205
- primaryClass={cs.CV}
206
- }
207
- ```
208
-
209
- ## Acknowledgements
210
-
211
- We would like to thank the contributors to
212
- the [DINOv2](https://github.com/facebookresearch/dinov2), [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), [FLUX](https://github.com/black-forest-labs/flux), [diffusers](https://github.com/huggingface/diffusers), [HuggingFace](https://huggingface.co), [CraftsMan3D](https://github.com/wyysf-98/CraftsMan3D),
213
- and [Michelangelo](https://github.com/NeuralCarver/Michelangelo/tree/main) repositories, for their open research and
214
- exploration.
215
-
216
- ## Find Us
217
-
218
- | Wechat Group | Xiaohongshu | X | Discord |
219
- |--------------|-------------|---|---------|
220
- | | | | |
221
-
222
- ## Star History
223
-
224
- <a href="https://star-history.com/#Tencent/Hunyuan3D-2&Date">
225
- <picture>
226
- <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent/Hunyuan3D-2&type=Date&theme=dark" />
227
- <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent/Hunyuan3D-2&type=Date" />
228
- <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent/Hunyuan3D-2&type=Date" />
229
- </picture>
230
- </a>
 
1
  ---
2
+ title: Forger
3
+ emoji: 🔥
4
+ colorFrom: gray
5
+ colorTo: purple
6
  sdk: gradio
7
+ sdk_version: 5.49.1
8
+ app_file: app.py
 
9
  pinned: false
10
+ license: apache-2.0
11
+ short_description: Rapid, simple 3D design. Build complex models.
 
12
  ---
13
 
14
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
README_zh_cn.md DELETED
@@ -1,160 +0,0 @@
1
- [Read in English](README.md)
2
-
3
- <p align="center">
4
- <img src="./assets/images/teaser.jpg">
5
-
6
- </p>
7
-
8
- <div align="center">
9
- <a href=https://3d.hunyuan.tencent.com target="_blank"><img src=https://img.shields.io/badge/Hunyuan3D-black.svg?logo=homepage height=22px></a>
10
- <a href=https://huggingface.co/spaces/tencent/Hunyuan3D-2 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Demo-276cb4.svg height=22px></a>
11
- <a href=https://huggingface.co/tencent/Hunyuan3D-2 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
12
- <a href=https://3d-models.hunyuan.tencent.com/ target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
13
- <a href=https://discord.gg/GuaWYwzKbX target="_blank"><img src= https://img.shields.io/badge/Page-white.svg?logo=discord height=22px></a>
14
- </div>
15
-
16
- <br>
17
- <p align="center">
18
- "Bring everyone's imagination to life through 3D creation and editing."
19
- </p>
20
-
21
- ## 🔥 News
22
-
23
- - Jan 21, 2025: 💬 We have released [Hunyuan3D 2.0](https://huggingface.co/spaces/tencent/Hunyuan3D-2). Come and try it!
24
-
25
- ## Overview
26
-
27
- Hunyuan3D 2.0 is an advanced large-scale 3D asset creation system that can generate high-resolution bare 3D
28
- meshes as well as textured 3D models. The system consists of two foundation components: a large-scale shape generation model, Hunyuan3D-DiT, and a large-scale texture synthesis model, Hunyuan3D-Paint.
29
- The shape generation model is built on a flow-based diffusion model and aims to produce geometry that precisely matches a given condition image, laying a solid foundation for downstream applications.
30
- The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution, vivid texture maps for AI-generated or handcrafted meshes.
31
- In addition, we built the Hunyuan3D feature suite, a versatile, easy-to-use creation platform that simplifies the production and editing of 3D models, enabling both professional and amateur users to manipulate and even animate their 3D models efficiently.
32
- We systematically evaluated the system, and the results show that Hunyuan3D 2.0 outperforms previous state-of-the-art open-source and closed-source models in geometry details, condition alignment, texture quality, and more.
33
-
34
- <p align="center">
35
- <img src="assets/images/system.jpg">
36
- </p>
37
-
38
- ## ☯️ **Hunyuan3D 2.0**
39
-
40
- ### Architecture
41
-
42
- Hunyuan3D 2.0 adopts a two-stage generation pipeline: it first creates an untextured mesh and then synthesizes a texture map for that mesh. This strategy effectively decouples the difficulties of shape and texture generation, and it also provides the flexibility to texture either generated or handcrafted meshes.
43
-
44
- <p align="left">
45
- <img src="assets/images/arch.jpg">
46
- </p>
47
-
48
- ### Performance
49
-
50
- We evaluated Hunyuan3D 2.0 against other open-source and closed-source 3D generation methods.
51
- The numerical results show that Hunyuan3D 2.0 surpasses all baselines in the quality of the generated textured 3D models and in its ability to follow the given conditions.
52
-
53
- | Model | CMMD(⬇) | FID_CLIP(⬇) | FID(⬇) | CLIP-score(⬆) |
54
- |-------------------------|-----------|-------------|-------------|---------------|
55
- | Top Open-source Model1 | 3.591 | 54.639 | 289.287 | 0.787 |
56
- | Top Close-source Model1 | 3.600 | 55.866 | 305.922 | 0.779 |
57
- | Top Close-source Model2 | 3.368 | 49.744 | 294.628 | 0.806 |
58
- | Top Close-source Model3 | 3.218 | 51.574 | 295.691 | 0.799 |
59
- | Hunyuan3D 2.0 | **3.193** | **49.165** | **282.429** | **0.809** |
60
-
61
- Some generation results of Hunyuan3D 2.0:
62
- <p align="left">
63
- <img src="assets/images/e2e-1.gif" height=300>
64
- <img src="assets/images/e2e-2.gif" height=300>
65
- </p>
66
-
67
- ### Pretrained Models
68
-
69
- | Model | Release Date | Huggingface |
70
- |----------------------|--------------|--------------------------------------------------|
71
- | Hunyuan3D-DiT-v2-0 | 2025-01-21 | [Download](https://huggingface.co/tencent/Hunyuan3D-2) |
72
- | Hunyuan3D-Paint-v2-0 | 2025-01-21 | [Download](https://huggingface.co/tencent/Hunyuan3D-2) |
73
-
74
- ## 🤗 Get Started with Hunyuan3D 2.0
75
-
76
- You can follow the steps below to use Hunyuan3D 2.0 via code or the Gradio app.
77
-
78
- ### Install Requirements
79
-
80
- Please install PyTorch via the official website, then install the other required dependencies via:
81
-
82
- ```bash
83
- pip install -r assets/requirements.txt
84
- ```
85
-
86
- ### API Usage
87
-
88
- We designed a diffusers-like API for our shape generation model, Hunyuan3D-DiT, and our texture synthesis model, Hunyuan3D-Paint.
89
- You can use Hunyuan3D-DiT via:
90
-
91
- ```python
92
- from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
93
-
94
- pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2')
95
- mesh = pipeline(image='assets/demo.png')[0]
96
- ```
97
-
98
- The output mesh is a Trimesh object, which you can save to a glb/obj (or other format) file.
99
- For Hunyuan3D-Paint, do the following:
100
-
101
- ```python
102
- from hy3dgen.texgen import Hunyuan3DPaintPipeline
103
- from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
104
-
105
- # let's generate a mesh first
106
- pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2')
107
- mesh = pipeline(image='assets/demo.png')[0]
108
-
109
- pipeline = Hunyuan3DPaintPipeline.from_pretrained('tencent/Hunyuan3D-2')
110
- mesh = pipeline(mesh, image='assets/demo.png')
111
- ```
112
-
113
- Please visit [minimal_demo.py](minimal_demo.py) for more advanced usage, such as text to 3D and texture generation for handcrafted meshes.
114
-
115
- ### Gradio App
116
-
117
- You can also host a Gradio app on your own computer via:
118
-
119
- ```bash
120
- pip3 install gradio==3.39.0
121
- python3 gradio_app.py
122
- ```
123
-
124
- If you don't want to host it yourself, don't forget to visit [Hunyuan3D](https://3d.hunyuan.tencent.com) for quick use.
125
-
126
- ## 📑 Open-Source Plan
127
-
128
- - [x] Inference Code
129
- - [x] Model Weights
130
- - [ ] Technical Report
131
- - [ ] ComfyUI
132
- - [ ] TensorRT Quantization
133
-
134
- ## 🔗 Citation
135
-
136
- If you find our work helpful, please cite our report as follows:
137
-
138
- ```bibtex
139
- @misc{hunyuan3d22025tencent,
140
- title={Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation},
141
- author={Tencent Hunyuan3D Team},
142
- year={2025},
143
- }
144
- ```
145
-
146
- ## Acknowledgements
147
-
148
- We would like to thank the contributors to
149
- the [DINOv2](https://github.com/facebookresearch/dinov2), [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), [FLUX](https://github.com/black-forest-labs/flux), [diffusers](https://github.com/huggingface/diffusers)
150
- and [HuggingFace](https://huggingface.co) repositories, for their open research and exploration.
151
-
152
- ## Star History
153
-
154
- <a href="https://star-history.com/#Tencent/Hunyuan3D-2&Date">
155
- <picture>
156
- <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent/Hunyuan3D-2&type=Date&theme=dark" />
157
- <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent/Hunyuan3D-2&type=Date" />
158
- <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent/Hunyuan3D-2&type=Date" />
159
- </picture>
160
- </a>
 
__init__ (1).py DELETED
@@ -1,16 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
-
16
- from .pipelines import Hunyuan3DPaintPipeline, Hunyuan3DTexGenConfig
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__init__ (10).py DELETED
@@ -1,15 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- from .hunyuan3ddit import Hunyuan3DDiT
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__init__ (2).py DELETED
@@ -1,22 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- '''
16
- from .hierarchy import BuildHierarchy, BuildHierarchyWithColor
17
- from .io_obj import LoadObj, LoadObjWithTexture
18
- from .render import rasterize, interpolate
19
- '''
20
- from .io_glb import *
21
- from .io_obj import *
22
- from .render import *
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__init__ (3).py DELETED
@@ -1,13 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__init__ (4).py DELETED
@@ -1,13 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__init__ (5).py DELETED
@@ -1,13 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__init__ (6).py DELETED
@@ -1,13 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__init__ (7).py DELETED
@@ -1,17 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- from .pipelines import Hunyuan3DDiTPipeline, Hunyuan3DDiTFlowMatchingPipeline
16
- from .postprocessors import FaceReducer, FloaterRemover, DegenerateFaceRemover, MeshSimplifier
17
- from .preprocessors import ImageProcessorV2, IMAGE_PROCESSORS, DEFAULT_IMAGEPROCESSOR
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__init__ (8).py DELETED
@@ -1,28 +0,0 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0
2
- # and Other Licenses of the Third-Party Components therein:
3
- # The below Model in this distribution may have been modified by THL A29 Limited
4
- # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
-
6
- # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
- # The below software and/or models in this distribution may have been
8
- # modified by THL A29 Limited ("Tencent Modifications").
9
- # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
-
11
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
- # except for the third-party components listed below.
13
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
- # in the respective licenses of these third-party components.
15
- # Users must comply with all terms and conditions of original licenses of these third-party
16
- # components and must ensure that the usage of the third party components adheres to
17
- # all relevant laws and regulations.
18
-
19
- # For avoidance of doubts, Hunyuan 3D means the large language models and
20
- # their software and algorithms, including trained model weights, parameters (including
21
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
- # fine-tuning enabling code and other elements of the foregoing made publicly available
23
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
-
25
-
26
- from .autoencoders import ShapeVAE
27
- from .conditioner import DualImageEncoder, SingleImageEncoder, DinoImageEncoder, CLIPImageEncoder
28
- from .denoisers import Hunyuan3DDiT
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__init__ (9).py DELETED
@@ -1,20 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- from .attention_blocks import CrossAttentionDecoder
16
- from .attention_processors import FlashVDMCrossAttentionProcessor, CrossAttentionProcessor, \
17
- FlashVDMTopMCrossAttentionProcessor
18
- from .model import ShapeVAE, VectsetVAE
19
- from .surface_extractors import SurfaceExtractors, MCSurfaceExtractor, DMCSurfaceExtractor, Latent2MeshOutput
20
- from .volume_decoders import HierarchicalVolumeDecoding, FlashVDMVolumeDecoding, VanillaVolumeDecoder
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
__init__.py DELETED
@@ -1,13 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
alignImg4Tex_utils.py DELETED
@@ -1,121 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import torch
16
- from diffusers import EulerAncestralDiscreteScheduler
17
- from diffusers import StableDiffusionControlNetPipeline, StableDiffusionXLControlNetImg2ImgPipeline, ControlNetModel, \
18
- AutoencoderKL
19
-
20
-
21
- class Img2img_Control_Ip_adapter:
22
- def __init__(self, device):
23
- controlnet = ControlNetModel.from_pretrained('lllyasviel/control_v11f1p_sd15_depth', torch_dtype=torch.float16,
24
- variant="fp16", use_safetensors=True)
25
- pipe = StableDiffusionControlNetPipeline.from_pretrained(
26
- 'runwayml/stable-diffusion-v1-5', controlnet=controlnet, torch_dtype=torch.float16, use_safetensors=True
27
- )
28
- pipe.load_ip_adapter('h94/IP-Adapter', subfolder="models", weight_name="ip-adapter-plus_sd15.safetensors")
29
- pipe.set_ip_adapter_scale(0.7)
30
-
31
- pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
32
- # pipe.enable_model_cpu_offload()
33
- self.pipe = pipe.to(device)
34
-
35
- def __call__(
36
- self,
37
- prompt,
38
- control_image,
39
- ip_adapter_image,
40
- negative_prompt,
41
- height=512,
42
- width=512,
43
- num_inference_steps=20,
44
- guidance_scale=8.0,
45
- controlnet_conditioning_scale=1.0,
46
- output_type="pil",
47
- **kwargs,
48
- ):
49
- results = self.pipe(
50
- prompt=prompt,
51
- negative_prompt=negative_prompt,
52
- image=control_image,
53
- ip_adapter_image=ip_adapter_image,
54
- generator=torch.manual_seed(42),
55
- seed=42,
56
- num_inference_steps=num_inference_steps,
57
- guidance_scale=guidance_scale,
58
- controlnet_conditioning_scale=controlnet_conditioning_scale,
59
- strength=1,
60
- # clip_skip=2,
61
- height=height,
62
- width=width,
63
- output_type=output_type,
64
- **kwargs,
65
- ).images[0]
66
- return results
67
-
68
-
69
- ################################################################
70
-
71
- class HesModel:
72
- def __init__(self, ):
73
- controlnet_depth = ControlNetModel.from_pretrained(
74
- 'diffusers/controlnet-depth-sdxl-1.0',
75
- torch_dtype=torch.float16,
76
- variant="fp16",
77
- use_safetensors=True
78
- )
79
- self.pipe = StableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
80
- 'stabilityai/stable-diffusion-xl-base-1.0',
81
- torch_dtype=torch.float16,
82
- variant="fp16",
83
- controlnet=controlnet_depth,
84
- use_safetensors=True,
85
- )
86
- self.pipe.vae = AutoencoderKL.from_pretrained(
87
- 'madebyollin/sdxl-vae-fp16-fix',
88
- torch_dtype=torch.float16
89
- )
90
-
91
- self.pipe.load_ip_adapter('h94/IP-Adapter', subfolder="sdxl_models", weight_name="ip-adapter_sdxl.safetensors")
92
- self.pipe.set_ip_adapter_scale(0.7)
93
- self.pipe.to("cuda")
94
-
95
- def __call__(self,
96
- init_image,
97
- control_image,
98
- ip_adapter_image=None,
99
- prompt='3D image',
100
- negative_prompt='2D image',
101
- seed=42,
102
- strength=0.8,
103
- num_inference_steps=40,
104
- guidance_scale=7.5,
105
- controlnet_conditioning_scale=0.5,
106
- **kwargs
107
- ):
108
- image = self.pipe(
109
- prompt=prompt,
110
- image=init_image,
111
- control_image=control_image,
112
- ip_adapter_image=ip_adapter_image,
113
- negative_prompt=negative_prompt,
114
- num_inference_steps=num_inference_steps,
115
- guidance_scale=guidance_scale,
116
- strength=strength,
117
- controlnet_conditioning_scale=controlnet_conditioning_scale,
118
- seed=seed,
119
- **kwargs
120
- ).images[0]
121
- return image
 
attention_blocks.py DELETED
@@ -1,493 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
-
16
- import os
17
- from typing import Optional
18
-
19
- import torch
20
- import torch.nn as nn
21
- from einops import rearrange
22
-
23
- from .attention_processors import CrossAttentionProcessor
24
- from ...utils import logger
25
-
26
- scaled_dot_product_attention = nn.functional.scaled_dot_product_attention
27
-
28
- if os.environ.get('USE_SAGEATTN', '0') == '1':
29
- try:
30
- from sageattention import sageattn
31
- except ImportError:
32
- raise ImportError('Please install the package "sageattention" to use USE_SAGEATTN.')
33
- scaled_dot_product_attention = sageattn
34
-
35
-
36
- class FourierEmbedder(nn.Module):
37
- """The sin/cosine positional embedding. Given an input tensor `x` of shape [n_batch, ..., c_dim], it converts
38
- each feature dimension of `x[..., i]` into:
39
- [
40
- sin(x[..., i]),
41
- sin(f_1*x[..., i]),
42
- sin(f_2*x[..., i]),
43
- ...
44
- sin(f_N * x[..., i]),
45
- cos(x[..., i]),
46
- cos(f_1*x[..., i]),
47
- cos(f_2*x[..., i]),
48
- ...
49
- cos(f_N * x[..., i]),
50
- x[..., i] # only present if include_input is True.
51
- ], here f_i is the frequency.
52
-
53
- The frequency index i ranges over [0, 1, 2, ..., num_freqs - 1].
54
- If logspace is True, the frequency f_i is 2^i (matching 2.0 ** torch.arange(num_freqs) in the code);
55
- otherwise, the frequencies are linearly spaced between 1.0 and 2^(num_freqs - 1).
56
-
57
- Args:
58
- num_freqs (int): the number of frequencies, default is 6;
59
- logspace (bool): If logspace is True, then the frequency f_i is [..., 2^(i / num_freqs), ...],
60
- otherwise, the frequencies are linearly spaced between [1.0, 2^(num_freqs - 1)];
61
- input_dim (int): the input dimension, default is 3;
62
- include_input (bool): include the input tensor or not, default is True.
63
-
64
- Attributes:
65
- frequencies (torch.Tensor): if logspace is True, the frequency f_i is 2^i;
66
- otherwise, the frequencies are linearly spaced between 1.0 and 2^(num_freqs - 1).
67
-
68
- out_dim (int): the embedding size, if include_input is True, it is input_dim * (num_freqs * 2 + 1),
69
- otherwise, it is input_dim * num_freqs * 2.
70
-
71
- """
72
-
73
- def __init__(self,
74
- num_freqs: int = 6,
75
- logspace: bool = True,
76
- input_dim: int = 3,
77
- include_input: bool = True,
78
- include_pi: bool = True) -> None:
79
-
80
- """The initialization"""
81
-
82
- super().__init__()
83
-
84
- if logspace:
85
- frequencies = 2.0 ** torch.arange(
86
- num_freqs,
87
- dtype=torch.float32
88
- )
89
- else:
90
- frequencies = torch.linspace(
91
- 1.0,
92
- 2.0 ** (num_freqs - 1),
93
- num_freqs,
94
- dtype=torch.float32
95
- )
96
-
97
- if include_pi:
98
- frequencies *= torch.pi
99
-
100
- self.register_buffer("frequencies", frequencies, persistent=False)
101
- self.include_input = include_input
102
- self.num_freqs = num_freqs
103
-
104
- self.out_dim = self.get_dims(input_dim)
105
-
106
- def get_dims(self, input_dim):
107
- temp = 1 if self.include_input or self.num_freqs == 0 else 0
108
- out_dim = input_dim * (self.num_freqs * 2 + temp)
109
-
110
- return out_dim
111
-
112
- def forward(self, x: torch.Tensor) -> torch.Tensor:
113
- """ Forward process.
114
-
115
- Args:
116
- x: tensor of shape [..., dim]
117
-
118
- Returns:
119
- embedding: an embedding of `x` of shape [..., dim * (num_freqs * 2 + temp)]
120
- where temp is 1 if include_input is True and 0 otherwise.
121
- """
122
-
123
- if self.num_freqs > 0:
124
- embed = (x[..., None].contiguous() * self.frequencies).view(*x.shape[:-1], -1)
125
- if self.include_input:
126
- return torch.cat((x, embed.sin(), embed.cos()), dim=-1)
127
- else:
128
- return torch.cat((embed.sin(), embed.cos()), dim=-1)
129
- else:
130
- return x
131
-
132
-
133
- class DropPath(nn.Module):
134
- """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
135
- """
136
-
137
- def __init__(self, drop_prob: float = 0., scale_by_keep: bool = True):
138
- super(DropPath, self).__init__()
139
- self.drop_prob = drop_prob
140
- self.scale_by_keep = scale_by_keep
141
-
142
- def forward(self, x):
143
- """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
144
-
145
- This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
146
- the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
147
- See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
148
- changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
149
- 'survival rate' as the argument.
150
-
151
- """
152
- if self.drop_prob == 0. or not self.training:
153
- return x
154
- keep_prob = 1 - self.drop_prob
155
- shape = (x.shape[0],) + (1,) * (x.ndim - 1) # work with diff dim tensors, not just 2D ConvNets
156
- random_tensor = x.new_empty(shape).bernoulli_(keep_prob)
157
- if keep_prob > 0.0 and self.scale_by_keep:
158
- random_tensor.div_(keep_prob)
159
- return x * random_tensor
160
-
161
- def extra_repr(self):
162
- return f'drop_prob={round(self.drop_prob, 3):0.3f}'
163
-
164
-
165
- class MLP(nn.Module):
166
- def __init__(
167
- self, *,
168
- width: int,
169
- expand_ratio: int = 4,
170
- output_width: int = None,
171
- drop_path_rate: float = 0.0
172
- ):
173
- super().__init__()
174
- self.width = width
175
- self.c_fc = nn.Linear(width, width * expand_ratio)
176
- self.c_proj = nn.Linear(width * expand_ratio, output_width if output_width is not None else width)
177
- self.gelu = nn.GELU()
178
- self.drop_path = DropPath(drop_path_rate) if drop_path_rate > 0. else nn.Identity()
179
-
180
- def forward(self, x):
181
- return self.drop_path(self.c_proj(self.gelu(self.c_fc(x))))
182
-
183
-
184
- class QKVMultiheadCrossAttention(nn.Module):
185
- def __init__(
186
- self,
187
- *,
188
- heads: int,
189
- n_data: Optional[int] = None,
190
- width=None,
191
- qk_norm=False,
192
- norm_layer=nn.LayerNorm
193
- ):
194
- super().__init__()
195
- self.heads = heads
196
- self.n_data = n_data
197
- self.q_norm = norm_layer(width // heads, elementwise_affine=True, eps=1e-6) if qk_norm else nn.Identity()
198
- self.k_norm = norm_layer(width // heads, elementwise_affine=True, eps=1e-6) if qk_norm else nn.Identity()
199
-
200
- self.attn_processor = CrossAttentionProcessor()
201
-
202
- def forward(self, q, kv):
203
- _, n_ctx, _ = q.shape
204
- bs, n_data, width = kv.shape
205
- attn_ch = width // self.heads // 2
206
- q = q.view(bs, n_ctx, self.heads, -1)
207
- kv = kv.view(bs, n_data, self.heads, -1)
208
- k, v = torch.split(kv, attn_ch, dim=-1)
209
-
210
- q = self.q_norm(q)
211
- k = self.k_norm(k)
212
- q, k, v = map(lambda t: rearrange(t, 'b n h d -> b h n d', h=self.heads), (q, k, v))
213
- out = self.attn_processor(self, q, k, v)
214
- out = out.transpose(1, 2).reshape(bs, n_ctx, -1)
215
- return out
216
-
217
-
218
- class MultiheadCrossAttention(nn.Module):
219
- def __init__(
220
- self,
221
- *,
222
- width: int,
223
- heads: int,
224
- qkv_bias: bool = True,
225
- n_data: Optional[int] = None,
226
- data_width: Optional[int] = None,
227
- norm_layer=nn.LayerNorm,
228
- qk_norm: bool = False,
229
- kv_cache: bool = False,
230
- ):
231
- super().__init__()
232
- self.n_data = n_data
233
- self.width = width
234
- self.heads = heads
235
- self.data_width = width if data_width is None else data_width
236
- self.c_q = nn.Linear(width, width, bias=qkv_bias)
237
- self.c_kv = nn.Linear(self.data_width, width * 2, bias=qkv_bias)
238
- self.c_proj = nn.Linear(width, width)
239
- self.attention = QKVMultiheadCrossAttention(
240
- heads=heads,
241
- n_data=n_data,
242
- width=width,
243
- norm_layer=norm_layer,
244
- qk_norm=qk_norm
245
- )
246
- self.kv_cache = kv_cache
247
- self.data = None
248
-
249
- def forward(self, x, data):
250
- x = self.c_q(x)
251
- if self.kv_cache:
252
- if self.data is None:
253
- self.data = self.c_kv(data)
254
- logger.info('Saving kv cache; this should be called only once per mesh')
255
- data = self.data
256
- else:
257
- data = self.c_kv(data)
258
- x = self.attention(x, data)
259
- x = self.c_proj(x)
260
- return x
261
-
262
-
263
- class ResidualCrossAttentionBlock(nn.Module):
264
- def __init__(
265
- self,
266
- *,
267
- n_data: Optional[int] = None,
268
- width: int,
269
- heads: int,
270
- mlp_expand_ratio: int = 4,
271
- data_width: Optional[int] = None,
272
- qkv_bias: bool = True,
273
- norm_layer=nn.LayerNorm,
274
- qk_norm: bool = False
275
- ):
276
- super().__init__()
277
-
278
- if data_width is None:
279
- data_width = width
280
-
281
- self.attn = MultiheadCrossAttention(
282
- n_data=n_data,
283
- width=width,
284
- heads=heads,
285
- data_width=data_width,
286
- qkv_bias=qkv_bias,
287
- norm_layer=norm_layer,
288
- qk_norm=qk_norm
289
- )
290
- self.ln_1 = norm_layer(width, elementwise_affine=True, eps=1e-6)
291
- self.ln_2 = norm_layer(data_width, elementwise_affine=True, eps=1e-6)
292
- self.ln_3 = norm_layer(width, elementwise_affine=True, eps=1e-6)
293
- self.mlp = MLP(width=width, expand_ratio=mlp_expand_ratio)
294
-
295
- def forward(self, x: torch.Tensor, data: torch.Tensor):
296
- x = x + self.attn(self.ln_1(x), self.ln_2(data))
297
- x = x + self.mlp(self.ln_3(x))
298
- return x
299
-
300
-
301
- class QKVMultiheadAttention(nn.Module):
302
- def __init__(
303
- self,
304
- *,
305
- heads: int,
306
- n_ctx: int,
307
- width=None,
308
- qk_norm=False,
309
- norm_layer=nn.LayerNorm
310
- ):
311
- super().__init__()
312
- self.heads = heads
313
- self.n_ctx = n_ctx
314
- self.q_norm = norm_layer(width // heads, elementwise_affine=True, eps=1e-6) if qk_norm else nn.Identity()
315
- self.k_norm = norm_layer(width // heads, elementwise_affine=True, eps=1e-6) if qk_norm else nn.Identity()
316
-
317
- def forward(self, qkv):
318
- bs, n_ctx, width = qkv.shape
319
- attn_ch = width // self.heads // 3
320
- qkv = qkv.view(bs, n_ctx, self.heads, -1)
321
- q, k, v = torch.split(qkv, attn_ch, dim=-1)
322
-
323
- q = self.q_norm(q)
324
- k = self.k_norm(k)
325
-
326
- q, k, v = map(lambda t: rearrange(t, 'b n h d -> b h n d', h=self.heads), (q, k, v))
327
- out = scaled_dot_product_attention(q, k, v).transpose(1, 2).reshape(bs, n_ctx, -1)
328
- return out
329
-
330
-
331
- class MultiheadAttention(nn.Module):
332
- def __init__(
333
- self,
334
- *,
335
- n_ctx: int,
336
- width: int,
337
- heads: int,
338
- qkv_bias: bool,
339
- norm_layer=nn.LayerNorm,
340
- qk_norm: bool = False,
341
- drop_path_rate: float = 0.0
342
- ):
343
- super().__init__()
344
- self.n_ctx = n_ctx
345
- self.width = width
346
- self.heads = heads
347
- self.c_qkv = nn.Linear(width, width * 3, bias=qkv_bias)
348
- self.c_proj = nn.Linear(width, width)
349
- self.attention = QKVMultiheadAttention(
350
- heads=heads,
351
- n_ctx=n_ctx,
352
- width=width,
353
- norm_layer=norm_layer,
354
- qk_norm=qk_norm
355
- )
356
- self.drop_path = DropPath(drop_path_rate) if drop_path_rate > 0. else nn.Identity()
357
-
358
- def forward(self, x):
359
- x = self.c_qkv(x)
360
- x = self.attention(x)
361
- x = self.drop_path(self.c_proj(x))
362
- return x
363
-
364
-
365
- class ResidualAttentionBlock(nn.Module):
366
- def __init__(
367
- self,
368
- *,
369
- n_ctx: int,
370
- width: int,
371
- heads: int,
372
- qkv_bias: bool = True,
373
- norm_layer=nn.LayerNorm,
374
- qk_norm: bool = False,
375
- drop_path_rate: float = 0.0,
376
- ):
377
- super().__init__()
378
- self.attn = MultiheadAttention(
379
- n_ctx=n_ctx,
380
- width=width,
381
- heads=heads,
382
- qkv_bias=qkv_bias,
383
- norm_layer=norm_layer,
384
- qk_norm=qk_norm,
385
- drop_path_rate=drop_path_rate
386
- )
387
- self.ln_1 = norm_layer(width, elementwise_affine=True, eps=1e-6)
388
- self.mlp = MLP(width=width, drop_path_rate=drop_path_rate)
389
- self.ln_2 = norm_layer(width, elementwise_affine=True, eps=1e-6)
390
-
391
- def forward(self, x: torch.Tensor):
392
- x = x + self.attn(self.ln_1(x))
393
- x = x + self.mlp(self.ln_2(x))
394
- return x
395
-
396
-
397
- class Transformer(nn.Module):
398
- def __init__(
399
- self,
400
- *,
401
- n_ctx: int,
402
- width: int,
403
- layers: int,
404
- heads: int,
405
- qkv_bias: bool = True,
406
- norm_layer=nn.LayerNorm,
407
- qk_norm: bool = False,
408
- drop_path_rate: float = 0.0
409
- ):
410
- super().__init__()
411
- self.n_ctx = n_ctx
412
- self.width = width
413
- self.layers = layers
414
- self.resblocks = nn.ModuleList(
415
- [
416
- ResidualAttentionBlock(
417
- n_ctx=n_ctx,
418
- width=width,
419
- heads=heads,
420
- qkv_bias=qkv_bias,
421
- norm_layer=norm_layer,
422
- qk_norm=qk_norm,
423
- drop_path_rate=drop_path_rate
424
- )
425
- for _ in range(layers)
426
- ]
427
- )
428
-
429
- def forward(self, x: torch.Tensor):
430
- for block in self.resblocks:
431
- x = block(x)
432
- return x
433
-
434
-
435
- class CrossAttentionDecoder(nn.Module):
436
-
437
- def __init__(
438
- self,
439
- *,
440
- num_latents: int,
441
- out_channels: int,
442
- fourier_embedder: FourierEmbedder,
443
- width: int,
444
- heads: int,
445
- mlp_expand_ratio: int = 4,
446
- downsample_ratio: int = 1,
447
- enable_ln_post: bool = True,
448
- qkv_bias: bool = True,
449
- qk_norm: bool = False,
450
- label_type: str = "binary"
451
- ):
452
- super().__init__()
453
-
454
- self.enable_ln_post = enable_ln_post
455
- self.fourier_embedder = fourier_embedder
456
- self.downsample_ratio = downsample_ratio
457
- self.query_proj = nn.Linear(self.fourier_embedder.out_dim, width)
458
- if self.downsample_ratio != 1:
459
- self.latents_proj = nn.Linear(width * downsample_ratio, width)
460
- if self.enable_ln_post == False:
461
- qk_norm = False
462
- self.cross_attn_decoder = ResidualCrossAttentionBlock(
463
- n_data=num_latents,
464
- width=width,
465
- mlp_expand_ratio=mlp_expand_ratio,
466
- heads=heads,
467
- qkv_bias=qkv_bias,
468
- qk_norm=qk_norm
469
- )
470
-
471
- if self.enable_ln_post:
472
- self.ln_post = nn.LayerNorm(width)
473
- self.output_proj = nn.Linear(width, out_channels)
474
- self.label_type = label_type
475
- self.count = 0
476
-
477
- def set_cross_attention_processor(self, processor):
478
- self.cross_attn_decoder.attn.attention.attn_processor = processor
479
-
480
- def set_default_cross_attention_processor(self):
481
- self.cross_attn_decoder.attn.attention.attn_processor = CrossAttentionProcessor
482
-
483
- def forward(self, queries=None, query_embeddings=None, latents=None):
484
- if query_embeddings is None:
485
- query_embeddings = self.query_proj(self.fourier_embedder(queries).to(latents.dtype))
486
- self.count += query_embeddings.shape[1]
487
- if self.downsample_ratio != 1:
488
- latents = self.latents_proj(latents)
489
- x = self.cross_attn_decoder(query_embeddings, latents)
490
- if self.enable_ln_post:
491
- x = self.ln_post(x)
492
- occ = self.output_proj(x)
493
- return occ
 
attention_processors.py DELETED
@@ -1,96 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import os
16
-
17
- import torch
18
- import torch.nn.functional as F
19
-
20
- scaled_dot_product_attention = F.scaled_dot_product_attention
21
- if os.environ.get('CA_USE_SAGEATTN', '0') == '1':
22
- try:
23
- from sageattention import sageattn
24
- except ImportError:
25
- raise ImportError('Please install the package "sageattention" to use CA_USE_SAGEATTN.')
26
- scaled_dot_product_attention = sageattn
27
-
28
-
29
- class CrossAttentionProcessor:
30
- def __call__(self, attn, q, k, v):
31
- out = scaled_dot_product_attention(q, k, v)
32
- return out
33
-
34
-
35
- class FlashVDMCrossAttentionProcessor:
36
- def __init__(self, topk=None):
37
- self.topk = topk
38
-
39
- def __call__(self, attn, q, k, v):
40
- if k.shape[-2] == 3072:
41
- topk = 1024
42
- elif k.shape[-2] == 512:
43
- topk = 256
44
- else:
45
- topk = k.shape[-2] // 3
46
-
47
- if self.topk is True:
48
- q1 = q[:, :, ::100, :]
49
- sim = q1 @ k.transpose(-1, -2)
50
- sim = torch.mean(sim, -2)
51
- topk_ind = torch.topk(sim, dim=-1, k=topk).indices.squeeze(-2).unsqueeze(-1)
52
- topk_ind = topk_ind.expand(-1, -1, -1, v.shape[-1])
53
- v0 = torch.gather(v, dim=-2, index=topk_ind)
54
- k0 = torch.gather(k, dim=-2, index=topk_ind)
55
- out = scaled_dot_product_attention(q, k0, v0)
56
- elif self.topk is False:
57
- out = scaled_dot_product_attention(q, k, v)
58
- else:
59
- idx, counts = self.topk
60
- start = 0
61
- outs = []
62
- for grid_coord, count in zip(idx, counts):
63
- end = start + count
64
- q_chunk = q[:, :, start:end, :]
65
- k0, v0 = self.select_topkv(q_chunk, k, v, topk)
66
- out = scaled_dot_product_attention(q_chunk, k0, v0)
67
- outs.append(out)
68
- start += count
69
- out = torch.cat(outs, dim=-2)
70
- self.topk = False
71
- return out
72
-
73
- def select_topkv(self, q_chunk, k, v, topk):
74
- q1 = q_chunk[:, :, ::50, :]
75
- sim = q1 @ k.transpose(-1, -2)
76
- sim = torch.mean(sim, -2)
77
- topk_ind = torch.topk(sim, dim=-1, k=topk).indices.squeeze(-2).unsqueeze(-1)
78
- topk_ind = topk_ind.expand(-1, -1, -1, v.shape[-1])
79
- v0 = torch.gather(v, dim=-2, index=topk_ind)
80
- k0 = torch.gather(k, dim=-2, index=topk_ind)
81
- return k0, v0
82
-
83
-
84
- class FlashVDMTopMCrossAttentionProcessor(FlashVDMCrossAttentionProcessor):
85
- def select_topkv(self, q_chunk, k, v, topk):
86
- q1 = q_chunk[:, :, ::30, :]
87
- sim = q1 @ k.transpose(-1, -2)
88
- # sim = sim.to(torch.float32)
89
- sim = sim.softmax(-1)
90
- sim = torch.mean(sim, 1)
91
- activated_token = torch.where(sim > 1e-6)[2]
92
- index = torch.unique(activated_token, return_counts=True)[0].unsqueeze(0).unsqueeze(0).unsqueeze(-1)
93
- index = index.expand(-1, v.shape[1], -1, v.shape[-1])
94
- v0 = torch.gather(v, dim=-2, index=index)
95
- k0 = torch.gather(k, dim=-2, index=index)
96
- return k0, v0
 
camera_utils.py DELETED
@@ -1,106 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import math
16
-
17
- import numpy as np
18
- import torch
19
-
20
-
21
- def transform_pos(mtx, pos, keepdim=False):
22
- t_mtx = torch.from_numpy(mtx).to(
23
- pos.device) if isinstance(
24
- mtx, np.ndarray) else mtx
25
- if pos.shape[-1] == 3:
26
- posw = torch.cat(
27
- [pos, torch.ones([pos.shape[0], 1]).to(pos.device)], axis=1)
28
- else:
29
- posw = pos
30
-
31
- if keepdim:
32
- return torch.matmul(posw, t_mtx.t())[...]
33
- else:
34
- return torch.matmul(posw, t_mtx.t())[None, ...]
35
-
36
-
37
- def get_mv_matrix(elev, azim, camera_distance, center=None):
38
- elev = -elev
39
- azim += 90
40
-
41
- elev_rad = math.radians(elev)
42
- azim_rad = math.radians(azim)
43
-
44
- camera_position = np.array([camera_distance * math.cos(elev_rad) * math.cos(azim_rad),
45
- camera_distance *
46
- math.cos(elev_rad) * math.sin(azim_rad),
47
- camera_distance * math.sin(elev_rad)])
48
-
49
- if center is None:
50
- center = np.array([0, 0, 0])
51
- else:
52
- center = np.array(center)
53
-
54
- lookat = center - camera_position
55
- lookat = lookat / np.linalg.norm(lookat)
56
-
57
- up = np.array([0, 0, 1.0])
58
- right = np.cross(lookat, up)
59
- right = right / np.linalg.norm(right)
60
- up = np.cross(right, lookat)
61
- up = up / np.linalg.norm(up)
62
-
63
- c2w = np.concatenate(
64
- [np.stack([right, up, -lookat], axis=-1), camera_position[:, None]], axis=-1)
65
-
66
- w2c = np.zeros((4, 4))
67
- w2c[:3, :3] = np.transpose(c2w[:3, :3], (1, 0))
68
- w2c[:3, 3:] = -np.matmul(np.transpose(c2w[:3, :3], (1, 0)), c2w[:3, 3:])
69
- w2c[3, 3] = 1.0
70
-
71
- return w2c.astype(np.float32)
72
-
73
-
74
- def get_orthographic_projection_matrix(
75
- left=-1, right=1, bottom=-1, top=1, near=0, far=2):
76
- """
77
- Compute an orthographic projection matrix.
78
-
79
- Args:
80
- left (float): left boundary of the projection volume.
81
- right (float): right boundary of the projection volume.
82
- bottom (float): bottom boundary of the projection volume.
83
- top (float): top boundary of the projection volume.
84
- near (float): distance to the near clipping plane.
85
- far (float): distance to the far clipping plane.
86
-
87
- Returns:
88
- numpy.ndarray: the orthographic projection matrix.
89
- """
90
- ortho_matrix = np.eye(4, dtype=np.float32)
91
- ortho_matrix[0, 0] = 2 / (right - left)
92
- ortho_matrix[1, 1] = 2 / (top - bottom)
93
- ortho_matrix[2, 2] = -2 / (far - near)
94
- ortho_matrix[0, 3] = -(right + left) / (right - left)
95
- ortho_matrix[1, 3] = -(top + bottom) / (top - bottom)
96
- ortho_matrix[2, 3] = -(far + near) / (far - near)
97
- return ortho_matrix
98
-
99
-
100
- def get_perspective_projection_matrix(fovy, aspect_wh, near, far):
101
- fovy_rad = math.radians(fovy)
102
- return np.array([[1.0 / (math.tan(fovy_rad / 2.0) * aspect_wh), 0, 0, 0],
103
- [0, 1.0 / math.tan(fovy_rad / 2.0), 0, 0],
104
- [0, 0, -(far + near) / (far - near), -
105
- 2.0 * far * near / (far - near)],
106
- [0, 0, -1, 0]]).astype(np.float32)
 
compile_mesh_painter.bat DELETED
@@ -1,3 +0,0 @@
1
- FOR /F "tokens=*" %%i IN ('python -m pybind11 --includes') DO SET PYINCLUDES=%%i
2
- echo %PYINCLUDES%
3
- g++ -O3 -Wall -shared -std=c++11 -fPIC %PYINCLUDES% mesh_processor.cpp -o mesh_processor.pyd -lpython3.12
 
 
 
 
conditioner.py DELETED
@@ -1,257 +0,0 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0
2
- # and Other Licenses of the Third-Party Components therein:
3
- # The below Model in this distribution may have been modified by THL A29 Limited
4
- # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
-
6
- # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
- # The below software and/or models in this distribution may have been
8
- # modified by THL A29 Limited ("Tencent Modifications").
9
- # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
-
11
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
- # except for the third-party components listed below.
13
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
- # in the respective licenses of these third-party components.
15
- # Users must comply with all terms and conditions of original licenses of these third-party
16
- # components and must ensure that the usage of the third party components adheres to
17
- # all relevant laws and regulations.
18
-
19
- # For avoidance of doubts, Hunyuan 3D means the large language models and
20
- # their software and algorithms, including trained model weights, parameters (including
21
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
- # fine-tuning enabling code and other elements of the foregoing made publicly available
23
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
-
25
- import numpy as np
26
- import torch
27
- import torch.nn as nn
28
- from torchvision import transforms
29
- from transformers import (
30
- CLIPVisionModelWithProjection,
31
- CLIPVisionConfig,
32
- Dinov2Model,
33
- Dinov2Config,
34
- )
35
-
36
-
37
- def get_1d_sincos_pos_embed_from_grid(embed_dim, pos):
38
- """
39
- embed_dim: output dimension for each position
40
- pos: a list of positions to be encoded: size (M,)
41
- out: (M, D)
42
- """
43
- assert embed_dim % 2 == 0
44
- omega = np.arange(embed_dim // 2, dtype=np.float64)
45
- omega /= embed_dim / 2.
46
- omega = 1. / 10000 ** omega # (D/2,)
47
-
48
- pos = pos.reshape(-1) # (M,)
49
- out = np.einsum('m,d->md', pos, omega) # (M, D/2), outer product
50
-
51
- emb_sin = np.sin(out) # (M, D/2)
52
- emb_cos = np.cos(out) # (M, D/2)
53
-
54
- return np.concatenate([emb_sin, emb_cos], axis=1)
55
-
56
-
57
- class ImageEncoder(nn.Module):
58
- def __init__(
59
- self,
60
- version=None,
61
- config=None,
62
- use_cls_token=True,
63
- image_size=224,
64
- **kwargs,
65
- ):
66
- super().__init__()
67
-
68
- if config is None:
69
- self.model = self.MODEL_CLASS.from_pretrained(version)
70
- else:
71
- self.model = self.MODEL_CLASS(self.MODEL_CONFIG_CLASS.from_dict(config))
72
- self.model.eval()
73
- self.model.requires_grad_(False)
74
- self.use_cls_token = use_cls_token
75
- self.size = image_size // 14
76
- self.num_patches = (image_size // 14) ** 2
77
- if self.use_cls_token:
78
- self.num_patches += 1
79
-
80
- self.transform = transforms.Compose(
81
- [
82
- transforms.Resize(image_size, transforms.InterpolationMode.BILINEAR, antialias=True),
83
- transforms.CenterCrop(image_size),
84
- transforms.Normalize(
85
- mean=self.mean,
86
- std=self.std,
87
- ),
88
- ]
89
- )
90
-
91
- def forward(self, image, mask=None, value_range=(-1, 1), **kwargs):
92
- if value_range is not None:
93
- low, high = value_range
94
- image = (image - low) / (high - low)
95
-
96
- image = image.to(self.model.device, dtype=self.model.dtype)
97
- inputs = self.transform(image)
98
- outputs = self.model(inputs)
99
-
100
- last_hidden_state = outputs.last_hidden_state
101
- if not self.use_cls_token:
102
- last_hidden_state = last_hidden_state[:, 1:, :]
103
-
104
- return last_hidden_state
105
-
106
- def unconditional_embedding(self, batch_size, **kwargs):
107
- device = next(self.model.parameters()).device
108
- dtype = next(self.model.parameters()).dtype
109
- zero = torch.zeros(
110
- batch_size,
111
- self.num_patches,
112
- self.model.config.hidden_size,
113
- device=device,
114
- dtype=dtype,
115
- )
116
-
117
- return zero
118
-
119
-
120
- class CLIPImageEncoder(ImageEncoder):
121
- MODEL_CLASS = CLIPVisionModelWithProjection
122
- MODEL_CONFIG_CLASS = CLIPVisionConfig
123
- mean = [0.48145466, 0.4578275, 0.40821073]
124
- std = [0.26862954, 0.26130258, 0.27577711]
125
-
126
-
127
- class DinoImageEncoder(ImageEncoder):
128
- MODEL_CLASS = Dinov2Model
129
- MODEL_CONFIG_CLASS = Dinov2Config
130
- mean = [0.485, 0.456, 0.406]
131
- std = [0.229, 0.224, 0.225]
132
-
133
-
134
- class DinoImageEncoderMV(DinoImageEncoder):
135
- def __init__(
136
- self,
137
- version=None,
138
- config=None,
139
- use_cls_token=True,
140
- image_size=224,
141
- view_num=4,
142
- **kwargs,
143
- ):
144
- super().__init__(version, config, use_cls_token, image_size, **kwargs)
145
- self.view_num = view_num
146
- self.num_patches = self.num_patches
147
- pos = np.arange(self.view_num, dtype=np.float32)
148
- view_embedding = torch.from_numpy(
149
- get_1d_sincos_pos_embed_from_grid(self.model.config.hidden_size, pos)).float()
150
-
151
- view_embedding = view_embedding.unsqueeze(1).repeat(1, self.num_patches, 1)
152
- self.view_embed = view_embedding.unsqueeze(0)
153
-
154
- def forward(self, image, mask=None, value_range=(-1, 1), view_idxs=None):
155
- if value_range is not None:
156
- low, high = value_range
157
- image = (image - low) / (high - low)
158
-
159
- image = image.to(self.model.device, dtype=self.model.dtype)
160
-
161
- bs, num_views, c, h, w = image.shape
162
- image = image.view(bs * num_views, c, h, w)
163
-
164
- inputs = self.transform(image)
165
- outputs = self.model(inputs)
166
-
167
- last_hidden_state = outputs.last_hidden_state
168
- last_hidden_state = last_hidden_state.view(
169
- bs, num_views, last_hidden_state.shape[-2],
170
- last_hidden_state.shape[-1]
171
- )
172
-
173
- view_embedding = self.view_embed.to(last_hidden_state.dtype).to(last_hidden_state.device)
174
- if view_idxs is not None:
175
- assert len(view_idxs) == bs
176
- view_embeddings = []
177
- for i in range(bs):
178
- view_idx = view_idxs[i]
179
- assert num_views == len(view_idx)
180
- view_embeddings.append(self.view_embed[:, view_idx, ...])
181
- view_embedding = torch.cat(view_embeddings, 0).to(last_hidden_state.dtype).to(last_hidden_state.device)
182
-
183
- if num_views != self.view_num:
184
- view_embedding = view_embedding[:, :num_views, ...]
185
- last_hidden_state = last_hidden_state + view_embedding
186
- last_hidden_state = last_hidden_state.view(bs, num_views * last_hidden_state.shape[-2],
187
- last_hidden_state.shape[-1])
188
- return last_hidden_state
189
-
190
- def unconditional_embedding(self, batch_size, view_idxs=None, **kwargs):
191
- device = next(self.model.parameters()).device
192
- dtype = next(self.model.parameters()).dtype
193
- zero = torch.zeros(
194
- batch_size,
195
- self.num_patches * len(view_idxs[0]),
196
- self.model.config.hidden_size,
197
- device=device,
198
- dtype=dtype,
199
- )
200
- return zero
201
-
202
-
203
- def build_image_encoder(config):
204
- if config['type'] == 'CLIPImageEncoder':
205
- return CLIPImageEncoder(**config['kwargs'])
206
- elif config['type'] == 'DinoImageEncoder':
207
- return DinoImageEncoder(**config['kwargs'])
208
- elif config['type'] == 'DinoImageEncoderMV':
209
- return DinoImageEncoderMV(**config['kwargs'])
210
- else:
211
- raise ValueError(f'Unknown image encoder type: {config["type"]}')
212
-
213
-
214
- class DualImageEncoder(nn.Module):
215
- def __init__(
216
- self,
217
- main_image_encoder,
218
- additional_image_encoder,
219
- ):
220
- super().__init__()
221
- self.main_image_encoder = build_image_encoder(main_image_encoder)
222
- self.additional_image_encoder = build_image_encoder(additional_image_encoder)
223
-
224
- def forward(self, image, mask=None, **kwargs):
225
- outputs = {
226
- 'main': self.main_image_encoder(image, mask=mask, **kwargs),
227
- 'additional': self.additional_image_encoder(image, mask=mask, **kwargs),
228
- }
229
- return outputs
230
-
231
- def unconditional_embedding(self, batch_size, **kwargs):
232
- outputs = {
233
- 'main': self.main_image_encoder.unconditional_embedding(batch_size, **kwargs),
234
- 'additional': self.additional_image_encoder.unconditional_embedding(batch_size, **kwargs),
235
- }
236
- return outputs
237
-
238
-
239
- class SingleImageEncoder(nn.Module):
240
- def __init__(
241
- self,
242
- main_image_encoder,
243
- ):
244
- super().__init__()
245
- self.main_image_encoder = build_image_encoder(main_image_encoder)
246
-
247
- def forward(self, image, mask=None, **kwargs):
248
- outputs = {
249
- 'main': self.main_image_encoder(image, mask=mask, **kwargs),
250
- }
251
- return outputs
252
-
253
- def unconditional_embedding(self, batch_size, **kwargs):
254
- outputs = {
255
- 'main': self.main_image_encoder.unconditional_embedding(batch_size, **kwargs),
256
- }
257
- return outputs
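
For context, the deleted build_image_encoder above dispatches on config['type'] and forwards config['kwargs'] to the chosen encoder class. A minimal sketch of a plausible config; the checkpoint id below is illustrative, not taken from this repo:

    config = {
        'type': 'DinoImageEncoder',
        'kwargs': {
            'version': 'facebook/dinov2-base',  # hypothetical checkpoint; any Dinov2Model id would do
            'use_cls_token': True,
            'image_size': 224,
        },
    }
    encoder = build_image_encoder(config)  # a frozen Dinov2Model wrapped by DinoImageEncoder
    # encoder(image) expects pixel values in [-1, 1] and returns
    # (224 // 14) ** 2 + 1 = 257 tokens per image when the CLS token is kept.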
 
counter_utils.py DELETED
@@ -1,48 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
-
16
- class RunningStats():
17
- def __init__(self) -> None:
18
- self.count = 0
19
- self.sum = 0
20
- self.mean = 0
21
- self.min = None
22
- self.max = None
23
-
24
- def add_value(self, value):
25
- self.count += 1
26
- self.sum += value
27
- self.mean = self.sum / self.count
28
-
29
- if self.min is None or value < self.min:
30
- self.min = value
31
-
32
- if self.max is None or value > self.max:
33
- self.max = value
34
-
35
- def get_count(self):
36
- return self.count
37
-
38
- def get_sum(self):
39
- return self.sum
40
-
41
- def get_mean(self):
42
- return self.mean
43
-
44
- def get_min(self):
45
- return self.min
46
-
47
- def get_max(self):
48
- return self.max
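
A minimal usage sketch of the RunningStats helper deleted above:

    stats = RunningStats()
    for v in (2.0, 6.0, 4.0):
        stats.add_value(v)
    print(stats.get_count(), stats.get_mean(), stats.get_min(), stats.get_max())  # 3 4.0 2.0 6.0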
 
custom_rasterizer-0.1-cp310-cp310-linux_x86_64.whl DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:1dc5bea62f7ef924b9f58722b9f7634501b05af2b9507e736c256d6b2b9d90fc
3
- size 4674364
 
dehighlight_utils.py DELETED
@@ -1,107 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the repsective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import cv2
16
- import numpy as np
17
- import torch
18
- from PIL import Image
19
- from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler
20
-
21
-
22
- class Light_Shadow_Remover():
23
- def __init__(self, config):
24
- self.device = config.device
25
- self.cfg_image = 1.5
26
- self.cfg_text = 1.0
27
-
28
- pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
29
- config.light_remover_ckpt_path,
30
- torch_dtype=torch.float16,
31
- safety_checker=None,
32
- )
33
- pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)
34
- pipeline.set_progress_bar_config(disable=True)
35
-
36
- self.pipeline = pipeline.to(self.device, torch.float16)
37
-
38
- def recorrect_rgb(self, src_image, target_image, alpha_channel, scale=0.95):
39
-
40
- def flat_and_mask(bgr, a):
41
- mask = torch.where(a > 0.5, True, False)
42
- bgr_flat = bgr.reshape(-1, bgr.shape[-1])
43
- mask_flat = mask.reshape(-1)
44
- bgr_flat_masked = bgr_flat[mask_flat, :]
45
- return bgr_flat_masked
46
-
47
- src_flat = flat_and_mask(src_image, alpha_channel)
48
- target_flat = flat_and_mask(target_image, alpha_channel)
49
- corrected_bgr = torch.zeros_like(src_image)
50
-
51
- for i in range(3):
52
- src_mean, src_stddev = torch.mean(src_flat[:, i]), torch.std(src_flat[:, i])
53
- target_mean, target_stddev = torch.mean(target_flat[:, i]), torch.std(target_flat[:, i])
54
- corrected_bgr[:, :, i] = torch.clamp((src_image[:, :, i] - scale * src_mean) * (target_stddev / src_stddev) + scale * target_mean, 0, 1)
55
-
56
- src_mse = torch.mean((src_image - target_image) ** 2)
57
- modify_mse = torch.mean((corrected_bgr - target_image) ** 2)
58
- if src_mse < modify_mse:
59
- corrected_bgr = torch.cat([src_image, alpha_channel], dim=-1)
60
- else:
61
- corrected_bgr = torch.cat([corrected_bgr, alpha_channel], dim=-1)
62
-
63
- return corrected_bgr
64
-
65
- @torch.no_grad()
66
- def __call__(self, image):
67
-
68
- image = image.resize((512, 512))
69
-
70
- if image.mode == 'RGBA':
71
- image_array = np.array(image)
72
- alpha_channel = image_array[:, :, 3]
73
- erosion_size = 3
74
- kernel = np.ones((erosion_size, erosion_size), np.uint8)
75
- alpha_channel = cv2.erode(alpha_channel, kernel, iterations=1)
76
- image_array[alpha_channel == 0, :3] = 255
77
- image_array[:, :, 3] = alpha_channel
78
- image = Image.fromarray(image_array)
79
-
80
- image_tensor = torch.tensor(np.array(image) / 255.0).to(self.device)
81
- alpha = image_tensor[:, :, 3:]
82
- rgb_target = image_tensor[:, :, :3]
83
- else:
84
- image_tensor = torch.tensor(np.array(image) / 255.0).to(self.device)
85
- alpha = torch.ones_like(image_tensor)[:, :, :1]
86
- rgb_target = image_tensor[:, :, :3]
87
-
88
- image = image.convert('RGB')
89
-
90
- image = self.pipeline(
91
- prompt="",
92
- image=image,
93
- generator=torch.manual_seed(42),
94
- height=512,
95
- width=512,
96
- num_inference_steps=50,
97
- image_guidance_scale=self.cfg_image,
98
- guidance_scale=self.cfg_text,
99
- ).images[0]
100
-
101
- image_tensor = torch.tensor(np.array(image)/255.0).to(self.device)
102
- rgb_src = image_tensor[:,:,:3]
103
- image = self.recorrect_rgb(rgb_src, rgb_target, alpha)
104
- image = image[:,:,:3]*image[:,:,3:] + torch.ones_like(image[:,:,:3])*(1.0-image[:,:,3:])
105
- image = Image.fromarray((image.cpu().numpy()*255).astype(np.uint8))
106
-
107
- return image
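
A minimal sketch of how the deleted Light_Shadow_Remover might be driven; the config object and checkpoint path below are hypothetical stand-ins, not values taken from this repo:

    from types import SimpleNamespace
    from PIL import Image

    cfg = SimpleNamespace(
        device='cuda',
        light_remover_ckpt_path='path/to/instruct-pix2pix-delight',  # hypothetical local checkpoint
    )
    remover = Light_Shadow_Remover(cfg)
    delit = remover(Image.open('render.png'))  # 512x512 RGB PIL image with baked-in lighting suppressed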
 
demo.png DELETED

Git LFS Details

  • SHA256: 4260b9a45c39fc4045bae81d27f8eb17127cdb201df614193077268d996ce436
  • Pointer size: 131 Bytes
  • Size of remote file: 151 kB
example_prompts.txt DELETED
@@ -1,5 +0,0 @@
1
- 一片绿色的树叶在白色背景上居中展现,清晰的纹理
2
- 一只棕白相间的仓鼠,站在白色背景前。照片采用居中构图方式,卡通风格
3
- 一盆绿色植物生长在红色花盆中,居中,写实
4
- a pot of green plants grows in a red flower pot.
5
- a lovely rabbit eating carrots
 
gitattributes DELETED
@@ -1,46 +0,0 @@
1
- *.7z filter=lfs diff=lfs merge=lfs -text
2
- *.arrow filter=lfs diff=lfs merge=lfs -text
3
- *.bin filter=lfs diff=lfs merge=lfs -text
4
- *.bz2 filter=lfs diff=lfs merge=lfs -text
5
- *.ckpt filter=lfs diff=lfs merge=lfs -text
6
- *.ftz filter=lfs diff=lfs merge=lfs -text
7
- *.gz filter=lfs diff=lfs merge=lfs -text
8
- *.h5 filter=lfs diff=lfs merge=lfs -text
9
- *.joblib filter=lfs diff=lfs merge=lfs -text
10
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
- *.model filter=lfs diff=lfs merge=lfs -text
13
- *.msgpack filter=lfs diff=lfs merge=lfs -text
14
- *.npy filter=lfs diff=lfs merge=lfs -text
15
- *.npz filter=lfs diff=lfs merge=lfs -text
16
- *.onnx filter=lfs diff=lfs merge=lfs -text
17
- *.ot filter=lfs diff=lfs merge=lfs -text
18
- *.parquet filter=lfs diff=lfs merge=lfs -text
19
- *.pb filter=lfs diff=lfs merge=lfs -text
20
- *.pickle filter=lfs diff=lfs merge=lfs -text
21
- *.pkl filter=lfs diff=lfs merge=lfs -text
22
- *.pt filter=lfs diff=lfs merge=lfs -text
23
- *.pth filter=lfs diff=lfs merge=lfs -text
24
- *.rar filter=lfs diff=lfs merge=lfs -text
25
- *.safetensors filter=lfs diff=lfs merge=lfs -text
26
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
- *.tar.* filter=lfs diff=lfs merge=lfs -text
28
- *.tar filter=lfs diff=lfs merge=lfs -text
29
- *.tflite filter=lfs diff=lfs merge=lfs -text
30
- *.tgz filter=lfs diff=lfs merge=lfs -text
31
- *.wasm filter=lfs diff=lfs merge=lfs -text
32
- *.xz filter=lfs diff=lfs merge=lfs -text
33
- *.zip filter=lfs diff=lfs merge=lfs -text
34
- *.zst filter=lfs diff=lfs merge=lfs -text
35
- *tfevents* filter=lfs diff=lfs merge=lfs -text
36
- assets/images/arch.jpg filter=lfs diff=lfs merge=lfs -text
37
- assets/images/e2e-1.gif filter=lfs diff=lfs merge=lfs -text
38
- assets/images/e2e-2.gif filter=lfs diff=lfs merge=lfs -text
39
- assets/images/system.jpg filter=lfs diff=lfs merge=lfs -text
40
- assets/images/teaser.jpg filter=lfs diff=lfs merge=lfs -text
41
- gradio_cache/0/textured_mesh.glb filter=lfs diff=lfs merge=lfs -text
42
- gradio_cache/3/textured_mesh.glb filter=lfs diff=lfs merge=lfs -text
43
- gradio_cache/4/textured_mesh.glb filter=lfs diff=lfs merge=lfs -text
44
- gradio_cache/5/textured_mesh.glb filter=lfs diff=lfs merge=lfs -text
45
- *.whl filter=lfs diff=lfs merge=lfs -text
46
- gradio_cache/1/textured_mesh.glb filter=lfs diff=lfs merge=lfs -text
 
gitignore DELETED
@@ -1,168 +0,0 @@
1
- # Byte-compiled / optimized / DLL files
2
- __pycache__/
3
- *.py[cod]
4
- *$py.class
5
- .DS_Store
6
- # C extensions
7
- *.so
8
-
9
- # Distribution / packaging
10
- .Python
11
- build/
12
- develop-eggs/
13
- dist/
14
- downloads/
15
- eggs/
16
- .eggs/
17
- lib/
18
- lib64/
19
- parts/
20
- sdist/
21
- var/
22
- wheels/
23
- share/python-wheels/
24
- *.egg-info/
25
- .installed.cfg
26
- *.egg
27
- MANIFEST
28
-
29
- # PyInstaller
30
- # Usually these files are written by a python script from a template
31
- # before PyInstaller builds the exe, so as to inject date/other infos into it.
32
- *.manifest
33
- *.spec
34
-
35
- # Installer logs
36
- pip-log.txt
37
- pip-delete-this-directory.txt
38
-
39
- # Unit test / coverage reports
40
- htmlcov/
41
- .tox/
42
- .nox/
43
- .coverage
44
- .coverage.*
45
- .cache
46
- nosetests.xml
47
- coverage.xml
48
- *.cover
49
- *.py,cover
50
- .hypothesis/
51
- .pytest_cache/
52
- cover/
53
-
54
- # Translations
55
- *.mo
56
- *.pot
57
-
58
- # Django stuff:
59
- *.log
60
- local_settings.py
61
- db.sqlite3
62
- db.sqlite3-journal
63
-
64
- # Flask stuff:
65
- instance/
66
- .webassets-cache
67
-
68
- # Scrapy stuff:
69
- .scrapy
70
-
71
- # Sphinx documentation
72
- docs/_build/
73
-
74
- # PyBuilder
75
- .pybuilder/
76
- target/
77
-
78
- # Jupyter Notebook
79
- .ipynb_checkpoints
80
-
81
- # IPython
82
- profile_default/
83
- ipython_config.py
84
-
85
- # pyenv
86
- # For a library or package, you might want to ignore these files since the code is
87
- # intended to run in multiple environments; otherwise, check them in:
88
- # .python-version
89
-
90
- # pipenv
91
- # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92
- # However, in case of collaboration, if having platform-specific dependencies or dependencies
93
- # having no cross-platform support, pipenv may install dependencies that don't work, or not
94
- # install all needed dependencies.
95
- #Pipfile.lock
96
-
97
- # UV
98
- # Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
99
- # This is especially recommended for binary packages to ensure reproducibility, and is more
100
- # commonly ignored for libraries.
101
- #uv.lock
102
-
103
- # poetry
104
- # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
105
- # This is especially recommended for binary packages to ensure reproducibility, and is more
106
- # commonly ignored for libraries.
107
- # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
108
- #poetry.lock
109
-
110
- # pdm
111
- # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
112
- #pdm.lock
113
- # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
114
- # in version control.
115
- # https://pdm.fming.dev/latest/usage/project/#working-with-version-control
116
- .pdm.toml
117
- .pdm-python
118
- .pdm-build/
119
-
120
- # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
121
- __pypackages__/
122
-
123
- # Celery stuff
124
- celerybeat-schedule
125
- celerybeat.pid
126
-
127
- # SageMath parsed files
128
- *.sage.py
129
-
130
- # Environments
131
- .env
132
- .venv
133
- env/
134
- venv/
135
- ENV/
136
- env.bak/
137
- venv.bak/
138
-
139
- # Spyder project settings
140
- .spyderproject
141
- .spyproject
142
-
143
- # Rope project settings
144
- .ropeproject
145
-
146
- # mkdocs documentation
147
- /site
148
-
149
- # mypy
150
- .mypy_cache/
151
- .dmypy.json
152
- dmypy.json
153
-
154
- # Pyre type checker
155
- .pyre/
156
-
157
- # pytype static type analyzer
158
- .pytype/
159
-
160
- # Cython debug symbols
161
- cython_debug/
162
- gradio_cache/
163
- # PyCharm
164
- # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
165
- # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
166
- # and can be added to the global gitignore or merged into this file. For a more nuclear
167
- # option (not recommended) you can uncomment the following to ignore the entire idea folder.
168
- #.idea/
 
gradient.jpg DELETED
Binary file (68.2 kB)
 
gradio_app.py DELETED
@@ -1,771 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the repsective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import os
16
- import random
17
- import shutil
18
- import time
19
- from glob import glob
20
- from pathlib import Path
21
-
22
- import gradio as gr
23
- import torch
24
- import trimesh
25
- import uvicorn
26
- from fastapi import FastAPI
27
- from fastapi.staticfiles import StaticFiles
28
- import uuid
29
-
30
- from hy3dgen.shapegen.utils import logger
31
-
32
- MAX_SEED = 1e7
33
-
34
- if True:
35
- import os
36
- import spaces
37
- import subprocess
38
- import sys
39
- import shlex
40
- print("cd /home/user/app/hy3dgen/texgen/differentiable_renderer/ && bash compile_mesh_painter.sh")
41
- os.system("cd /home/user/app/hy3dgen/texgen/differentiable_renderer/ && bash compile_mesh_painter.sh")
42
- print('install custom')
43
- subprocess.run(shlex.split("pip install custom_rasterizer-0.1-cp310-cp310-linux_x86_64.whl"), check=True)
44
-
45
-
46
- def get_example_img_list():
47
- print('Loading example img list ...')
48
- return sorted(glob('./assets/example_images/**/*.png', recursive=True))
49
-
50
-
51
- def get_example_txt_list():
52
- print('Loading example txt list ...')
53
- txt_list = list()
54
- for line in open('./assets/example_prompts.txt', encoding='utf-8'):
55
- txt_list.append(line.strip())
56
- return txt_list
57
-
58
-
59
- def get_example_mv_list():
60
- print('Loading example mv list ...')
61
- mv_list = list()
62
- root = './assets/example_mv_images'
63
- for mv_dir in os.listdir(root):
64
- view_list = []
65
- for view in ['front', 'back', 'left', 'right']:
66
- path = os.path.join(root, mv_dir, f'{view}.png')
67
- if os.path.exists(path):
68
- view_list.append(path)
69
- else:
70
- view_list.append(None)
71
- mv_list.append(view_list)
72
- return mv_list
73
-
74
-
75
- def gen_save_folder(max_size=200):
76
- os.makedirs(SAVE_DIR, exist_ok=True)
77
-
78
- # Get all folder paths
79
- dirs = [f for f in Path(SAVE_DIR).iterdir() if f.is_dir()]
80
-
81
- # If the number of folders exceeds max_size, delete the folder with the oldest creation time
82
- if len(dirs) >= max_size:
83
- # Sort by creation time, oldest first
84
- oldest_dir = min(dirs, key=lambda x: x.stat().st_ctime)
85
- shutil.rmtree(oldest_dir)
86
- print(f"Removed the oldest folder: {oldest_dir}")
87
-
88
- # Generate a new uuid folder name
89
- new_folder = os.path.join(SAVE_DIR, str(uuid.uuid4()))
90
- os.makedirs(new_folder, exist_ok=True)
91
- print(f"Created new folder: {new_folder}")
92
-
93
- return new_folder
94
-
95
-
96
- def export_mesh(mesh, save_folder, textured=False, type='glb'):
97
- if textured:
98
- path = os.path.join(save_folder, f'textured_mesh.{type}')
99
- else:
100
- path = os.path.join(save_folder, f'white_mesh.{type}')
101
- if type not in ['glb', 'obj']:
102
- mesh.export(path)
103
- else:
104
- mesh.export(path, include_normals=textured)
105
- return path
106
-
107
-
108
- def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
109
- if randomize_seed:
110
- seed = random.randint(0, MAX_SEED)
111
- return seed
112
-
113
-
114
- def build_model_viewer_html(save_folder, height=660, width=790, textured=False):
115
- # Remove first folder from path to make relative path
116
- if textured:
117
- related_path = f"./textured_mesh.glb"
118
- template_name = './assets/modelviewer-textured-template.html'
119
- output_html_path = os.path.join(save_folder, f'textured_mesh.html')
120
- else:
121
- related_path = f"./white_mesh.glb"
122
- template_name = './assets/modelviewer-template.html'
123
- output_html_path = os.path.join(save_folder, f'white_mesh.html')
124
- offset = 50 if textured else 10
125
- with open(os.path.join(CURRENT_DIR, template_name), 'r', encoding='utf-8') as f:
126
- template_html = f.read()
127
-
128
- with open(output_html_path, 'w', encoding='utf-8') as f:
129
- template_html = template_html.replace('#height#', f'{height - offset}')
130
- template_html = template_html.replace('#width#', f'{width}')
131
- template_html = template_html.replace('#src#', f'{related_path}/')
132
- f.write(template_html)
133
-
134
- rel_path = os.path.relpath(output_html_path, SAVE_DIR)
135
- iframe_tag = f'<iframe src="/static/{rel_path}" height="{height}" width="100%" frameborder="0"></iframe>'
136
- print(
137
- f'Find html file {output_html_path}, {os.path.exists(output_html_path)}, relative HTML path is /static/{rel_path}')
138
-
139
- return f"""
140
- <div style='height: {height}; width: 100%;'>
141
- {iframe_tag}
142
- </div>
143
- """
144
-
145
- @spaces.GPU(duration=40)
146
- def _gen_shape(
147
- caption=None,
148
- image=None,
149
- mv_image_front=None,
150
- mv_image_back=None,
151
- mv_image_left=None,
152
- mv_image_right=None,
153
- steps=50,
154
- guidance_scale=7.5,
155
- seed=1234,
156
- octree_resolution=256,
157
- check_box_rembg=False,
158
- num_chunks=200000,
159
- randomize_seed: bool = False,
160
- ):
161
- if not MV_MODE and image is None and caption is None:
162
- raise gr.Error("Please provide either a caption or an image.")
163
- if MV_MODE:
164
- if mv_image_front is None and mv_image_back is None and mv_image_left is None and mv_image_right is None:
165
- raise gr.Error("Please provide at least one view image.")
166
- image = {}
167
- if mv_image_front:
168
- image['front'] = mv_image_front
169
- if mv_image_back:
170
- image['back'] = mv_image_back
171
- if mv_image_left:
172
- image['left'] = mv_image_left
173
- if mv_image_right:
174
- image['right'] = mv_image_right
175
-
176
- seed = int(randomize_seed_fn(seed, randomize_seed))
177
-
178
- octree_resolution = int(octree_resolution)
179
- if caption: print('prompt is', caption)
180
- save_folder = gen_save_folder()
181
- stats = {
182
- 'model': {
183
- 'shapegen': f'{args.model_path}/{args.subfolder}',
184
- 'texgen': f'{args.texgen_model_path}',
185
- },
186
- 'params': {
187
- 'caption': caption,
188
- 'steps': steps,
189
- 'guidance_scale': guidance_scale,
190
- 'seed': seed,
191
- 'octree_resolution': octree_resolution,
192
- 'check_box_rembg': check_box_rembg,
193
- 'num_chunks': num_chunks,
194
- }
195
- }
196
- time_meta = {}
197
-
198
- if image is None:
199
- start_time = time.time()
200
- try:
201
- image = t2i_worker(caption)
202
- except Exception as e:
203
- raise gr.Error(f"Text to 3D is disabled. Please enable it by `python gradio_app.py --enable_t23d`.")
204
- time_meta['text2image'] = time.time() - start_time
205
-
206
- # Disk I/O removed to speed up responses; uncomment at will.
207
- # image.save(os.path.join(save_folder, 'input.png'))
208
- if MV_MODE:
209
- start_time = time.time()
210
- for k, v in image.items():
211
- if check_box_rembg or v.mode == "RGB":
212
- img = rmbg_worker(v.convert('RGB'))
213
- image[k] = img
214
- time_meta['remove background'] = time.time() - start_time
215
- else:
216
- if check_box_rembg or image.mode == "RGB":
217
- start_time = time.time()
218
- image = rmbg_worker(image.convert('RGB'))
219
- time_meta['remove background'] = time.time() - start_time
220
-
221
- # Disk I/O removed to speed up responses; uncomment at will.
222
- # image.save(os.path.join(save_folder, 'rembg.png'))
223
-
224
- # image to white model
225
- start_time = time.time()
226
-
227
- generator = torch.Generator()
228
- generator = generator.manual_seed(int(seed))
229
- outputs = i23d_worker(
230
- image=image,
231
- num_inference_steps=steps,
232
- guidance_scale=guidance_scale,
233
- generator=generator,
234
- octree_resolution=octree_resolution,
235
- num_chunks=num_chunks,
236
- output_type='mesh'
237
- )
238
- time_meta['shape generation'] = time.time() - start_time
239
- logger.info("---Shape generation takes %s seconds ---" % (time.time() - start_time))
240
-
241
- tmp_start = time.time()
242
- mesh = export_to_trimesh(outputs)[0]
243
- time_meta['export to trimesh'] = time.time() - tmp_start
244
-
245
- stats['number_of_faces'] = mesh.faces.shape[0]
246
- stats['number_of_vertices'] = mesh.vertices.shape[0]
247
-
248
- stats['time'] = time_meta
249
- main_image = image if not MV_MODE else image['front']
250
- return mesh, main_image, save_folder, stats, seed
251
-
252
- @spaces.GPU(duration=90)
253
- def generation_all(
254
- caption=None,
255
- image=None,
256
- mv_image_front=None,
257
- mv_image_back=None,
258
- mv_image_left=None,
259
- mv_image_right=None,
260
- steps=50,
261
- guidance_scale=7.5,
262
- seed=1234,
263
- octree_resolution=256,
264
- check_box_rembg=False,
265
- num_chunks=200000,
266
- randomize_seed: bool = False,
267
- ):
268
- start_time_0 = time.time()
269
- mesh, image, save_folder, stats, seed = _gen_shape(
270
- caption,
271
- image,
272
- mv_image_front=mv_image_front,
273
- mv_image_back=mv_image_back,
274
- mv_image_left=mv_image_left,
275
- mv_image_right=mv_image_right,
276
- steps=steps,
277
- guidance_scale=guidance_scale,
278
- seed=seed,
279
- octree_resolution=octree_resolution,
280
- check_box_rembg=check_box_rembg,
281
- num_chunks=num_chunks,
282
- randomize_seed=randomize_seed,
283
- )
284
- path = export_mesh(mesh, save_folder, textured=False)
285
-
286
- # tmp_time = time.time()
287
- # mesh = floater_remove_worker(mesh)
288
- # mesh = degenerate_face_remove_worker(mesh)
289
- # logger.info("---Postprocessing takes %s seconds ---" % (time.time() - tmp_time))
290
- # stats['time']['postprocessing'] = time.time() - tmp_time
291
-
292
- tmp_time = time.time()
293
- mesh = face_reduce_worker(mesh)
294
- logger.info("---Face Reduction takes %s seconds ---" % (time.time() - tmp_time))
295
- stats['time']['face reduction'] = time.time() - tmp_time
296
-
297
- tmp_time = time.time()
298
- textured_mesh = texgen_worker(mesh, image)
299
- logger.info("---Texture Generation takes %s seconds ---" % (time.time() - tmp_time))
300
- stats['time']['texture generation'] = time.time() - tmp_time
301
- stats['time']['total'] = time.time() - start_time_0
302
-
303
- textured_mesh.metadata['extras'] = stats
304
- path_textured = export_mesh(textured_mesh, save_folder, textured=True)
305
- model_viewer_html_textured = build_model_viewer_html(save_folder, height=HTML_HEIGHT, width=HTML_WIDTH,
306
- textured=True)
307
- if args.low_vram_mode:
308
- torch.cuda.empty_cache()
309
- return (
310
- gr.update(value=path),
311
- gr.update(value=path_textured),
312
- model_viewer_html_textured,
313
- stats,
314
- seed,
315
- )
316
-
317
- @spaces.GPU(duration=40)
318
- def shape_generation(
319
- caption=None,
320
- image=None,
321
- mv_image_front=None,
322
- mv_image_back=None,
323
- mv_image_left=None,
324
- mv_image_right=None,
325
- steps=50,
326
- guidance_scale=7.5,
327
- seed=1234,
328
- octree_resolution=256,
329
- check_box_rembg=False,
330
- num_chunks=200000,
331
- randomize_seed: bool = False,
332
- ):
333
- start_time_0 = time.time()
334
- mesh, image, save_folder, stats, seed = _gen_shape(
335
- caption,
336
- image,
337
- mv_image_front=mv_image_front,
338
- mv_image_back=mv_image_back,
339
- mv_image_left=mv_image_left,
340
- mv_image_right=mv_image_right,
341
- steps=steps,
342
- guidance_scale=guidance_scale,
343
- seed=seed,
344
- octree_resolution=octree_resolution,
345
- check_box_rembg=check_box_rembg,
346
- num_chunks=num_chunks,
347
- randomize_seed=randomize_seed,
348
- )
349
- stats['time']['total'] = time.time() - start_time_0
350
- mesh.metadata['extras'] = stats
351
-
352
- path = export_mesh(mesh, save_folder, textured=False)
353
- model_viewer_html = build_model_viewer_html(save_folder, height=HTML_HEIGHT, width=HTML_WIDTH)
354
- if args.low_vram_mode:
355
- torch.cuda.empty_cache()
356
- return (
357
- gr.update(value=path),
358
- model_viewer_html,
359
- stats,
360
- seed,
361
- )
362
-
363
-
364
- def build_app():
365
- title = 'Hunyuan3D-2: High Resolution Textured 3D Assets Generation'
366
- if MV_MODE:
367
- title = 'Hunyuan3D-2mv: Image to 3D Generation with 1-4 Views'
368
- if 'mini' in args.subfolder:
369
- title = 'Hunyuan3D-2mini: Strong 0.6B Image to Shape Generator'
370
- if TURBO_MODE:
371
- title = title.replace(':', '-Turbo: Fast ')
372
-
373
- title_html = f"""
374
- <div style="font-size: 2em; font-weight: bold; text-align: center; margin-bottom: 5px">
375
-
376
- {title}
377
- </div>
378
- <div align="center">
379
- Tencent Hunyuan3D Team
380
- </div>
381
- <div align="center">
382
- <a href="https://github.com/tencent/Hunyuan3D-2">Github</a> &ensp;
383
- <a href="http://3d-models.hunyuan.tencent.com">Homepage</a> &ensp;
384
- <a href="https://3d.hunyuan.tencent.com">Hunyuan3D Studio</a> &ensp;
385
- <a href="#">Technical Report</a> &ensp;
386
- <a href="https://huggingface.co/Tencent/Hunyuan3D-2"> Pretrained Models</a> &ensp;
387
- </div>
388
- """
389
- custom_css = """
390
- .app.svelte-wpkpf6.svelte-wpkpf6:not(.fill_width) {
391
- max-width: 1480px;
392
- }
393
- .mv-image button .wrap {
394
- font-size: 10px;
395
- }
396
-
397
- .mv-image .icon-wrap {
398
- width: 20px;
399
- }
400
-
401
- """
402
-
403
- with gr.Blocks(theme=gr.themes.Base(), title='Hunyuan-3D-2.0', analytics_enabled=False, css=custom_css) as demo:
404
- gr.HTML(title_html)
405
-
406
- with gr.Row():
407
- with gr.Column(scale=3):
408
- with gr.Tabs(selected='tab_img_prompt') as tabs_prompt:
409
- with gr.Tab('Image Prompt', id='tab_img_prompt', visible=not MV_MODE) as tab_ip:
410
- image = gr.Image(label='Image', type='pil', image_mode='RGBA', height=290)
411
-
412
- with gr.Tab('Text Prompt', id='tab_txt_prompt', visible=HAS_T2I and not MV_MODE) as tab_tp:
413
- caption = gr.Textbox(label='Text Prompt',
414
- placeholder='HunyuanDiT will be used to generate an image.',
415
- info='Example: A 3D model of a cute cat, white background')
416
- with gr.Tab('MultiView Prompt', visible=MV_MODE) as tab_mv:
417
- # gr.Label('Please upload at least one front image.')
418
- with gr.Row():
419
- mv_image_front = gr.Image(label='Front', type='pil', image_mode='RGBA', height=140,
420
- min_width=100, elem_classes='mv-image')
421
- mv_image_back = gr.Image(label='Back', type='pil', image_mode='RGBA', height=140,
422
- min_width=100, elem_classes='mv-image')
423
- with gr.Row():
424
- mv_image_left = gr.Image(label='Left', type='pil', image_mode='RGBA', height=140,
425
- min_width=100, elem_classes='mv-image')
426
- mv_image_right = gr.Image(label='Right', type='pil', image_mode='RGBA', height=140,
427
- min_width=100, elem_classes='mv-image')
428
-
429
- with gr.Row():
430
- btn = gr.Button(value='Gen Shape', variant='primary', min_width=100)
431
- btn_all = gr.Button(value='Gen Textured Shape',
432
- variant='primary',
433
- visible=HAS_TEXTUREGEN,
434
- min_width=100)
435
-
436
- with gr.Group():
437
- file_out = gr.File(label="File", visible=False)
438
- file_out2 = gr.File(label="File", visible=False)
439
-
440
- with gr.Tabs(selected='tab_options' if TURBO_MODE else 'tab_export'):
441
- with gr.Tab("Options", id='tab_options', visible=TURBO_MODE):
442
- gen_mode = gr.Radio(label='Generation Mode',
443
- info='Recommendation: Turbo for most cases, Fast for very complex cases, Standard is rarely needed.',
444
- choices=['Turbo', 'Fast', 'Standard'], value='Turbo')
445
- decode_mode = gr.Radio(label='Decoding Mode',
446
- info='The resolution for exporting mesh from generated vectset',
447
- choices=['Low', 'Standard', 'High'],
448
- value='Standard')
449
- with gr.Tab('Advanced Options', id='tab_advanced_options'):
450
- with gr.Row():
451
- check_box_rembg = gr.Checkbox(value=True, label='Remove Background', min_width=100)
452
- randomize_seed = gr.Checkbox(label="Randomize seed", value=True, min_width=100)
453
- seed = gr.Slider(
454
- label="Seed",
455
- minimum=0,
456
- maximum=MAX_SEED,
457
- step=1,
458
- value=1234,
459
- min_width=100,
460
- )
461
- with gr.Row():
462
- num_steps = gr.Slider(maximum=100,
463
- minimum=1,
464
- value=5 if 'turbo' in args.subfolder else 30,
465
- step=1, label='Inference Steps')
466
- octree_resolution = gr.Slider(maximum=512, minimum=16, value=256, label='Octree Resolution')
467
- with gr.Row():
468
- cfg_scale = gr.Number(value=5.0, label='Guidance Scale', min_width=100)
469
- num_chunks = gr.Slider(maximum=5000000, minimum=1000, value=8000,
470
- label='Number of Chunks', min_width=100)
471
- with gr.Tab("Export", id='tab_export'):
472
- with gr.Row():
473
- file_type = gr.Dropdown(label='File Type', choices=SUPPORTED_FORMATS,
474
- value='glb', min_width=100)
475
- reduce_face = gr.Checkbox(label='Simplify Mesh', value=False, min_width=100)
476
- export_texture = gr.Checkbox(label='Include Texture', value=False,
477
- visible=False, min_width=100)
478
- target_face_num = gr.Slider(maximum=1000000, minimum=100, value=10000,
479
- label='Target Face Number')
480
- with gr.Row():
481
- confirm_export = gr.Button(value="Transform", min_width=100)
482
- file_export = gr.DownloadButton(label="Download", variant='primary',
483
- interactive=False, min_width=100)
484
-
485
- with gr.Column(scale=6):
486
- with gr.Tabs(selected='gen_mesh_panel') as tabs_output:
487
- with gr.Tab('Generated Mesh', id='gen_mesh_panel'):
488
- html_gen_mesh = gr.HTML(HTML_OUTPUT_PLACEHOLDER, label='Output')
489
- with gr.Tab('Exporting Mesh', id='export_mesh_panel'):
490
- html_export_mesh = gr.HTML(HTML_OUTPUT_PLACEHOLDER, label='Output')
491
- with gr.Tab('Mesh Statistic', id='stats_panel'):
492
- stats = gr.Json({}, label='Mesh Stats')
493
-
494
- with gr.Column(scale=3 if MV_MODE else 2):
495
- with gr.Tabs(selected='tab_img_gallery') as gallery:
496
- with gr.Tab('Image to 3D Gallery', id='tab_img_gallery', visible=not MV_MODE) as tab_gi:
497
- with gr.Row():
498
- gr.Examples(examples=example_is, inputs=[image],
499
- label=None, examples_per_page=18)
500
-
501
- with gr.Tab('Text to 3D Gallery', id='tab_txt_gallery', visible=HAS_T2I and not MV_MODE) as tab_gt:
502
- with gr.Row():
503
- gr.Examples(examples=example_ts, inputs=[caption],
504
- label=None, examples_per_page=18)
505
- with gr.Tab('MultiView to 3D Gallery', id='tab_mv_gallery', visible=MV_MODE) as tab_mv:
506
- with gr.Row():
507
- gr.Examples(examples=example_mvs,
508
- inputs=[mv_image_front, mv_image_back, mv_image_left, mv_image_right],
509
- label=None, examples_per_page=6)
510
-
511
- gr.HTML(f"""
512
- <div align="center">
513
- Activated Model - Shape Generation ({args.model_path}/{args.subfolder}) ; Texture Generation ({'Hunyuan3D-2' if HAS_TEXTUREGEN else 'Unavailable'})
514
- </div>
515
- """)
516
- if not HAS_TEXTUREGEN:
517
- gr.HTML("""
518
- <div style="margin-top: 5px;" align="center">
519
- <b>Warning: </b>
520
- Texture synthesis is disabled due to missing requirements,
521
- please install requirements following <a href="https://github.com/Tencent/Hunyuan3D-2?tab=readme-ov-file#install-requirements">README.md</a> to activate it.
522
- </div>
523
- """)
524
- if not args.enable_t23d:
525
- gr.HTML("""
526
- <div style="margin-top: 5px;" align="center">
527
- <b>Warning: </b>
528
- Text to 3D is disabled. To activate it, please run `python gradio_app.py --enable_t23d`.
529
- </div>
530
- """)
531
-
532
- tab_ip.select(fn=lambda: gr.update(selected='tab_img_gallery'), outputs=gallery)
533
- if HAS_T2I:
534
- tab_tp.select(fn=lambda: gr.update(selected='tab_txt_gallery'), outputs=gallery)
535
-
536
- btn.click(
537
- shape_generation,
538
- inputs=[
539
- caption,
540
- image,
541
- mv_image_front,
542
- mv_image_back,
543
- mv_image_left,
544
- mv_image_right,
545
- num_steps,
546
- cfg_scale,
547
- seed,
548
- octree_resolution,
549
- check_box_rembg,
550
- num_chunks,
551
- randomize_seed,
552
- ],
553
- outputs=[file_out, html_gen_mesh, stats, seed]
554
- ).then(
555
- lambda: (gr.update(visible=False, value=False), gr.update(interactive=True), gr.update(interactive=True),
556
- gr.update(interactive=False)),
557
- outputs=[export_texture, reduce_face, confirm_export, file_export],
558
- ).then(
559
- lambda: gr.update(selected='gen_mesh_panel'),
560
- outputs=[tabs_output],
561
- )
562
-
563
- btn_all.click(
564
- generation_all,
565
- inputs=[
566
- caption,
567
- image,
568
- mv_image_front,
569
- mv_image_back,
570
- mv_image_left,
571
- mv_image_right,
572
- num_steps,
573
- cfg_scale,
574
- seed,
575
- octree_resolution,
576
- check_box_rembg,
577
- num_chunks,
578
- randomize_seed,
579
- ],
580
- outputs=[file_out, file_out2, html_gen_mesh, stats, seed]
581
- ).then(
582
- lambda: (gr.update(visible=True, value=True), gr.update(interactive=False), gr.update(interactive=True),
583
- gr.update(interactive=False)),
584
- outputs=[export_texture, reduce_face, confirm_export, file_export],
585
- ).then(
586
- lambda: gr.update(selected='gen_mesh_panel'),
587
- outputs=[tabs_output],
588
- )
589
-
590
- def on_gen_mode_change(value):
591
- if value == 'Turbo':
592
- return gr.update(value=5)
593
- elif value == 'Fast':
594
- return gr.update(value=10)
595
- else:
596
- return gr.update(value=30)
597
-
598
- gen_mode.change(on_gen_mode_change, inputs=[gen_mode], outputs=[num_steps])
599
-
600
- def on_decode_mode_change(value):
601
- if value == 'Low':
602
- return gr.update(value=196)
603
- elif value == 'Standard':
604
- return gr.update(value=256)
605
- else:
606
- return gr.update(value=384)
607
-
608
- decode_mode.change(on_decode_mode_change, inputs=[decode_mode], outputs=[octree_resolution])
609
-
610
- def on_export_click(file_out, file_out2, file_type, reduce_face, export_texture, target_face_num):
611
- if file_out is None:
612
- raise gr.Error('Please generate a mesh first.')
613
-
614
- print(f'exporting {file_out}')
615
- print(f'reduce face to {target_face_num}')
616
- if export_texture:
617
- mesh = trimesh.load(file_out2)
618
- save_folder = gen_save_folder()
619
- path = export_mesh(mesh, save_folder, textured=True, type=file_type)
620
-
621
- # for preview
622
- save_folder = gen_save_folder()
623
- _ = export_mesh(mesh, save_folder, textured=True)
624
- model_viewer_html = build_model_viewer_html(save_folder, height=HTML_HEIGHT, width=HTML_WIDTH,
625
- textured=True)
626
- else:
627
- mesh = trimesh.load(file_out)
628
- mesh = floater_remove_worker(mesh)
629
- mesh = degenerate_face_remove_worker(mesh)
630
- if reduce_face:
631
- mesh = face_reduce_worker(mesh, target_face_num)
632
- save_folder = gen_save_folder()
633
- path = export_mesh(mesh, save_folder, textured=False, type=file_type)
634
-
635
- # for preview
636
- save_folder = gen_save_folder()
637
- _ = export_mesh(mesh, save_folder, textured=False)
638
- model_viewer_html = build_model_viewer_html(save_folder, height=HTML_HEIGHT, width=HTML_WIDTH,
639
- textured=False)
640
- print(f'export to {path}')
641
- return model_viewer_html, gr.update(value=path, interactive=True)
642
-
643
- confirm_export.click(
644
- lambda: gr.update(selected='export_mesh_panel'),
645
- outputs=[tabs_output],
646
- ).then(
647
- on_export_click,
648
- inputs=[file_out, file_out2, file_type, reduce_face, export_texture, target_face_num],
649
- outputs=[html_export_mesh, file_export]
650
- )
651
-
652
- return demo
653
-
654
-
655
- if __name__ == '__main__':
656
- import argparse
657
-
658
- parser = argparse.ArgumentParser()
659
- parser.add_argument("--model_path", type=str, default='tencent/Hunyuan3D-2')
660
- parser.add_argument("--subfolder", type=str, default='hunyuan3d-dit-v2-0')
661
- parser.add_argument("--texgen_model_path", type=str, default='tencent/Hunyuan3D-2')
662
- parser.add_argument('--port', type=int, default=7860)
663
- parser.add_argument('--host', type=str, default='0.0.0.0')
664
- parser.add_argument('--device', type=str, default='cuda')
665
- parser.add_argument('--mc_algo', type=str, default='mc')
666
- parser.add_argument('--cache-path', type=str, default='gradio_cache')
667
- parser.add_argument('--enable_t23d', action='store_true')
668
- parser.add_argument('--disable_tex', action='store_true')
669
- parser.add_argument('--enable_flashvdm', action='store_true')
670
- parser.add_argument('--compile', action='store_true')
671
- parser.add_argument('--low_vram_mode', action='store_true')
672
- args = parser.parse_args()
673
-
674
- args.enable_flashvdm = True
675
- args.enable_t23d = False
676
-
677
- SAVE_DIR = args.cache_path
678
- os.makedirs(SAVE_DIR, exist_ok=True)
679
-
680
- CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
681
- MV_MODE = 'mv' in args.model_path
682
- TURBO_MODE = 'turbo' in args.subfolder
683
-
684
- HTML_HEIGHT = 690 if MV_MODE else 650
685
- HTML_WIDTH = 500
686
- HTML_OUTPUT_PLACEHOLDER = f"""
687
- <div style='height: {650}px; width: 100%; border-radius: 8px; border-color: #e5e7eb; border-style: solid; border-width: 1px; display: flex; justify-content: center; align-items: center;'>
688
- <div style='text-align: center; font-size: 16px; color: #6b7280;'>
689
- <p style="color: #8d8d8d;">Welcome to Hunyuan3D!</p>
690
- <p style="color: #8d8d8d;">No mesh here.</p>
691
- </div>
692
- </div>
693
- """
694
-
695
- INPUT_MESH_HTML = """
696
- <div style='height: 490px; width: 100%; border-radius: 8px;
697
- border-color: #e5e7eb; border-style: solid; border-width: 1px;'>
698
- </div>
699
- """
700
- example_is = get_example_img_list()
701
- example_ts = get_example_txt_list()
702
- example_mvs = get_example_mv_list()
703
-
704
- SUPPORTED_FORMATS = ['glb', 'obj', 'ply', 'stl']
705
-
706
- HAS_TEXTUREGEN = False
707
- if not args.disable_tex:
708
- try:
709
- from hy3dgen.texgen import Hunyuan3DPaintPipeline
710
-
711
- texgen_worker = Hunyuan3DPaintPipeline.from_pretrained(args.texgen_model_path)
712
- if args.low_vram_mode:
713
- texgen_worker.enable_model_cpu_offload()
714
- # Does not help much; ignore for now.
715
- # if args.compile:
716
- # texgen_worker.models['delight_model'].pipeline.unet.compile()
717
- # texgen_worker.models['delight_model'].pipeline.vae.compile()
718
- # texgen_worker.models['multiview_model'].pipeline.unet.compile()
719
- # texgen_worker.models['multiview_model'].pipeline.vae.compile()
720
- HAS_TEXTUREGEN = True
721
- except Exception as e:
722
- print(e)
723
- print("Failed to load texture generator.")
724
- print('Please try to install requirements by following README.md')
725
- HAS_TEXTUREGEN = False
726
-
727
- HAS_T2I = True
728
- if args.enable_t23d:
729
- from hy3dgen.text2image import HunyuanDiTPipeline
730
-
731
- t2i_worker = HunyuanDiTPipeline('Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled')
732
- HAS_T2I = True
733
-
734
- from hy3dgen.shapegen import FaceReducer, FloaterRemover, DegenerateFaceRemover, MeshSimplifier, \
735
- Hunyuan3DDiTFlowMatchingPipeline
736
- from hy3dgen.shapegen.pipelines import export_to_trimesh
737
- from hy3dgen.rembg import BackgroundRemover
738
-
739
- rmbg_worker = BackgroundRemover()
740
- i23d_worker = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(
741
- args.model_path,
742
- subfolder=args.subfolder,
743
- use_safetensors=True,
744
- device=args.device,
745
- )
746
- if args.enable_flashvdm:
747
- mc_algo = 'mc' if args.device in ['cpu', 'mps'] else args.mc_algo
748
- i23d_worker.enable_flashvdm(mc_algo=mc_algo)
749
- if args.compile:
750
- i23d_worker.compile()
751
-
752
- floater_remove_worker = FloaterRemover()
753
- degenerate_face_remove_worker = DegenerateFaceRemover()
754
- face_reduce_worker = FaceReducer()
755
-
756
- # https://discuss.huggingface.co/t/how-to-serve-an-html-file/33921/2
757
- # create a FastAPI app
758
- app = FastAPI()
759
- # create a static directory to store the static files
760
- static_dir = Path(SAVE_DIR).absolute()
761
- static_dir.mkdir(parents=True, exist_ok=True)
762
- app.mount("/static", StaticFiles(directory=static_dir, html=True), name="static")
763
- shutil.copytree('./assets/env_maps', os.path.join(static_dir, 'env_maps'), dirs_exist_ok=True)
764
-
765
- if args.low_vram_mode:
766
- torch.cuda.empty_cache()
767
- demo = build_app()
768
- app = gr.mount_gradio_app(app, demo, path="/")
769
- from spaces import zero
770
- zero.startup()
771
- uvicorn.run(app, host=args.host, port=args.port)
 
hg_app.py CHANGED
@@ -1,4 +1,3 @@
1
- # pip install gradio==4.44.1
2
  import argparse
3
  parser = argparse.ArgumentParser()
4
  parser.add_argument('--port', type=int, default=8080)
@@ -251,7 +250,6 @@ def shape_generation(
251
  def build_app():
252
  title_html = """
253
  <div style="font-size: 2em; font-weight: bold; text-align: center; margin-bottom: 5px">
254
-
255
  Hunyuan3D-2: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
256
  </div>
257
  <div align="center">
@@ -436,4 +434,4 @@ if __name__ == '__main__':
436
  demo = build_app()
437
  demo.queue(max_size=10)
438
  app = gr.mount_gradio_app(app, demo, path="/")
439
- uvicorn.run(app, host=IP, port=PORT)
 
 
1
  import argparse
2
  parser = argparse.ArgumentParser()
3
  parser.add_argument('--port', type=int, default=8080)
 
250
  def build_app():
251
  title_html = """
252
  <div style="font-size: 2em; font-weight: bold; text-align: center; margin-bottom: 5px">
 
253
  Hunyuan3D-2: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
254
  </div>
255
  <div align="center">
 
434
  demo = build_app()
435
  demo.queue(max_size=10)
436
  app = gr.mount_gradio_app(app, demo, path="/")
437
+ uvicorn.run(app, host=IP, port=PORT)
hg_app_bak.py DELETED
@@ -1,402 +0,0 @@
1
- # pip install gradio==4.44.1
2
- if True:
3
- import os
4
- import spaces
5
- import subprocess
6
- import sys
7
- import shlex
8
- print("cd /home/user/app/hy3dgen/texgen/differentiable_renderer/ && bash compile_mesh_painter.sh")
9
- os.system("cd /home/user/app/hy3dgen/texgen/differentiable_renderer/ && bash compile_mesh_painter.sh")
10
- print('install custom')
11
- subprocess.run(shlex.split("pip install custom_rasterizer-0.1-cp310-cp310-linux_x86_64.whl"), check=True)
12
- IP = "0.0.0.0"
13
- PORT = 7860
14
- else:
15
- IP = "0.0.0.0"
16
- PORT = 8080
17
- class spaces:
18
- class GPU:
19
- def __init__(self, duration=60):
20
- self.duration = duration
21
- def __call__(self, func):
22
- return func
23
-
24
- import os
25
- import shutil
26
- import time
27
- from glob import glob
28
- import gradio as gr
29
- import torch
30
- from gradio_litmodel3d import LitModel3D
31
-
32
-
33
- def get_example_img_list():
34
- print('Loading example img list ...')
35
- return sorted(glob('./assets/example_images/*.png'))
36
-
37
-
38
- def get_example_txt_list():
39
- print('Loading example txt list ...')
40
- txt_list = list()
41
- for line in open('./assets/example_prompts.txt'):
42
- txt_list.append(line.strip())
43
- return txt_list
44
-
45
-
46
- def gen_save_folder(max_size=6000):
47
- os.makedirs(SAVE_DIR, exist_ok=True)
48
- exists = set(int(_) for _ in os.listdir(SAVE_DIR) if not _.startswith("."))
49
- cur_id = min(set(range(max_size)) - exists) if len(exists) < max_size else -1
50
- if os.path.exists(f"{SAVE_DIR}/{(cur_id + 1) % max_size}"):
51
- shutil.rmtree(f"{SAVE_DIR}/{(cur_id + 1) % max_size}")
52
- print(f"remove {SAVE_DIR}/{(cur_id + 1) % max_size} success !!!")
53
- save_folder = f"{SAVE_DIR}/{max(0, cur_id)}"
54
- os.makedirs(save_folder, exist_ok=True)
55
- print(f"mkdir {save_folder} suceess !!!")
56
- return save_folder
57
-
58
-
59
- def export_mesh(mesh, save_folder, textured=False):
60
- if textured:
61
- path = os.path.join(save_folder, f'textured_mesh.glb')
62
- else:
63
- path = os.path.join(save_folder, f'white_mesh.glb')
64
- mesh.export(path, include_normals=textured)
65
- return path
66
-
67
-
68
- def build_model_viewer_html(save_folder, height=660, width=790, textured=False):
69
- if textured:
70
- related_path = f"./textured_mesh.glb"
71
- template_name = './assets/modelviewer-textured-template.html'
72
- output_html_path = os.path.join(save_folder, f'textured_mesh.html')
73
- else:
74
- related_path = f"./white_mesh.glb"
75
- template_name = './assets/modelviewer-template.html'
76
- output_html_path = os.path.join(save_folder, f'white_mesh.html')
77
-
78
- with open(os.path.join(CURRENT_DIR, template_name), 'r') as f:
79
- template_html = f.read()
80
- obj_html = f"""
81
- <div class="column is-mobile is-centered">
82
- <model-viewer style="height: {height - 10}px; width: {width}px;" rotation-per-second="10deg" id="modelViewer"
83
- src="{related_path}/" disable-tap
84
- environment-image="neutral" auto-rotate camera-target="0m 0m 0m" orientation="0deg 0deg 170deg" shadow-intensity=".9"
85
- ar auto-rotate camera-controls>
86
- </model-viewer>
87
- </div>
88
- """
89
-
90
- with open(output_html_path, 'w') as f:
91
- f.write(template_html.replace('<model-viewer>', obj_html))
92
-
93
- iframe_tag = f'<iframe src="file/{output_html_path}" height="{height}" width="100%" frameborder="0"></iframe>'
94
- print(f'Find html {output_html_path}, {os.path.exists(output_html_path)}')
95
-
96
- return f"""
97
- <div style='height: {height}; width: 100%;'>
98
- {iframe_tag}
99
- </div>
100
- """
101
-
102
- @spaces.GPU(duration=60)
103
- def _gen_shape(
104
- caption,
105
- image,
106
- steps=50,
107
- guidance_scale=7.5,
108
- seed=1234,
109
- octree_resolution=256,
110
- check_box_rembg=False,
111
- ):
112
- if caption: print('prompt is', caption)
113
- save_folder = gen_save_folder()
114
- stats = {}
115
- time_meta = {}
116
- start_time_0 = time.time()
117
-
118
- image_path = ''
119
- if image is None:
120
- start_time = time.time()
121
- image = t2i_worker(caption)
122
- time_meta['text2image'] = time.time() - start_time
123
-
124
- image.save(os.path.join(save_folder, 'input.png'))
125
-
126
- print(image.mode)
127
- if check_box_rembg or image.mode == "RGB":
128
- start_time = time.time()
129
- image = rmbg_worker(image.convert('RGB'))
130
- time_meta['rembg'] = time.time() - start_time
131
-
132
- image.save(os.path.join(save_folder, 'rembg.png'))
133
-
134
- # image to white model
135
- start_time = time.time()
136
-
137
- generator = torch.Generator()
138
- generator = generator.manual_seed(int(seed))
139
- mesh = i23d_worker(
140
- image=image,
141
- num_inference_steps=steps,
142
- guidance_scale=guidance_scale,
143
- generator=generator,
144
- octree_resolution=octree_resolution
145
- )[0]
146
-
147
- mesh = FloaterRemover()(mesh)
148
- mesh = DegenerateFaceRemover()(mesh)
149
- mesh = FaceReducer()(mesh)
150
-
151
- stats['number_of_faces'] = mesh.faces.shape[0]
152
- stats['number_of_vertices'] = mesh.vertices.shape[0]
153
-
154
- time_meta['image_to_textured_3d'] = {'total': time.time() - start_time}
155
- time_meta['total'] = time.time() - start_time_0
156
- stats['time'] = time_meta
157
- return mesh, save_folder, image
158
-
159
- @spaces.GPU(duration=80)
160
- def generation_all(
161
- caption,
162
- image,
163
- steps=50,
164
- guidance_scale=7.5,
165
- seed=1234,
166
- octree_resolution=256,
167
- check_box_rembg=False
168
- ):
169
- mesh, save_folder, image = _gen_shape(
170
- caption,
171
- image,
172
- steps=steps,
173
- guidance_scale=guidance_scale,
174
- seed=seed,
175
- octree_resolution=octree_resolution,
176
- check_box_rembg=check_box_rembg
177
- )
178
- path = export_mesh(mesh, save_folder, textured=False)
179
- model_viewer_html = build_model_viewer_html(save_folder, height=596, width=700)
180
-
181
- textured_mesh = texgen_worker(mesh, image)
182
- path_textured = export_mesh(textured_mesh, save_folder, textured=True)
183
- model_viewer_html_textured = build_model_viewer_html(save_folder, height=596, width=700, textured=True)
184
-
185
- return (
186
- gr.update(value=path, visible=True),
187
- gr.update(value=path_textured, visible=True),
188
- gr.update(value=path, visible=True),
189
- gr.update(value=path_textured, visible=True),
190
- # model_viewer_html,
191
- # model_viewer_html_textured,
192
- )
193
-
194
- @spaces.GPU(duration=30)
195
- def shape_generation(
196
- caption,
197
- image,
198
- steps=50,
199
- guidance_scale=7.5,
200
- seed=1234,
201
- octree_resolution=256,
202
- check_box_rembg=False,
203
- ):
204
- mesh, save_folder, image = _gen_shape(
205
- caption,
206
- image,
207
- steps=steps,
208
- guidance_scale=guidance_scale,
209
- seed=seed,
210
- octree_resolution=octree_resolution,
211
- check_box_rembg=check_box_rembg
212
- )
213
-
214
- path = export_mesh(mesh, save_folder, textured=False)
215
- model_viewer_html = build_model_viewer_html(save_folder, height=596, width=700)
216
-
217
- return (
218
- gr.update(value=path, visible=True),
219
- gr.update(value=path, visible=True),
220
- # model_viewer_html,
221
- )
222
-
223
-
224
- def build_app():
225
- title_html = """
226
- <div style="font-size: 2em; font-weight: bold; text-align: center; margin-bottom: 20px">
227
-
228
- Hunyuan3D-2: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
229
- </div>
230
- <div align="center">
231
- Tencent Hunyuan3D Team
232
- </div>
233
- <div align="center">
234
- <a href="https://github.com/tencent/Hunyuan3D-1">Github Page</a> &ensp;
235
- <a href="http://3d-models.hunyuan.tencent.com">Homepage</a> &ensp;
236
- <a href="https://arxiv.org/pdf/2411.02293">Technical Report</a> &ensp;
237
- <a href="https://huggingface.co/Tencent/Hunyuan3D-2"> Models</a> &ensp;
238
- </div>
239
- """
240
- css = """
241
- .json-output {
242
- height: 578px;
243
- }
244
- .json-output .json-holder {
245
- height: 538px;
246
- overflow-y: scroll;
247
- }
248
- """
249
-
250
- with gr.Blocks(theme=gr.themes.Base(), css=css, title='Hunyuan-3D-2.0') as demo:
251
- # if not gr.__version__.startswith('4'): gr.HTML(title_html)
252
- gr.HTML(title_html)
253
-
254
- with gr.Row():
255
- with gr.Column(scale=2):
256
- with gr.Tabs() as tabs_prompt:
257
- with gr.Tab('Image Prompt', id='tab_img_prompt') as tab_ip:
258
- image = gr.Image(label='Image', type='pil', image_mode='RGBA', height=290)
259
- with gr.Row():
260
- check_box_rembg = gr.Checkbox(value=True, label='Remove Background')
261
-
262
- with gr.Tab('Text Prompt', id='tab_txt_prompt') as tab_tp:
263
- caption = gr.Textbox(label='Text Prompt',
264
- placeholder='HunyuanDiT will be used to generate an image.',
265
- info='Example: A 3D model of a cute cat, white background')
266
-
267
- with gr.Accordion('Advanced Options', open=False):
268
- num_steps = gr.Slider(maximum=50, minimum=20, value=30, step=1, label='Inference Steps')
269
- octree_resolution = gr.Dropdown([256, 384, 512], value=256, label='Octree Resolution')
270
- cfg_scale = gr.Number(value=5.5, label='Guidance Scale')
271
- seed = gr.Slider(maximum=1e7, minimum=0, value=1234, label='Seed')
272
-
273
- with gr.Group():
274
- btn = gr.Button(value='Generate Shape Only', variant='primary')
275
- btn_all = gr.Button(value='Generate Shape and Texture', variant='primary')
276
-
277
- with gr.Group():
278
- file_out = gr.File(label="File", visible=False)
279
- file_out2 = gr.File(label="File", visible=False)
280
-
281
- with gr.Column(scale=5):
282
- with gr.Tabs():
283
- with gr.Tab('Generated Mesh') as mesh1:
284
- mesh_output1 = LitModel3D(
285
- label="3D Model1",
286
- exposure=10.0,
287
- height=600,
288
- visible=True,
289
- clear_color=[0.0, 0.0, 0.0, 0.0],
290
- tonemapping="aces",
291
- contrast=1.0,
292
- scale=1.0,
293
- )
294
- # html_output1 = gr.HTML(HTML_OUTPUT_PLACEHOLDER, label='Output')
295
- with gr.Tab('Generated Textured Mesh') as mesh2:
296
- # html_output2 = gr.HTML(HTML_OUTPUT_PLACEHOLDER, label='Output')
297
- mesh_output2 = LitModel3D(
298
- label="3D Model2",
299
- exposure=10.0,
300
- height=600,
301
- visible=True,
302
- clear_color=[0.0, 0.0, 0.0, 0.0],
303
- tonemapping="aces",
304
- contrast=1.0,
305
- scale=1.0,
306
- )
307
-
308
- with gr.Column(scale=2):
309
- with gr.Tabs() as gallery:
310
- with gr.Tab('Image to 3D Gallery', id='tab_img_gallery') as tab_gi:
311
- with gr.Row():
312
- gr.Examples(examples=example_is, inputs=[image],
313
- label="Image Prompts", examples_per_page=18)
314
-
315
- with gr.Tab('Text to 3D Gallery', id='tab_txt_gallery') as tab_gt:
316
- with gr.Row():
317
- gr.Examples(examples=example_ts, inputs=[caption],
318
- label="Text Prompts", examples_per_page=18)
319
-
320
- tab_gi.select(fn=lambda: gr.update(selected='tab_img_prompt'), outputs=tabs_prompt)
321
- tab_gt.select(fn=lambda: gr.update(selected='tab_txt_prompt'), outputs=tabs_prompt)
322
-
323
- btn.click(
324
- shape_generation,
325
- inputs=[
326
- caption,
327
- image,
328
- num_steps,
329
- cfg_scale,
330
- seed,
331
- octree_resolution,
332
- check_box_rembg,
333
- ],
334
- # outputs=[file_out, html_output1]
335
- outputs=[file_out, mesh_output1]
336
- ).then(
337
- lambda: gr.update(visible=True),
338
- outputs=[file_out],
339
- )
340
-
341
- btn_all.click(
342
- generation_all,
343
- inputs=[
344
- caption,
345
- image,
346
- num_steps,
347
- cfg_scale,
348
- seed,
349
- octree_resolution,
350
- check_box_rembg,
351
- ],
352
- # outputs=[file_out, file_out2, html_output1, html_output2]
353
- outputs=[file_out, file_out2, mesh_output1, mesh_output2]
354
- ).then(
355
- lambda: (gr.update(visible=True), gr.update(visible=True)),
356
- outputs=[file_out, file_out2],
357
- )
358
-
359
- return demo
360
-
361
-
362
- if __name__ == '__main__':
363
- import argparse
364
-
365
- parser = argparse.ArgumentParser()
366
- parser.add_argument('--port', type=int, default=8080)
367
- parser.add_argument('--cache-path', type=str, default='./gradio_cache')
368
- args = parser.parse_args()
369
-
370
- SAVE_DIR = args.cache_path
371
- os.makedirs(SAVE_DIR, exist_ok=True)
372
-
373
- CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
374
-
375
- HTML_OUTPUT_PLACEHOLDER = """
376
- <div style='height: 596px; width: 100%; border-radius: 8px; border-color: #e5e7eb; border-style: solid; border-width: 1px;'></div>
377
- """
378
-
379
- INPUT_MESH_HTML = """
380
- <div style='height: 490px; width: 100%; border-radius: 8px;
381
- border-color: #e5e7eb; border-style: solid; border-width: 1px;'>
382
- </div>
383
- """
384
- example_is = get_example_img_list()
385
- example_ts = get_example_txt_list()
386
-
387
- from hy3dgen.text2image import HunyuanDiTPipeline
388
- from hy3dgen.shapegen import FaceReducer, FloaterRemover, DegenerateFaceRemover, \
389
- Hunyuan3DDiTFlowMatchingPipeline
390
- from hy3dgen.texgen import Hunyuan3DPaintPipeline
391
- from hy3dgen.rembg import BackgroundRemover
392
-
393
- rmbg_worker = BackgroundRemover()
394
- t2i_worker = HunyuanDiTPipeline()
395
- i23d_worker = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2')
396
- texgen_worker = Hunyuan3DPaintPipeline.from_pretrained('tencent/Hunyuan3D-2')
397
- floater_remove_worker = FloaterRemover()
398
- degenerate_face_remove_worker = DegenerateFaceRemover()
399
- face_reduce_worker = FaceReducer()
400
-
401
- demo = build_app()
402
- demo.queue().launch(server_name=IP,server_port=PORT)
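
For reference, a minimal headless sketch of the same generation chain the deleted Gradio script wires together. It assumes the hy3dgen package from this repository is installed and the 'tencent/Hunyuan3D-2' weights are available; the input path is hypothetical, and the parameters mirror the calls in _gen_shape and generation_all above.

# Headless sketch of the shape + texture pipeline used by the deleted app.
# Assumes hy3dgen is installed; the input image path is hypothetical.
from PIL import Image

from hy3dgen.rembg import BackgroundRemover
from hy3dgen.shapegen import (DegenerateFaceRemover, FaceReducer, FloaterRemover,
                              Hunyuan3DDiTFlowMatchingPipeline)
from hy3dgen.texgen import Hunyuan3DPaintPipeline

image = Image.open('assets/example_images/example.png')      # hypothetical input
image = BackgroundRemover()(image.convert('RGB'))

shape_pipe = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained('tencent/Hunyuan3D-2')
mesh = shape_pipe(image=image, num_inference_steps=30, octree_resolution=256)[0]

# Same clean-up chain as _gen_shape above.
for cleaner in (FloaterRemover(), DegenerateFaceRemover(), FaceReducer()):
    mesh = cleaner(mesh)

paint_pipe = Hunyuan3DPaintPipeline.from_pretrained('tencent/Hunyuan3D-2')
textured_mesh = paint_pipe(mesh, image)
textured_mesh.export('textured_mesh.glb', include_normals=True)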
 
hunyuan3ddit.py DELETED
@@ -1,410 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import math
16
- import os
17
- from dataclasses import dataclass
18
- from typing import List, Tuple, Optional
19
-
20
- import torch
21
- from einops import rearrange
22
- from torch import Tensor, nn
23
-
24
- scaled_dot_product_attention = nn.functional.scaled_dot_product_attention
25
- if os.environ.get('USE_SAGEATTN', '0') == '1':
26
- try:
27
- from sageattention import sageattn
28
- except ImportError:
29
- raise ImportError('Please install the package "sageattention" to use this USE_SAGEATTN.')
30
- scaled_dot_product_attention = sageattn
31
-
32
-
33
- def attention(q: Tensor, k: Tensor, v: Tensor, **kwargs) -> Tensor:
34
- x = scaled_dot_product_attention(q, k, v)
35
- x = rearrange(x, "B H L D -> B L (H D)")
36
- return x
37
-
38
-
39
- def timestep_embedding(t: Tensor, dim, max_period=10000, time_factor: float = 1000.0):
40
- """
41
- Create sinusoidal timestep embeddings.
42
- :param t: a 1-D Tensor of N indices, one per batch element.
43
- These may be fractional.
44
- :param dim: the dimension of the output.
45
- :param max_period: controls the minimum frequency of the embeddings.
46
- :return: an (N, D) Tensor of positional embeddings.
47
- """
48
- t = time_factor * t
49
- half = dim // 2
50
- freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half).to(
51
- t.device
52
- )
53
-
54
- args = t[:, None].float() * freqs[None]
55
- embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
56
- if dim % 2:
57
- embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
58
- if torch.is_floating_point(t):
59
- embedding = embedding.to(t)
60
- return embedding
61
-
62
-
63
- class GELU(nn.Module):
64
- def __init__(self, approximate='tanh'):
65
- super().__init__()
66
- self.approximate = approximate
67
-
68
- def forward(self, x: Tensor) -> Tensor:
69
- return nn.functional.gelu(x.contiguous(), approximate=self.approximate)
70
-
71
-
72
- class MLPEmbedder(nn.Module):
73
- def __init__(self, in_dim: int, hidden_dim: int):
74
- super().__init__()
75
- self.in_layer = nn.Linear(in_dim, hidden_dim, bias=True)
76
- self.silu = nn.SiLU()
77
- self.out_layer = nn.Linear(hidden_dim, hidden_dim, bias=True)
78
-
79
- def forward(self, x: Tensor) -> Tensor:
80
- return self.out_layer(self.silu(self.in_layer(x)))
81
-
82
-
83
- class RMSNorm(torch.nn.Module):
84
- def __init__(self, dim: int):
85
- super().__init__()
86
- self.scale = nn.Parameter(torch.ones(dim))
87
-
88
- def forward(self, x: Tensor):
89
- x_dtype = x.dtype
90
- x = x.float()
91
- rrms = torch.rsqrt(torch.mean(x ** 2, dim=-1, keepdim=True) + 1e-6)
92
- return (x * rrms).to(dtype=x_dtype) * self.scale
93
-
94
-
95
- class QKNorm(torch.nn.Module):
96
- def __init__(self, dim: int):
97
- super().__init__()
98
- self.query_norm = RMSNorm(dim)
99
- self.key_norm = RMSNorm(dim)
100
-
101
- def forward(self, q: Tensor, k: Tensor, v: Tensor) -> Tuple[Tensor, Tensor]:
102
- q = self.query_norm(q)
103
- k = self.key_norm(k)
104
- return q.to(v), k.to(v)
105
-
106
-
107
- class SelfAttention(nn.Module):
108
- def __init__(
109
- self,
110
- dim: int,
111
- num_heads: int = 8,
112
- qkv_bias: bool = False,
113
- ):
114
- super().__init__()
115
- self.num_heads = num_heads
116
- head_dim = dim // num_heads
117
-
118
- self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
119
- self.norm = QKNorm(head_dim)
120
- self.proj = nn.Linear(dim, dim)
121
-
122
- def forward(self, x: Tensor, pe: Tensor) -> Tensor:
123
- qkv = self.qkv(x)
124
- q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
125
- q, k = self.norm(q, k, v)
126
- x = attention(q, k, v, pe=pe)
127
- x = self.proj(x)
128
- return x
129
-
130
-
131
- @dataclass
132
- class ModulationOut:
133
- shift: Tensor
134
- scale: Tensor
135
- gate: Tensor
136
-
137
-
138
- class Modulation(nn.Module):
139
- def __init__(self, dim: int, double: bool):
140
- super().__init__()
141
- self.is_double = double
142
- self.multiplier = 6 if double else 3
143
- self.lin = nn.Linear(dim, self.multiplier * dim, bias=True)
144
-
145
- def forward(self, vec: Tensor) -> Tuple[ModulationOut, Optional[ModulationOut]]:
146
- out = self.lin(nn.functional.silu(vec))[:, None, :]
147
- out = out.chunk(self.multiplier, dim=-1)
148
-
149
- return (
150
- ModulationOut(*out[:3]),
151
- ModulationOut(*out[3:]) if self.is_double else None,
152
- )
153
-
154
-
155
- class DoubleStreamBlock(nn.Module):
156
- def __init__(
157
- self,
158
- hidden_size: int,
159
- num_heads: int,
160
- mlp_ratio: float,
161
- qkv_bias: bool = False,
162
- ):
163
- super().__init__()
164
- mlp_hidden_dim = int(hidden_size * mlp_ratio)
165
- self.num_heads = num_heads
166
- self.hidden_size = hidden_size
167
- self.img_mod = Modulation(hidden_size, double=True)
168
- self.img_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
169
- self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias)
170
-
171
- self.img_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
172
- self.img_mlp = nn.Sequential(
173
- nn.Linear(hidden_size, mlp_hidden_dim, bias=True),
174
- GELU(approximate="tanh"),
175
- nn.Linear(mlp_hidden_dim, hidden_size, bias=True),
176
- )
177
-
178
- self.txt_mod = Modulation(hidden_size, double=True)
179
- self.txt_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
180
- self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias)
181
-
182
- self.txt_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
183
- self.txt_mlp = nn.Sequential(
184
- nn.Linear(hidden_size, mlp_hidden_dim, bias=True),
185
- GELU(approximate="tanh"),
186
- nn.Linear(mlp_hidden_dim, hidden_size, bias=True),
187
- )
188
-
189
- def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor) -> Tuple[Tensor, Tensor]:
190
- img_mod1, img_mod2 = self.img_mod(vec)
191
- txt_mod1, txt_mod2 = self.txt_mod(vec)
192
-
193
- img_modulated = self.img_norm1(img)
194
- img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift
195
- img_qkv = self.img_attn.qkv(img_modulated)
196
- img_q, img_k, img_v = rearrange(img_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
197
- img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)
198
-
199
- txt_modulated = self.txt_norm1(txt)
200
- txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift
201
- txt_qkv = self.txt_attn.qkv(txt_modulated)
202
- txt_q, txt_k, txt_v = rearrange(txt_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
203
- txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)
204
-
205
- q = torch.cat((txt_q, img_q), dim=2)
206
- k = torch.cat((txt_k, img_k), dim=2)
207
- v = torch.cat((txt_v, img_v), dim=2)
208
-
209
- attn = attention(q, k, v, pe=pe)
210
- txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1]:]
211
-
212
- img = img + img_mod1.gate * self.img_attn.proj(img_attn)
213
- img = img + img_mod2.gate * self.img_mlp((1 + img_mod2.scale) * self.img_norm2(img) + img_mod2.shift)
214
-
215
- txt = txt + txt_mod1.gate * self.txt_attn.proj(txt_attn)
216
- txt = txt + txt_mod2.gate * self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(txt) + txt_mod2.shift)
217
- return img, txt
218
-
219
-
220
- class SingleStreamBlock(nn.Module):
221
- """
222
- A DiT block with parallel linear layers as described in
223
- https://arxiv.org/abs/2302.05442 and adapted modulation interface.
224
- """
225
-
226
- def __init__(
227
- self,
228
- hidden_size: int,
229
- num_heads: int,
230
- mlp_ratio: float = 4.0,
231
- qk_scale: Optional[float] = None,
232
- ):
233
- super().__init__()
234
-
235
- self.hidden_dim = hidden_size
236
- self.num_heads = num_heads
237
- head_dim = hidden_size // num_heads
238
- self.scale = qk_scale or head_dim ** -0.5
239
-
240
- self.mlp_hidden_dim = int(hidden_size * mlp_ratio)
241
- # qkv and mlp_in
242
- self.linear1 = nn.Linear(hidden_size, hidden_size * 3 + self.mlp_hidden_dim)
243
- # proj and mlp_out
244
- self.linear2 = nn.Linear(hidden_size + self.mlp_hidden_dim, hidden_size)
245
-
246
- self.norm = QKNorm(head_dim)
247
-
248
- self.hidden_size = hidden_size
249
- self.pre_norm = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
250
-
251
- self.mlp_act = GELU(approximate="tanh")
252
- self.modulation = Modulation(hidden_size, double=False)
253
-
254
- def forward(self, x: Tensor, vec: Tensor, pe: Tensor) -> Tensor:
255
- mod, _ = self.modulation(vec)
256
-
257
- x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift
258
- qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
259
-
260
- q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
261
- q, k = self.norm(q, k, v)
262
-
263
- # compute attention
264
- attn = attention(q, k, v, pe=pe)
265
- # compute activation in mlp stream, cat again and run second linear layer
266
- output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
267
- return x + mod.gate * output
268
-
269
-
270
- class LastLayer(nn.Module):
271
- def __init__(self, hidden_size: int, patch_size: int, out_channels: int):
272
- super().__init__()
273
- self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
274
- self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True)
275
- self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 2 * hidden_size, bias=True))
276
-
277
- def forward(self, x: Tensor, vec: Tensor) -> Tensor:
278
- shift, scale = self.adaLN_modulation(vec).chunk(2, dim=1)
279
- x = (1 + scale[:, None, :]) * self.norm_final(x) + shift[:, None, :]
280
- x = self.linear(x)
281
- return x
282
-
283
-
284
- class Hunyuan3DDiT(nn.Module):
285
- def __init__(
286
- self,
287
- in_channels: int = 64,
288
- context_in_dim: int = 1536,
289
- hidden_size: int = 1024,
290
- mlp_ratio: float = 4.0,
291
- num_heads: int = 16,
292
- depth: int = 16,
293
- depth_single_blocks: int = 32,
294
- axes_dim: List[int] = [64],
295
- theta: int = 10_000,
296
- qkv_bias: bool = True,
297
- time_factor: float = 1000,
298
- guidance_embed: bool = False,
299
- ckpt_path: Optional[str] = None,
300
- **kwargs,
301
- ):
302
- super().__init__()
303
- self.in_channels = in_channels
304
- self.context_in_dim = context_in_dim
305
- self.hidden_size = hidden_size
306
- self.mlp_ratio = mlp_ratio
307
- self.num_heads = num_heads
308
- self.depth = depth
309
- self.depth_single_blocks = depth_single_blocks
310
- self.axes_dim = axes_dim
311
- self.theta = theta
312
- self.qkv_bias = qkv_bias
313
- self.time_factor = time_factor
314
- self.out_channels = self.in_channels
315
- self.guidance_embed = guidance_embed
316
-
317
- if hidden_size % num_heads != 0:
318
- raise ValueError(
319
- f"Hidden size {hidden_size} must be divisible by num_heads {num_heads}"
320
- )
321
- pe_dim = hidden_size // num_heads
322
- if sum(axes_dim) != pe_dim:
323
- raise ValueError(f"Got {axes_dim} but expected positional dim {pe_dim}")
324
- self.hidden_size = hidden_size
325
- self.num_heads = num_heads
326
- self.latent_in = nn.Linear(self.in_channels, self.hidden_size, bias=True)
327
- self.time_in = MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size)
328
- self.cond_in = nn.Linear(context_in_dim, self.hidden_size)
329
- self.guidance_in = (
330
- MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size) if guidance_embed else nn.Identity()
331
- )
332
-
333
- self.double_blocks = nn.ModuleList(
334
- [
335
- DoubleStreamBlock(
336
- self.hidden_size,
337
- self.num_heads,
338
- mlp_ratio=mlp_ratio,
339
- qkv_bias=qkv_bias,
340
- )
341
- for _ in range(depth)
342
- ]
343
- )
344
-
345
- self.single_blocks = nn.ModuleList(
346
- [
347
- SingleStreamBlock(
348
- self.hidden_size,
349
- self.num_heads,
350
- mlp_ratio=mlp_ratio,
351
- )
352
- for _ in range(depth_single_blocks)
353
- ]
354
- )
355
-
356
- self.final_layer = LastLayer(self.hidden_size, 1, self.out_channels)
357
-
358
- if ckpt_path is not None:
359
- print('restored denoiser ckpt', ckpt_path)
360
-
361
- ckpt = torch.load(ckpt_path, map_location="cpu")
362
- if 'state_dict' not in ckpt:
363
- # deepspeed ckpt
364
- state_dict = {}
365
- for k in ckpt.keys():
366
- new_k = k.replace('_forward_module.', '')
367
- state_dict[new_k] = ckpt[k]
368
- else:
369
- state_dict = ckpt["state_dict"]
370
-
371
- final_state_dict = {}
372
- for k, v in state_dict.items():
373
- if k.startswith('model.'):
374
- final_state_dict[k.replace('model.', '')] = v
375
- else:
376
- final_state_dict[k] = v
377
- missing, unexpected = self.load_state_dict(final_state_dict, strict=False)
378
- print('unexpected keys:', unexpected)
379
- print('missing keys:', missing)
380
-
381
- def forward(
382
- self,
383
- x,
384
- t,
385
- contexts,
386
- **kwargs,
387
- ) -> Tensor:
388
- cond = contexts['main']
389
- latent = self.latent_in(x)
390
-
391
- vec = self.time_in(timestep_embedding(t, 256, self.time_factor).to(dtype=latent.dtype))
392
- if self.guidance_embed:
393
- guidance = kwargs.get('guidance', None)
394
- if guidance is None:
395
- raise ValueError("Didn't get guidance strength for guidance distilled model.")
396
- vec = vec + self.guidance_in(timestep_embedding(guidance, 256, self.time_factor))
397
-
398
- cond = self.cond_in(cond)
399
- pe = None
400
-
401
- for block in self.double_blocks:
402
- latent, cond = block(img=latent, txt=cond, vec=vec, pe=pe)
403
-
404
- latent = torch.cat((cond, latent), 1)
405
- for block in self.single_blocks:
406
- latent = block(latent, vec=vec, pe=pe)
407
-
408
- latent = latent[:, cond.shape[1]:, ...]
409
- latent = self.final_layer(latent, vec)
410
- return latent
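
A quick shape-check sketch of the forward interface defined above. Channel sizes follow the class defaults (in_channels=64, context_in_dim=1536); the batch size, token counts, and reduced block depths are arbitrary choices for a fast CPU run.

# Shape check for the Hunyuan3DDiT interface above; reduced depths keep it fast on CPU.
import torch

model = Hunyuan3DDiT(in_channels=64, context_in_dim=1536, depth=2, depth_single_blocks=2)
x = torch.randn(2, 512, 64)                      # latent tokens: [batch, tokens, in_channels]
t = torch.rand(2)                                # diffusion timesteps
contexts = {'main': torch.randn(2, 257, 1536)}   # conditioning tokens under the 'main' key

with torch.no_grad():
    out = model(x, t, contexts)
print(out.shape)                                 # torch.Size([2, 512, 64])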
 
imagesuper_utils.py DELETED
@@ -1,34 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import torch
16
- from diffusers import StableDiffusionUpscalePipeline
17
-
18
- class Image_Super_Net():
19
- def __init__(self, config):
20
- self.up_pipeline_x4 = StableDiffusionUpscalePipeline.from_pretrained(
21
- 'stabilityai/stable-diffusion-x4-upscaler',
22
- torch_dtype=torch.float16,
23
- ).to(config.device)
24
- self.up_pipeline_x4.set_progress_bar_config(disable=True)
25
-
26
- def __call__(self, image, prompt=''):
27
- with torch.no_grad():
28
- upscaled_image = self.up_pipeline_x4(
29
- prompt=[prompt],
30
- image=image,
31
- num_inference_steps=5,
32
- ).images[0]
33
-
34
- return upscaled_image
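
A usage sketch for Image_Super_Net: the class only reads config.device above, so a SimpleNamespace stands in for the real config object. A CUDA device is assumed because the pipeline is loaded in float16, and the file names are hypothetical.

# Usage sketch; only config.device is read by Image_Super_Net above.
from types import SimpleNamespace
from PIL import Image

config = SimpleNamespace(device='cuda')                  # float16 pipeline, so a GPU is assumed
upscaler = Image_Super_Net(config)

low_res = Image.open('texture_256.png').convert('RGB')   # hypothetical input texture
high_res = upscaler(low_res, prompt='high quality texture')
high_res.save('texture_1024.png')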
 
io_glb.py DELETED
@@ -1,238 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import base64
16
- import io
17
- import os
18
-
19
- import numpy as np
20
- from PIL import Image as PILImage
21
- from pygltflib import GLTF2
22
- from scipy.spatial.transform import Rotation as R
23
-
24
-
25
- # Function to extract buffer data
26
- def get_buffer_data(gltf, buffer_view):
27
- buffer = gltf.buffers[buffer_view.buffer]
28
- buffer_data = gltf.get_data_from_buffer_uri(buffer.uri)
29
- byte_offset = buffer_view.byteOffset if buffer_view.byteOffset else 0
30
- byte_length = buffer_view.byteLength
31
- return buffer_data[byte_offset:byte_offset + byte_length]
32
-
33
-
34
- # Function to extract attribute data
35
- def get_attribute_data(gltf, accessor_index):
36
- accessor = gltf.accessors[accessor_index]
37
- buffer_view = gltf.bufferViews[accessor.bufferView]
38
- buffer_data = get_buffer_data(gltf, buffer_view)
39
-
40
- comptype = {5120: np.int8, 5121: np.uint8, 5122: np.int16, 5123: np.uint16, 5125: np.uint32, 5126: np.float32}
41
- dtype = comptype[accessor.componentType]
42
-
43
- t2n = {'SCALAR': 1, 'VEC2': 2, 'VEC3': 3, 'VEC4': 4, 'MAT2': 4, 'MAT3': 9, 'MAT4': 16}
44
- num_components = t2n[accessor.type]
45
-
46
- # Calculate the correct slice of data
47
- byte_offset = accessor.byteOffset if accessor.byteOffset else 0
48
- byte_stride = buffer_view.byteStride if buffer_view.byteStride else num_components * np.dtype(dtype).itemsize
49
- count = accessor.count
50
-
51
- # Extract the attribute data
52
- attribute_data = np.zeros((count, num_components), dtype=dtype)
53
- for i in range(count):
54
- start = byte_offset + i * byte_stride
55
- end = start + num_components * np.dtype(dtype).itemsize
56
- attribute_data[i] = np.frombuffer(buffer_data[start:end], dtype=dtype)
57
-
58
- return attribute_data
59
-
60
-
61
- # Function to extract image data
62
- def get_image_data(gltf, image, folder):
63
- if image.uri:
64
- if image.uri.startswith('data:'):
65
- # Data URI
66
- header, encoded = image.uri.split(',', 1)
67
- data = base64.b64decode(encoded)
68
- else:
69
- # External file
70
- fn = image.uri
71
- if not os.path.isabs(fn):
72
- fn = folder + '/' + fn
73
- with open(fn, 'rb') as f:
74
- data = f.read()
75
- else:
76
- buffer_view = gltf.bufferViews[image.bufferView]
77
- data = get_buffer_data(gltf, buffer_view)
78
- return data
79
-
80
-
81
- # Function to convert triangle strip to triangles
82
- def convert_triangle_strip_to_triangles(indices):
83
- triangles = []
84
- for i in range(len(indices) - 2):
85
- if i % 2 == 0:
86
- triangles.append([indices[i], indices[i + 1], indices[i + 2]])
87
- else:
88
- triangles.append([indices[i], indices[i + 2], indices[i + 1]])
89
- return np.array(triangles).reshape(-1, 3)
90
-
91
-
92
- # Function to convert triangle fan to triangles
93
- def convert_triangle_fan_to_triangles(indices):
94
- triangles = []
95
- for i in range(1, len(indices) - 1):
96
- triangles.append([indices[0], indices[i], indices[i + 1]])
97
- return np.array(triangles).reshape(-1, 3)
98
-
99
-
100
- # Function to get the transformation matrix from a node
101
- def get_node_transform(node):
102
- if node.matrix:
103
- return np.array(node.matrix).reshape(4, 4).T
104
- else:
105
- T = np.eye(4)
106
- if node.translation:
107
- T[:3, 3] = node.translation
108
- if node.rotation:
109
- R_mat = R.from_quat(node.rotation).as_matrix()
110
- T[:3, :3] = R_mat
111
- if node.scale:
112
- S = np.diag(node.scale + [1])
113
- T = T @ S
114
- return T
115
-
116
-
117
- def get_world_transform(gltf, node_index, parents, world_transforms):
118
- if parents[node_index] == -2:
119
- return world_transforms[node_index]
120
-
121
- node = gltf.nodes[node_index]
122
- if parents[node_index] == -1:
123
- world_transforms[node_index] = get_node_transform(node)
124
- parents[node_index] = -2
125
- return world_transforms[node_index]
126
-
127
- parent_index = parents[node_index]
128
- parent_transform = get_world_transform(gltf, parent_index, parents, world_transforms)
129
- world_transforms[node_index] = parent_transform @ get_node_transform(node)
130
- parents[node_index] = -2
131
- return world_transforms[node_index]
132
-
133
-
134
- def LoadGlb(path):
135
- # Load the GLB file using pygltflib
136
- gltf = GLTF2().load(path)
137
-
138
- primitives = []
139
- images = {}
140
- # Iterate through the meshes in the GLB file
141
-
142
- world_transforms = [np.identity(4) for i in range(len(gltf.nodes))]
143
- parents = [-1 for i in range(len(gltf.nodes))]
144
- for node_index, node in enumerate(gltf.nodes):
145
- for idx in node.children:
146
- parents[idx] = node_index
147
- # for i in range(len(gltf.nodes)):
148
- # get_world_transform(gltf, i, parents, world_transform)
149
-
150
- for node_index, node in enumerate(gltf.nodes):
151
- if node.mesh is not None:
152
- world_transform = get_world_transform(gltf, node_index, parents, world_transforms)
153
- # Iterate through the primitives in the mesh
154
- mesh = gltf.meshes[node.mesh]
155
- for primitive in mesh.primitives:
156
- # Access the attributes of the primitive
157
- attributes = primitive.attributes.__dict__
158
- mode = primitive.mode if primitive.mode is not None else 4 # Default to TRIANGLES
159
- result = {}
160
- if primitive.indices is not None:
161
- indices = get_attribute_data(gltf, primitive.indices)
162
- if mode == 4: # TRIANGLES
163
- face_indices = indices.reshape(-1, 3)
164
- elif mode == 5: # TRIANGLE_STRIP
165
- face_indices = convert_triangle_strip_to_triangles(indices)
166
- elif mode == 6: # TRIANGLE_FAN
167
- face_indices = convert_triangle_fan_to_triangles(indices)
168
- else:
169
- continue
170
- result['F'] = face_indices
171
-
172
- # Extract vertex positions
173
- if 'POSITION' in attributes and attributes['POSITION'] is not None:
174
- positions = get_attribute_data(gltf, attributes['POSITION'])
175
- # Apply the world transformation to the positions
176
- positions_homogeneous = np.hstack([positions, np.ones((positions.shape[0], 1))])
177
- transformed_positions = (world_transform @ positions_homogeneous.T).T[:, :3]
178
- result['V'] = transformed_positions
179
-
180
- # Extract vertex colors
181
- if 'COLOR_0' in attributes and attributes['COLOR_0'] is not None:
182
- colors = get_attribute_data(gltf, attributes['COLOR_0'])
183
- if colors.shape[-1] > 3:
184
- colors = colors[..., :3]
185
- result['VC'] = colors
186
-
187
- # Extract UVs
188
- if 'TEXCOORD_0' in attributes and not attributes['TEXCOORD_0'] is None:
189
- uvs = get_attribute_data(gltf, attributes['TEXCOORD_0'])
190
- result['UV'] = uvs
191
-
192
- if primitive.material is not None:
193
- material = gltf.materials[primitive.material]
194
- if material.pbrMetallicRoughness is not None and material.pbrMetallicRoughness.baseColorTexture is not None:
195
- texture_index = material.pbrMetallicRoughness.baseColorTexture.index
196
- texture = gltf.textures[texture_index]
197
- image_index = texture.source
198
- if not image_index in images:
199
- image = gltf.images[image_index]
200
- image_data = get_image_data(gltf, image, os.path.dirname(path))
201
- pil_image = PILImage.open(io.BytesIO(image_data))
202
- if pil_image.mode != 'RGB':
203
- pil_image = pil_image.convert('RGB')
204
- images[image_index] = pil_image
205
- result['TEX'] = image_index
206
- elif material.emissiveTexture is not None:
207
- texture_index = material.emissiveTexture.index
208
- texture = gltf.textures[texture_index]
209
- image_index = texture.source
210
- if not image_index in images:
211
- image = gltf.images[image_index]
212
- image_data = get_image_data(gltf, image, os.path.dirname(path))
213
- pil_image = PILImage.open(io.BytesIO(image_data))
214
- if pil_image.mode != 'RGB':
215
- pil_image = pil_image.convert('RGB')
216
- images[image_index] = pil_image
217
- result['TEX'] = image_index
218
- else:
219
- if material.pbrMetallicRoughness is not None:
220
- base_color = material.pbrMetallicRoughness.baseColorFactor
221
- else:
222
- base_color = np.array([0.8, 0.8, 0.8], dtype=np.float32)
223
- result['MC'] = base_color
224
-
225
- primitives.append(result)
226
-
227
- return primitives, images
228
-
229
-
230
- def RotatePrimitives(primitives, transform):
231
- for i in range(len(primitives)):
232
- if 'V' in primitives[i]:
233
- primitives[i]['V'] = primitives[i]['V'] @ transform.T
234
-
235
-
236
- if __name__ == '__main__':
237
- path = 'data/test.glb'
238
- LoadGlb(path)
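
A consumption sketch for the LoadGlb output above. The keys follow the dictionaries built in the loader ('V' transformed vertices, 'F' faces, optional 'UV', 'VC', 'TEX' texture image index, 'MC' base color), and the path reuses the example from the __main__ block.

# Sketch of walking the primitives returned by LoadGlb above.
primitives, images = LoadGlb('data/test.glb')

for i, prim in enumerate(primitives):
    n_verts = prim['V'].shape[0] if 'V' in prim else 0
    n_faces = prim['F'].shape[0] if 'F' in prim else 0
    print(f'primitive {i}: {n_verts} vertices, {n_faces} faces')
    if 'TEX' in prim:
        print('  base color texture size:', images[prim['TEX']].size)
    elif 'MC' in prim:
        print('  constant base color:', prim['MC'])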
 
io_obj.py DELETED
@@ -1,66 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import cv2
16
- import numpy as np
17
-
18
-
19
- def LoadObj(fn):
20
- lines = [l.strip() for l in open(fn)]
21
- vertices = []
22
- faces = []
23
- for l in lines:
24
- words = [w for w in l.split(' ') if w != '']
25
- if len(words) == 0:
26
- continue
27
- if words[0] == 'v':
28
- v = [float(words[i]) for i in range(1, 4)]
29
- vertices.append(v)
30
- elif words[0] == 'f':
31
- f = [int(words[i]) - 1 for i in range(1, 4)]
32
- faces.append(f)
33
-
34
- return np.array(vertices).astype('float32'), np.array(faces).astype('int32')
35
-
36
-
37
- def LoadObjWithTexture(fn, tex_fn):
38
- lines = [l.strip() for l in open(fn)]
39
- vertices = []
40
- vertex_textures = []
41
- faces = []
42
- face_textures = []
43
- for l in lines:
44
- words = [w for w in l.split(' ') if w != '']
45
- if len(words) == 0:
46
- continue
47
- if words[0] == 'v':
48
- v = [float(words[i]) for i in range(1, len(words))]
49
- vertices.append(v)
50
- elif words[0] == 'vt':
51
- v = [float(words[i]) for i in range(1, len(words))]
52
- vertex_textures.append(v)
53
- elif words[0] == 'f':
54
- f = []
55
- ft = []
56
- for i in range(1, len(words)):
57
- t = words[i].split('/')
58
- f.append(int(t[0]) - 1)
59
- ft.append(int(t[1]) - 1)
60
- for i in range(2, len(f)):
61
- faces.append([f[0], f[i - 1], f[i]])
62
- face_textures.append([ft[0], ft[i - 1], ft[i]])
63
-
64
- tex_image = cv2.cvtColor(cv2.imread(tex_fn), cv2.COLOR_BGR2RGB)
65
- return np.array(vertices).astype('float32'), np.array(vertex_textures).astype('float32'), np.array(faces).astype(
66
- 'int32'), np.array(face_textures).astype('int32'), tex_image
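
A call sketch for the two loaders above; the return order follows their return statements, and both file paths are hypothetical.

# Call sketch for the OBJ loaders above; file paths are hypothetical.
vertices, faces = LoadObj('model.obj')
print(vertices.shape, faces.shape)        # (num_vertices, 3), (num_faces, 3)

V, VT, F, FT, tex = LoadObjWithTexture('model.obj', 'model_texture.png')
print(VT.shape, FT.shape, tex.shape)      # per-vertex UVs, per-face UV indices, HxWx3 texture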
 
mesh_processor.cpp DELETED
@@ -1,161 +0,0 @@
1
- #include <vector>
2
- #include <queue>
3
- #include <cmath>
4
- #include <algorithm>
5
- #include <pybind11/pybind11.h>
6
- #include <pybind11/numpy.h>
7
- #include <pybind11/stl.h>
8
-
9
- namespace py = pybind11;
10
- using namespace std;
11
-
12
- std::pair<py::array_t<float>,
13
- py::array_t<uint8_t>> meshVerticeInpaint_smooth(py::array_t<float> texture,
14
- py::array_t<uint8_t> mask,
15
- py::array_t<float> vtx_pos, py::array_t<float> vtx_uv,
16
- py::array_t<int> pos_idx, py::array_t<int> uv_idx) {
17
- auto texture_buf = texture.request();
18
- auto mask_buf = mask.request();
19
- auto vtx_pos_buf = vtx_pos.request();
20
- auto vtx_uv_buf = vtx_uv.request();
21
- auto pos_idx_buf = pos_idx.request();
22
- auto uv_idx_buf = uv_idx.request();
23
-
24
- int texture_height = texture_buf.shape[0];
25
- int texture_width = texture_buf.shape[1];
26
- int texture_channel = texture_buf.shape[2];
27
- float* texture_ptr = static_cast<float*>(texture_buf.ptr);
28
- uint8_t* mask_ptr = static_cast<uint8_t*>(mask_buf.ptr);
29
-
30
- int vtx_num = vtx_pos_buf.shape[0];
31
- float* vtx_pos_ptr = static_cast<float*>(vtx_pos_buf.ptr);
32
- float* vtx_uv_ptr = static_cast<float*>(vtx_uv_buf.ptr);
33
- int* pos_idx_ptr = static_cast<int*>(pos_idx_buf.ptr);
34
- int* uv_idx_ptr = static_cast<int*>(uv_idx_buf.ptr);
35
-
36
- vector<float> vtx_mask(vtx_num, 0.0f);
37
- vector<vector<float>> vtx_color(vtx_num, vector<float>(texture_channel, 0.0f));
38
- vector<int> uncolored_vtxs;
39
-
40
- vector<vector<int>> G(vtx_num);
41
-
42
- for (int i = 0; i < uv_idx_buf.shape[0]; ++i) {
43
- for (int k = 0; k < 3; ++k) {
44
- int vtx_uv_idx = uv_idx_ptr[i * 3 + k];
45
- int vtx_idx = pos_idx_ptr[i * 3 + k];
46
- int uv_v = round(vtx_uv_ptr[vtx_uv_idx * 2] * (texture_width - 1));
47
- int uv_u = round((1.0 - vtx_uv_ptr[vtx_uv_idx * 2 + 1]) * (texture_height - 1));
48
-
49
- if (mask_ptr[uv_u * texture_width + uv_v] > 0) {
50
- vtx_mask[vtx_idx] = 1.0f;
51
- for (int c = 0; c < texture_channel; ++c) {
52
- vtx_color[vtx_idx][c] = texture_ptr[(uv_u * texture_width + uv_v) * texture_channel + c];
53
- }
54
- }else{
55
- uncolored_vtxs.push_back(vtx_idx);
56
- }
57
-
58
- G[pos_idx_ptr[i * 3 + k]].push_back(pos_idx_ptr[i * 3 + (k + 1) % 3]);
59
- }
60
- }
61
-
62
- int smooth_count = 2;
63
- int last_uncolored_vtx_count = 0;
64
- while (smooth_count>0) {
65
- int uncolored_vtx_count = 0;
66
-
67
- for (int vtx_idx : uncolored_vtxs) {
68
-
69
- vector<float> sum_color(texture_channel, 0.0f);
70
- float total_weight = 0.0f;
71
-
72
- array<float, 3> vtx_0 = {vtx_pos_ptr[vtx_idx * 3],
73
- vtx_pos_ptr[vtx_idx * 3 + 1], vtx_pos_ptr[vtx_idx * 3 + 2]};
74
- for (int connected_idx : G[vtx_idx]) {
75
- if (vtx_mask[connected_idx] > 0) {
76
- array<float, 3> vtx1 = {vtx_pos_ptr[connected_idx * 3],
77
- vtx_pos_ptr[connected_idx * 3 + 1], vtx_pos_ptr[connected_idx * 3 + 2]};
78
- float dist_weight = 1.0f / max(sqrt(pow(vtx_0[0] - vtx1[0], 2) + pow(vtx_0[1] - vtx1[1], 2) + \
79
- pow(vtx_0[2] - vtx1[2], 2)), 1E-4);
80
- dist_weight = dist_weight * dist_weight;
81
- for (int c = 0; c < texture_channel; ++c) {
82
- sum_color[c] += vtx_color[connected_idx][c] * dist_weight;
83
- }
84
- total_weight += dist_weight;
85
- }
86
- }
87
-
88
- if (total_weight > 0.0f) {
89
- for (int c = 0; c < texture_channel; ++c) {
90
- vtx_color[vtx_idx][c] = sum_color[c] / total_weight;
91
- }
92
- vtx_mask[vtx_idx] = 1.0f;
93
- } else {
94
- uncolored_vtx_count++;
95
- }
96
-
97
- }
98
-
99
- if(last_uncolored_vtx_count==uncolored_vtx_count){
100
- smooth_count--;
101
- }else{
102
- smooth_count++;
103
- }
104
- last_uncolored_vtx_count = uncolored_vtx_count;
105
- }
106
-
107
- // Create new arrays for the output
108
- py::array_t<float> new_texture(texture_buf.size);
109
- py::array_t<uint8_t> new_mask(mask_buf.size);
110
-
111
- auto new_texture_buf = new_texture.request();
112
- auto new_mask_buf = new_mask.request();
113
-
114
- float* new_texture_ptr = static_cast<float*>(new_texture_buf.ptr);
115
- uint8_t* new_mask_ptr = static_cast<uint8_t*>(new_mask_buf.ptr);
116
- // Copy original texture and mask to new arrays
117
- std::copy(texture_ptr, texture_ptr + texture_buf.size, new_texture_ptr);
118
- std::copy(mask_ptr, mask_ptr + mask_buf.size, new_mask_ptr);
119
-
120
- for (int face_idx = 0; face_idx < uv_idx_buf.shape[0]; ++face_idx) {
121
- for (int k = 0; k < 3; ++k) {
122
- int vtx_uv_idx = uv_idx_ptr[face_idx * 3 + k];
123
- int vtx_idx = pos_idx_ptr[face_idx * 3 + k];
124
-
125
- if (vtx_mask[vtx_idx] == 1.0f) {
126
- int uv_v = round(vtx_uv_ptr[vtx_uv_idx * 2] * (texture_width - 1));
127
- int uv_u = round((1.0 - vtx_uv_ptr[vtx_uv_idx * 2 + 1]) * (texture_height - 1));
128
-
129
- for (int c = 0; c < texture_channel; ++c) {
130
- new_texture_ptr[(uv_u * texture_width + uv_v) * texture_channel + c] = vtx_color[vtx_idx][c];
131
- }
132
- new_mask_ptr[uv_u * texture_width + uv_v] = 255;
133
- }
134
- }
135
- }
136
-
137
- // Reshape the new arrays to match the original texture and mask shapes
138
- new_texture.resize({texture_height, texture_width, 3});
139
- new_mask.resize({texture_height, texture_width});
140
- return std::make_pair(new_texture, new_mask);
141
- }
142
-
143
-
144
- std::pair<py::array_t<float>, py::array_t<uint8_t>> meshVerticeInpaint(py::array_t<float> texture,
145
- py::array_t<uint8_t> mask,
146
- py::array_t<float> vtx_pos, py::array_t<float> vtx_uv,
147
- py::array_t<int> pos_idx, py::array_t<int> uv_idx, const std::string& method = "smooth") {
148
- if (method == "smooth") {
149
- return meshVerticeInpaint_smooth(texture, mask, vtx_pos, vtx_uv, pos_idx, uv_idx);
150
- } else {
151
- throw std::invalid_argument("Invalid method. Use 'smooth' or 'forward'.");
152
- }
153
- }
154
-
155
- PYBIND11_MODULE(mesh_processor, m) {
156
- m.def("meshVerticeInpaint", &meshVerticeInpaint, "A function to process mesh",
157
- py::arg("texture"), py::arg("mask"),
158
- py::arg("vtx_pos"), py::arg("vtx_uv"),
159
- py::arg("pos_idx"), py::arg("uv_idx"),
160
- py::arg("method") = "smooth");
161
- }
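
One possible way to compile the pybind11 module defined above into an importable mesh_processor extension. This setup.py sketch is an assumption; the repository may build the extension through its own scripts instead.

# Hypothetical setup.py sketch for compiling mesh_processor.cpp with pybind11.
from pybind11.setup_helpers import Pybind11Extension, build_ext
from setuptools import setup

setup(
    name='mesh_processor',
    ext_modules=[Pybind11Extension('mesh_processor', ['mesh_processor.cpp'], cxx_std=17)],
    cmdclass={'build_ext': build_ext},
)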
 
mesh_processor.py DELETED
@@ -1,84 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import numpy as np
16
-
17
- def meshVerticeInpaint_smooth(texture, mask, vtx_pos, vtx_uv, pos_idx, uv_idx):
18
- texture_height, texture_width, texture_channel = texture.shape
19
- vtx_num = vtx_pos.shape[0]
20
-
21
- vtx_mask = np.zeros(vtx_num, dtype=np.float32)
22
- vtx_color = [np.zeros(texture_channel, dtype=np.float32) for _ in range(vtx_num)]
23
- uncolored_vtxs = []
24
- G = [[] for _ in range(vtx_num)]
25
-
26
- for i in range(uv_idx.shape[0]):
27
- for k in range(3):
28
- vtx_uv_idx = uv_idx[i, k]
29
- vtx_idx = pos_idx[i, k]
30
- uv_v = int(round(vtx_uv[vtx_uv_idx, 0] * (texture_width - 1)))
31
- uv_u = int(round((1.0 - vtx_uv[vtx_uv_idx, 1]) * (texture_height - 1)))
32
- if mask[uv_u, uv_v] > 0:
33
- vtx_mask[vtx_idx] = 1.0
34
- vtx_color[vtx_idx] = texture[uv_u, uv_v]
35
- else:
36
- uncolored_vtxs.append(vtx_idx)
37
- G[pos_idx[i, k]].append(pos_idx[i, (k + 1) % 3])
38
-
39
- smooth_count = 2
40
- last_uncolored_vtx_count = 0
41
- while smooth_count > 0:
42
- uncolored_vtx_count = 0
43
- for vtx_idx in uncolored_vtxs:
44
- sum_color = np.zeros(texture_channel, dtype=np.float32)
45
- total_weight = 0.0
46
- vtx_0 = vtx_pos[vtx_idx]
47
- for connected_idx in G[vtx_idx]:
48
- if vtx_mask[connected_idx] > 0:
49
- vtx1 = vtx_pos[connected_idx]
50
- dist = np.sqrt(np.sum((vtx_0 - vtx1) ** 2))
51
- dist_weight = 1.0 / max(dist, 1e-4)
52
- dist_weight *= dist_weight
53
- sum_color += vtx_color[connected_idx] * dist_weight
54
- total_weight += dist_weight
55
- if total_weight > 0:
56
- vtx_color[vtx_idx] = sum_color / total_weight
57
- vtx_mask[vtx_idx] = 1.0
58
- else:
59
- uncolored_vtx_count += 1
60
-
61
- if last_uncolored_vtx_count == uncolored_vtx_count:
62
- smooth_count -= 1
63
- else:
64
- smooth_count += 1
65
- last_uncolored_vtx_count = uncolored_vtx_count
66
-
67
- new_texture = texture.copy()
68
- new_mask = mask.copy()
69
- for face_idx in range(uv_idx.shape[0]):
70
- for k in range(3):
71
- vtx_uv_idx = uv_idx[face_idx, k]
72
- vtx_idx = pos_idx[face_idx, k]
73
- if vtx_mask[vtx_idx] == 1.0:
74
- uv_v = int(round(vtx_uv[vtx_uv_idx, 0] * (texture_width - 1)))
75
- uv_u = int(round((1.0 - vtx_uv[vtx_uv_idx, 1]) * (texture_height - 1)))
76
- new_texture[uv_u, uv_v] = vtx_color[vtx_idx]
77
- new_mask[uv_u, uv_v] = 255
78
- return new_texture, new_mask
79
-
80
- def meshVerticeInpaint(texture, mask, vtx_pos, vtx_uv, pos_idx, uv_idx, method="smooth"):
81
- if method == "smooth":
82
- return meshVerticeInpaint_smooth(texture, mask, vtx_pos, vtx_uv, pos_idx, uv_idx)
83
- else:
84
- raise ValueError("Invalid method. Use 'smooth' or 'forward'.")
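
A call sketch for the pure-Python fallback above; array shapes follow how the arguments are indexed in meshVerticeInpaint_smooth, and the random geometry is only there to exercise the interface.

# Call sketch; shapes follow the indexing in meshVerticeInpaint_smooth above.
import numpy as np

texture = np.zeros((64, 64, 3), dtype=np.float32)              # float texture in [0, 1]
mask = np.zeros((64, 64), dtype=np.uint8)                      # >0 where texels are already valid
vtx_pos = np.random.rand(100, 3).astype(np.float32)            # vertex positions
vtx_uv = np.random.rand(100, 2).astype(np.float32)             # per-vertex UVs
pos_idx = np.random.randint(0, 100, (50, 3)).astype(np.int32)  # triangle vertex indices
uv_idx = pos_idx.copy()                                        # triangle UV indices

new_texture, new_mask = meshVerticeInpaint(texture, mask, vtx_pos, vtx_uv, pos_idx, uv_idx)
print(new_mask.sum())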
 
mesh_render.py DELETED
@@ -1,823 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import cv2
16
- import numpy as np
17
- import torch
18
- import torch.nn.functional as F
19
- import trimesh
20
- from PIL import Image
21
-
22
- from .camera_utils import (
23
- transform_pos,
24
- get_mv_matrix,
25
- get_orthographic_projection_matrix,
26
- get_perspective_projection_matrix,
27
- )
28
- from .mesh_processor import meshVerticeInpaint
29
- from .mesh_utils import load_mesh, save_mesh
30
-
31
-
32
- def stride_from_shape(shape):
33
- stride = [1]
34
- for x in reversed(shape[1:]):
35
- stride.append(stride[-1] * x)
36
- return list(reversed(stride))
37
-
38
-
39
- def scatter_add_nd_with_count(input, count, indices, values, weights=None):
40
- # input: [..., C], D dimension + C channel
41
- # count: [..., 1], D dimension
42
- # indices: [N, D], long
43
- # values: [N, C]
44
-
45
- D = indices.shape[-1]
46
- C = input.shape[-1]
47
- size = input.shape[:-1]
48
- stride = stride_from_shape(size)
49
-
50
- assert len(size) == D
51
-
52
- input = input.view(-1, C) # [HW, C]
53
- count = count.view(-1, 1)
54
-
55
- flatten_indices = (indices * torch.tensor(stride,
56
- dtype=torch.long, device=indices.device)).sum(-1) # [N]
57
-
58
- if weights is None:
59
- weights = torch.ones_like(values[..., :1])
60
-
61
- input.scatter_add_(0, flatten_indices.unsqueeze(1).repeat(1, C), values)
62
- count.scatter_add_(0, flatten_indices.unsqueeze(1), weights)
63
-
64
- return input.view(*size, C), count.view(*size, 1)
65
-
66
-
67
- def linear_grid_put_2d(H, W, coords, values, return_count=False):
68
- # coords: [N, 2], float in [0, 1]
69
- # values: [N, C]
70
-
71
- C = values.shape[-1]
72
-
73
- indices = coords * torch.tensor(
74
- [H - 1, W - 1], dtype=torch.float32, device=coords.device
75
- )
76
- indices_00 = indices.floor().long() # [N, 2]
77
- indices_00[:, 0].clamp_(0, H - 2)
78
- indices_00[:, 1].clamp_(0, W - 2)
79
- indices_01 = indices_00 + torch.tensor(
80
- [0, 1], dtype=torch.long, device=indices.device
81
- )
82
- indices_10 = indices_00 + torch.tensor(
83
- [1, 0], dtype=torch.long, device=indices.device
84
- )
85
- indices_11 = indices_00 + torch.tensor(
86
- [1, 1], dtype=torch.long, device=indices.device
87
- )
88
-
89
- h = indices[..., 0] - indices_00[..., 0].float()
90
- w = indices[..., 1] - indices_00[..., 1].float()
91
- w_00 = (1 - h) * (1 - w)
92
- w_01 = (1 - h) * w
93
- w_10 = h * (1 - w)
94
- w_11 = h * w
95
-
96
- result = torch.zeros(H, W, C, device=values.device,
97
- dtype=values.dtype) # [H, W, C]
98
- count = torch.zeros(H, W, 1, device=values.device,
99
- dtype=values.dtype) # [H, W, 1]
100
- weights = torch.ones_like(values[..., :1]) # [N, 1]
101
-
102
- result, count = scatter_add_nd_with_count(
103
- result, count, indices_00, values * w_00.unsqueeze(1), weights * w_00.unsqueeze(1))
104
- result, count = scatter_add_nd_with_count(
105
- result, count, indices_01, values * w_01.unsqueeze(1), weights * w_01.unsqueeze(1))
106
- result, count = scatter_add_nd_with_count(
107
- result, count, indices_10, values * w_10.unsqueeze(1), weights * w_10.unsqueeze(1))
108
- result, count = scatter_add_nd_with_count(
109
- result, count, indices_11, values * w_11.unsqueeze(1), weights * w_11.unsqueeze(1))
110
-
111
- if return_count:
112
- return result, count
113
-
114
- mask = (count.squeeze(-1) > 0)
115
- result[mask] = result[mask] / count[mask].repeat(1, C)
116
-
117
- return result
118
-
119
-
120
- class MeshRender():
121
- def __init__(
122
- self,
123
- camera_distance=1.45, camera_type='orth',
124
- default_resolution=1024, texture_size=1024,
125
- use_antialias=True, max_mip_level=None, filter_mode='linear',
126
- bake_mode='linear', raster_mode='cr', device='cuda'):
127
-
128
- self.device = device
129
-
130
- self.set_default_render_resolution(default_resolution)
131
- self.set_default_texture_resolution(texture_size)
132
-
133
- self.camera_distance = camera_distance
134
- self.use_antialias = use_antialias
135
- self.max_mip_level = max_mip_level
136
- self.filter_mode = filter_mode
137
-
138
- self.bake_angle_thres = 75
139
- self.bake_unreliable_kernel_size = int(
140
- (2 / 512) * max(self.default_resolution[0], self.default_resolution[1]))
141
- self.bake_mode = bake_mode
142
-
143
- self.raster_mode = raster_mode
144
- if self.raster_mode == 'cr':
145
- import custom_rasterizer as cr
146
- self.raster = cr
147
- else:
148
- raise ValueError(f'No raster named {self.raster_mode}')
149
-
150
- if camera_type == 'orth':
151
- self.ortho_scale = 1.2
152
- self.camera_proj_mat = get_orthographic_projection_matrix(
153
- left=-self.ortho_scale * 0.5, right=self.ortho_scale * 0.5,
154
- bottom=-self.ortho_scale * 0.5, top=self.ortho_scale * 0.5,
155
- near=0.1, far=100
156
- )
157
- elif camera_type == 'perspective':
158
- self.camera_proj_mat = get_perspective_projection_matrix(
159
- 49.13, self.default_resolution[1] / self.default_resolution[0],
160
- 0.01, 100.0
161
- )
162
- else:
163
- raise ValueError(f'No camera type {camera_type}')
164
-
165
- def raster_rasterize(self, pos, tri, resolution, ranges=None, grad_db=True):
166
-
167
- if self.raster_mode == 'cr':
168
- rast_out_db = None
169
- if pos.dim() == 2:
170
- pos = pos.unsqueeze(0)
171
- findices, barycentric = self.raster.rasterize(pos, tri, resolution)
172
- rast_out = torch.cat((barycentric, findices.unsqueeze(-1)), dim=-1)
173
- rast_out = rast_out.unsqueeze(0)
174
- else:
175
- raise ValueError(f'No raster named {self.raster_mode}')
176
-
177
- return rast_out, rast_out_db
178
-
179
- def raster_interpolate(self, uv, rast_out, uv_idx, rast_db=None, diff_attrs=None):
180
-
181
- if self.raster_mode == 'cr':
182
- textd = None
183
- barycentric = rast_out[0, ..., :-1]
184
- findices = rast_out[0, ..., -1]
185
- if uv.dim() == 2:
186
- uv = uv.unsqueeze(0)
187
- textc = self.raster.interpolate(uv, findices, barycentric, uv_idx)
188
- else:
189
- raise ValueError(f'No raster named {self.raster_mode}')
190
-
191
- return textc, textd
192
-
193
- def raster_texture(self, tex, uv, uv_da=None, mip_level_bias=None, mip=None, filter_mode='auto',
194
- boundary_mode='wrap', max_mip_level=None):
195
-
196
- if self.raster_mode == 'cr':
197
- raise NotImplementedError('Texture sampling is not implemented in cr')
198
- else:
199
- raise ValueError(f'No raster named {self.raster_mode}')
200
-
201
- return color
202
-
203
- def raster_antialias(self, color, rast, pos, tri, topology_hash=None, pos_gradient_boost=1.0):
204
-
205
- if self.raster_mode == 'cr':
206
- # Antialiasing is not supported by the cr rasterizer yet; return the color unchanged
207
- color = color
208
- else:
209
- raise ValueError(f'No raster named {self.raster_mode}')
210
-
211
- return color
212
-
213
- def load_mesh(
214
- self,
215
- mesh,
216
- scale_factor=1.15,
217
- auto_center=True,
218
- ):
219
- vtx_pos, pos_idx, vtx_uv, uv_idx, texture_data = load_mesh(mesh)
220
- self.mesh_copy = mesh
221
- self.set_mesh(vtx_pos, pos_idx,
222
- vtx_uv=vtx_uv, uv_idx=uv_idx,
223
- scale_factor=scale_factor, auto_center=auto_center
224
- )
225
- if texture_data is not None:
226
- self.set_texture(texture_data)
227
-
228
- def save_mesh(self):
229
- texture_data = self.get_texture()
230
- texture_data = Image.fromarray((texture_data * 255).astype(np.uint8))
231
- return save_mesh(self.mesh_copy, texture_data)
232
-
233
- def set_mesh(
234
- self,
235
- vtx_pos, pos_idx,
236
- vtx_uv=None, uv_idx=None,
237
- scale_factor=1.15, auto_center=True
238
- ):
239
-
240
- self.vtx_pos = torch.from_numpy(vtx_pos).to(self.device).float()
241
- self.pos_idx = torch.from_numpy(pos_idx).to(self.device).to(torch.int)
242
- if (vtx_uv is not None) and (uv_idx is not None):
243
- self.vtx_uv = torch.from_numpy(vtx_uv).to(self.device).float()
244
- self.uv_idx = torch.from_numpy(uv_idx).to(self.device).to(torch.int)
245
- else:
246
- self.vtx_uv = None
247
- self.uv_idx = None
248
-
249
- self.vtx_pos[:, [0, 1]] = -self.vtx_pos[:, [0, 1]]
250
- self.vtx_pos[:, [1, 2]] = self.vtx_pos[:, [2, 1]]
251
- if (vtx_uv is not None) and (uv_idx is not None):
252
- self.vtx_uv[:, 1] = 1.0 - self.vtx_uv[:, 1]
253
-
254
- if auto_center:
255
- max_bb = (self.vtx_pos - 0).max(0)[0]
256
- min_bb = (self.vtx_pos - 0).min(0)[0]
257
- center = (max_bb + min_bb) / 2
258
- scale = torch.norm(self.vtx_pos - center, dim=1).max() * 2.0
259
- self.vtx_pos = (self.vtx_pos - center) * \
260
- (scale_factor / float(scale))
261
- self.scale_factor = scale_factor
262
-
263
- def set_texture(self, tex):
264
- if isinstance(tex, np.ndarray):
265
- tex = Image.fromarray((tex * 255).astype(np.uint8))
266
- elif isinstance(tex, torch.Tensor):
267
- tex = tex.cpu().numpy()
268
- tex = Image.fromarray((tex * 255).astype(np.uint8))
269
-
270
- tex = tex.resize(self.texture_size).convert('RGB')
271
- tex = np.array(tex) / 255.0
272
- self.tex = torch.from_numpy(tex).to(self.device)
273
- self.tex = self.tex.float()
274
-
275
- def set_default_render_resolution(self, default_resolution):
276
- if isinstance(default_resolution, int):
277
- default_resolution = (default_resolution, default_resolution)
278
- self.default_resolution = default_resolution
279
-
280
- def set_default_texture_resolution(self, texture_size):
281
- if isinstance(texture_size, int):
282
- texture_size = (texture_size, texture_size)
283
- self.texture_size = texture_size
284
-
285
- def get_mesh(self):
286
- vtx_pos = self.vtx_pos.cpu().numpy()
287
- pos_idx = self.pos_idx.cpu().numpy()
288
- vtx_uv = self.vtx_uv.cpu().numpy()
289
- uv_idx = self.uv_idx.cpu().numpy()
290
-
291
- # Invert the coordinate transform applied in set_mesh
292
- vtx_pos[:, [1, 2]] = vtx_pos[:, [2, 1]]
293
- vtx_pos[:, [0, 1]] = -vtx_pos[:, [0, 1]]
294
-
295
- vtx_uv[:, 1] = 1.0 - vtx_uv[:, 1]
296
- return vtx_pos, pos_idx, vtx_uv, uv_idx
297
-
298
- def get_texture(self):
299
- return self.tex.cpu().numpy()
300
-
301
- def to(self, device):
302
- self.device = device
303
-
304
- for attr_name in dir(self):
305
- attr_value = getattr(self, attr_name)
306
- if isinstance(attr_value, torch.Tensor):
307
- setattr(self, attr_name, attr_value.to(self.device))
308
-
309
- def color_rgb_to_srgb(self, image):
310
- if isinstance(image, Image.Image):
311
- image_rgb = torch.tensor(
312
- np.array(image) /
313
- 255.0).float().to(
314
- self.device)
315
- elif isinstance(image, np.ndarray):
316
- image_rgb = torch.tensor(image).float()
317
- else:
318
- image_rgb = image.to(self.device)
319
-
320
- image_srgb = torch.where(
321
- image_rgb <= 0.0031308,
322
- 12.92 * image_rgb,
323
- 1.055 * torch.pow(image_rgb, 1 / 2.4) - 0.055
324
- )
325
-
326
- if isinstance(image, Image.Image):
327
- image_srgb = Image.fromarray(
328
- (image_srgb.cpu().numpy() *
329
- 255).astype(
330
- np.uint8))
331
- elif isinstance(image, np.ndarray):
332
- image_srgb = image_srgb.cpu().numpy()
333
- else:
334
- image_srgb = image_srgb.to(image.device)
335
-
336
- return image_srgb
337
-
338
- def _render(
339
- self,
340
- mvp,
342
- pos,
343
- pos_idx,
344
- uv,
345
- uv_idx,
346
- tex,
347
- resolution,
348
- max_mip_level,
349
- keep_alpha,
350
- filter_mode
351
- ):
352
- pos_clip = transform_pos(mvp, pos)
353
- if isinstance(resolution, (int, float)):
354
- resolution = [resolution, resolution]
355
- rast_out, rast_out_db = self.raster_rasterize(
356
- pos_clip, pos_idx, resolution=resolution)
357
-
358
- tex = tex.contiguous()
359
- if filter_mode == 'linear-mipmap-linear':
360
- texc, texd = self.raster_interpolate(
361
- uv[None, ...], rast_out, uv_idx, rast_db=rast_out_db, diff_attrs='all')
362
- color = self.raster_texture(
363
- tex[None, ...], texc, texd, filter_mode='linear-mipmap-linear', max_mip_level=max_mip_level)
364
- else:
365
- texc, _ = self.raster_interpolate(uv[None, ...], rast_out, uv_idx)
366
- color = self.raster_texture(tex[None, ...], texc, filter_mode=filter_mode)
367
-
368
- visible_mask = torch.clamp(rast_out[..., -1:], 0, 1)
369
- color = color * visible_mask # Mask out background.
370
- if self.use_antialias:
371
- color = self.raster_antialias(color, rast_out, pos_clip, pos_idx)
372
-
373
- if keep_alpha:
374
- color = torch.cat([color, visible_mask], dim=-1)
375
- return color[0, ...]
376
-
377
- def render(
378
- self,
379
- elev,
380
- azim,
381
- camera_distance=None,
382
- center=None,
383
- resolution=None,
384
- tex=None,
385
- keep_alpha=True,
386
- bgcolor=None,
387
- filter_mode=None,
388
- return_type='th'
389
- ):
390
-
391
- proj = self.camera_proj_mat
392
- r_mv = get_mv_matrix(
393
- elev=elev,
394
- azim=azim,
395
- camera_distance=self.camera_distance if camera_distance is None else camera_distance,
396
- center=center)
397
- r_mvp = np.matmul(proj, r_mv).astype(np.float32)
398
- if tex is not None:
399
- if isinstance(tex, Image.Image):
400
- tex = torch.tensor(np.array(tex) / 255.0)
401
- elif isinstance(tex, np.ndarray):
402
- tex = torch.tensor(tex)
403
- if tex.dim() == 2:
404
- tex = tex.unsqueeze(-1)
405
- tex = tex.float().to(self.device)
406
- image = self._render(r_mvp, self.vtx_pos, self.pos_idx, self.vtx_uv, self.uv_idx,
407
- self.tex if tex is None else tex,
408
- self.default_resolution if resolution is None else resolution,
409
- self.max_mip_level, True, filter_mode if filter_mode else self.filter_mode)
410
- mask = (image[..., [-1]] == 1).float()
411
- if bgcolor is None:
412
- bgcolor = [0 for _ in range(image.shape[-1] - 1)]
413
- image = image * mask + (1 - mask) * \
414
- torch.tensor(bgcolor + [0]).to(self.device)
415
- if keep_alpha == False:
416
- image = image[..., :-1]
417
- if return_type == 'np':
418
- image = image.cpu().numpy()
419
- elif return_type == 'pl':
420
- image = image.squeeze(-1).cpu().numpy() * 255
421
- image = Image.fromarray(image.astype(np.uint8))
422
- return image
423
-
424
- def render_normal(
425
- self,
426
- elev,
427
- azim,
428
- camera_distance=None,
429
- center=None,
430
- resolution=None,
431
- bg_color=[1, 1, 1],
432
- use_abs_coor=False,
433
- normalize_rgb=True,
434
- return_type='th'
435
- ):
436
-
437
- pos_camera, pos_clip = self.get_pos_from_mvp(elev, azim, camera_distance, center)
438
- if resolution is None:
439
- resolution = self.default_resolution
440
- if isinstance(resolution, (int, float)):
441
- resolution = [resolution, resolution]
442
- rast_out, rast_out_db = self.raster_rasterize(
443
- pos_clip, self.pos_idx, resolution=resolution)
444
-
445
- if use_abs_coor:
446
- mesh_triangles = self.vtx_pos[self.pos_idx[:, :3], :]
447
- else:
448
- pos_camera = pos_camera[:, :3] / pos_camera[:, 3:4]
449
- mesh_triangles = pos_camera[self.pos_idx[:, :3], :]
450
- face_normals = F.normalize(
451
- torch.cross(mesh_triangles[:,
452
- 1,
453
- :] - mesh_triangles[:,
454
- 0,
455
- :],
456
- mesh_triangles[:,
457
- 2,
458
- :] - mesh_triangles[:,
459
- 0,
460
- :],
461
- dim=-1),
462
- dim=-1)
463
-
464
- vertex_normals = trimesh.geometry.mean_vertex_normals(vertex_count=self.vtx_pos.shape[0],
465
- faces=self.pos_idx.cpu(),
466
- face_normals=face_normals.cpu(), )
467
- vertex_normals = torch.from_numpy(
468
- vertex_normals).float().to(self.device).contiguous()
469
-
470
- # Interpolate normal values across the rasterized pixels
471
- normal, _ = self.raster_interpolate(
472
- vertex_normals[None, ...], rast_out, self.pos_idx)
473
-
474
- visible_mask = torch.clamp(rast_out[..., -1:], 0, 1)
475
- normal = normal * visible_mask + \
476
- torch.tensor(bg_color, dtype=torch.float32, device=self.device) * (1 -
477
- visible_mask) # Mask out background.
478
-
479
- if normalize_rgb:
480
- normal = (normal + 1) * 0.5
481
- if self.use_antialias:
482
- normal = self.raster_antialias(normal, rast_out, pos_clip, self.pos_idx)
483
-
484
- image = normal[0, ...]
485
- if return_type == 'np':
486
- image = image.cpu().numpy()
487
- elif return_type == 'pl':
488
- image = image.cpu().numpy() * 255
489
- image = Image.fromarray(image.astype(np.uint8))
490
-
491
- return image
492
-
493
- def convert_normal_map(self, image):
494
- # blue is front, red is left, green is top
495
- if isinstance(image, Image.Image):
496
- image = np.array(image)
497
- mask = (image == [255, 255, 255]).all(axis=-1)
498
-
499
- image = (image / 255.0) * 2.0 - 1.0
500
-
501
- image[..., [1]] = -image[..., [1]]
502
- image[..., [1, 2]] = image[..., [2, 1]]
503
- image[..., [0]] = -image[..., [0]]
504
-
505
- image = (image + 1.0) * 0.5
506
-
507
- image = (image * 255).astype(np.uint8)
508
- image[mask] = [127, 127, 255]
509
-
510
- return Image.fromarray(image)
511
-
512
- def get_pos_from_mvp(self, elev, azim, camera_distance, center):
513
- proj = self.camera_proj_mat
514
- r_mv = get_mv_matrix(
515
- elev=elev,
516
- azim=azim,
517
- camera_distance=self.camera_distance if camera_distance is None else camera_distance,
518
- center=center)
519
-
520
- pos_camera = transform_pos(r_mv, self.vtx_pos, keepdim=True)
521
- pos_clip = transform_pos(proj, pos_camera)
522
-
523
- return pos_camera, pos_clip
524
-
525
- def render_depth(
526
- self,
527
- elev,
528
- azim,
529
- camera_distance=None,
530
- center=None,
531
- resolution=None,
532
- return_type='th'
533
- ):
534
- pos_camera, pos_clip = self.get_pos_from_mvp(elev, azim, camera_distance, center)
535
-
536
- if resolution is None:
537
- resolution = self.default_resolution
538
- if isinstance(resolution, (int, float)):
539
- resolution = [resolution, resolution]
540
- rast_out, rast_out_db = self.raster_rasterize(
541
- pos_clip, self.pos_idx, resolution=resolution)
542
-
543
- pos_camera = pos_camera[:, :3] / pos_camera[:, 3:4]
544
- tex_depth = pos_camera[:, 2].reshape(1, -1, 1).contiguous()
545
-
546
- # Interpolate depth values across the rasterized pixels
547
- depth, _ = self.raster_interpolate(tex_depth, rast_out, self.pos_idx)
548
-
549
- visible_mask = torch.clamp(rast_out[..., -1:], 0, 1)
550
- depth_max, depth_min = depth[visible_mask >
551
- 0].max(), depth[visible_mask > 0].min()
552
- depth = (depth - depth_min) / (depth_max - depth_min)
553
-
554
- depth = depth * visible_mask # Mask out background.
555
- if self.use_antialias:
556
- depth = self.raster_antialias(depth, rast_out, pos_clip, self.pos_idx)
557
-
558
- image = depth[0, ...]
559
- if return_type == 'np':
560
- image = image.cpu().numpy()
561
- elif return_type == 'pl':
562
- image = image.squeeze(-1).cpu().numpy() * 255
563
- image = Image.fromarray(image.astype(np.uint8))
564
- return image
565
-
566
- def render_position(self, elev, azim, camera_distance=None, center=None,
567
- resolution=None, bg_color=[1, 1, 1], return_type='th'):
568
- pos_camera, pos_clip = self.get_pos_from_mvp(elev, azim, camera_distance, center)
569
- if resolution is None:
570
- resolution = self.default_resolution
571
- if isinstance(resolution, (int, float)):
572
- resolution = [resolution, resolution]
573
- rast_out, rast_out_db = self.raster_rasterize(
574
- pos_clip, self.pos_idx, resolution=resolution)
575
-
576
- tex_position = 0.5 - self.vtx_pos[:, :3] / self.scale_factor
577
- tex_position = tex_position.contiguous()
578
-
579
- # Interpolate depth values across the rasterized pixels
580
- position, _ = self.raster_interpolate(
581
- tex_position[None, ...], rast_out, self.pos_idx)
582
-
583
- visible_mask = torch.clamp(rast_out[..., -1:], 0, 1)
584
-
585
- position = position * visible_mask + \
586
- torch.tensor(bg_color, dtype=torch.float32, device=self.device) * (1 -
587
- visible_mask) # Mask out background.
588
- if self.use_antialias:
589
- position = self.raster_antialias(position, rast_out, pos_clip, self.pos_idx)
590
-
591
- image = position[0, ...]
592
-
593
- if return_type == 'np':
594
- image = image.cpu().numpy()
595
- elif return_type == 'pl':
596
- image = image.squeeze(-1).cpu().numpy() * 255
597
- image = Image.fromarray(image.astype(np.uint8))
598
- return image
599
-
600
- def render_uvpos(self, return_type='th'):
601
- image = self.uv_feature_map(self.vtx_pos * 0.5 + 0.5)
602
- if return_type == 'np':
603
- image = image.cpu().numpy()
604
- elif return_type == 'pl':
605
- image = image.cpu().numpy() * 255
606
- image = Image.fromarray(image.astype(np.uint8))
607
- return image
608
-
609
- def uv_feature_map(self, vert_feat, bg=None):
610
- vtx_uv = self.vtx_uv * 2 - 1.0
611
- vtx_uv = torch.cat(
612
- [vtx_uv, torch.zeros_like(self.vtx_uv)], dim=1).unsqueeze(0)
613
- vtx_uv[..., -1] = 1
614
- uv_idx = self.uv_idx
615
- rast_out, rast_out_db = self.raster_rasterize(
616
- vtx_uv, uv_idx, resolution=self.texture_size)
617
- feat_map, _ = self.raster_interpolate(vert_feat[None, ...], rast_out, uv_idx)
618
- feat_map = feat_map[0, ...]
619
- if bg is not None:
620
- visible_mask = torch.clamp(rast_out[..., -1:], 0, 1)[0, ...]
621
- feat_map[visible_mask == 0] = bg
622
- return feat_map
623
-
624
- def render_sketch_from_geometry(self, normal_image, depth_image):
625
- normal_image_np = normal_image.cpu().numpy()
626
- depth_image_np = depth_image.cpu().numpy()
627
-
628
- normal_image_np = (normal_image_np * 255).astype(np.uint8)
629
- depth_image_np = (depth_image_np * 255).astype(np.uint8)
630
- normal_image_np = cv2.cvtColor(normal_image_np, cv2.COLOR_RGB2GRAY)
631
-
632
- normal_edges = cv2.Canny(normal_image_np, 80, 150)
633
- depth_edges = cv2.Canny(depth_image_np, 30, 80)
634
-
635
- combined_edges = np.maximum(normal_edges, depth_edges)
636
-
637
- sketch_image = torch.from_numpy(combined_edges).to(
638
- normal_image.device).float() / 255.0
639
- sketch_image = sketch_image.unsqueeze(-1)
640
-
641
- return sketch_image
642
-
643
- def render_sketch_from_depth(self, depth_image):
644
- depth_image_np = depth_image.cpu().numpy()
645
- depth_image_np = (depth_image_np * 255).astype(np.uint8)
646
- depth_edges = cv2.Canny(depth_image_np, 30, 80)
647
- combined_edges = depth_edges
648
- sketch_image = torch.from_numpy(combined_edges).to(
649
- depth_image.device).float() / 255.0
650
- sketch_image = sketch_image.unsqueeze(-1)
651
- return sketch_image
652
-
653
- def back_project(self, image, elev, azim,
654
- camera_distance=None, center=None, method=None):
655
- if isinstance(image, Image.Image):
656
- image = torch.tensor(np.array(image) / 255.0)
657
- elif isinstance(image, np.ndarray):
658
- image = torch.tensor(image)
659
- if image.dim() == 2:
660
- image = image.unsqueeze(-1)
661
- image = image.float().to(self.device)
662
- resolution = image.shape[:2]
663
- channel = image.shape[-1]
664
- texture = torch.zeros(self.texture_size + (channel,)).to(self.device)
665
- cos_map = torch.zeros(self.texture_size + (1,)).to(self.device)
666
-
667
- proj = self.camera_proj_mat
668
- r_mv = get_mv_matrix(
669
- elev=elev,
670
- azim=azim,
671
- camera_distance=self.camera_distance if camera_distance is None else camera_distance,
672
- center=center)
673
- pos_camera = transform_pos(r_mv, self.vtx_pos, keepdim=True)
674
- pos_clip = transform_pos(proj, pos_camera)
675
- pos_camera = pos_camera[:, :3] / pos_camera[:, 3:4]
676
- v0 = pos_camera[self.pos_idx[:, 0], :]
677
- v1 = pos_camera[self.pos_idx[:, 1], :]
678
- v2 = pos_camera[self.pos_idx[:, 2], :]
679
- face_normals = F.normalize(
680
- torch.cross(
681
- v1 - v0,
682
- v2 - v0,
683
- dim=-1),
684
- dim=-1)
685
- vertex_normals = trimesh.geometry.mean_vertex_normals(vertex_count=self.vtx_pos.shape[0],
686
- faces=self.pos_idx.cpu(),
687
- face_normals=face_normals.cpu(), )
688
- vertex_normals = torch.from_numpy(
689
- vertex_normals).float().to(self.device).contiguous()
690
- tex_depth = pos_camera[:, 2].reshape(1, -1, 1).contiguous()
691
- rast_out, rast_out_db = self.raster_rasterize(
692
- pos_clip, self.pos_idx, resolution=resolution)
693
- visible_mask = torch.clamp(rast_out[..., -1:], 0, 1)[0, ...]
694
-
695
- normal, _ = self.raster_interpolate(
696
- vertex_normals[None, ...], rast_out, self.pos_idx)
697
- normal = normal[0, ...]
698
- uv, _ = self.raster_interpolate(self.vtx_uv[None, ...], rast_out, self.uv_idx)
699
- depth, _ = self.raster_interpolate(tex_depth, rast_out, self.pos_idx)
700
- depth = depth[0, ...]
701
-
702
- depth_max, depth_min = depth[visible_mask >
703
- 0].max(), depth[visible_mask > 0].min()
704
- depth_normalized = (depth - depth_min) / (depth_max - depth_min)
705
- depth_image = depth_normalized * visible_mask # Mask out background.
706
-
707
- sketch_image = self.render_sketch_from_depth(depth_image)
708
-
709
- lookat = torch.tensor([[0, 0, -1]], device=self.device)
710
- cos_image = torch.nn.functional.cosine_similarity(
711
- lookat, normal.view(-1, 3))
712
- cos_image = cos_image.view(normal.shape[0], normal.shape[1], 1)
713
-
714
- cos_thres = np.cos(self.bake_angle_thres / 180 * np.pi)
715
- cos_image[cos_image < cos_thres] = 0
716
-
717
- # shrink
718
- kernel_size = self.bake_unreliable_kernel_size * 2 + 1
719
- kernel = torch.ones(
720
- (1, 1, kernel_size, kernel_size), dtype=torch.float32).to(
721
- sketch_image.device)
722
-
723
- visible_mask = visible_mask.permute(2, 0, 1).unsqueeze(0).float()
724
- visible_mask = F.conv2d(
725
- 1.0 - visible_mask,
726
- kernel,
727
- padding=kernel_size // 2)
728
- visible_mask = 1.0 - (visible_mask > 0).float()  # binarize
729
- visible_mask = visible_mask.squeeze(0).permute(1, 2, 0)
730
-
731
- sketch_image = sketch_image.permute(2, 0, 1).unsqueeze(0)
732
- sketch_image = F.conv2d(sketch_image, kernel, padding=kernel_size // 2)
733
- sketch_image = (sketch_image > 0).float()  # binarize
734
- sketch_image = sketch_image.squeeze(0).permute(1, 2, 0)
735
- visible_mask = visible_mask * (sketch_image < 0.5)
736
-
737
- cos_image[visible_mask == 0] = 0
738
-
739
- method = self.bake_mode if method is None else method
740
-
741
- if method == 'linear':
742
- proj_mask = (visible_mask != 0).view(-1)
743
- uv = uv.squeeze(0).contiguous().view(-1, 2)[proj_mask]
744
- image = image.squeeze(0).contiguous().view(-1, channel)[proj_mask]
745
- cos_image = cos_image.contiguous().view(-1, 1)[proj_mask]
746
- sketch_image = sketch_image.contiguous().view(-1, 1)[proj_mask]
747
-
748
- texture = linear_grid_put_2d(
749
- self.texture_size[1], self.texture_size[0], uv[..., [1, 0]], image)
750
- cos_map = linear_grid_put_2d(
751
- self.texture_size[1], self.texture_size[0], uv[..., [1, 0]], cos_image)
752
- boundary_map = linear_grid_put_2d(
753
- self.texture_size[1], self.texture_size[0], uv[..., [1, 0]], sketch_image)
754
- else:
755
- raise ValueError(f'No bake mode {method}')
756
-
757
- return texture, cos_map, boundary_map
758
-
759
- def bake_texture(self, colors, elevs, azims,
760
- camera_distance=None, center=None, exp=6, weights=None):
761
- for i in range(len(colors)):
762
- if isinstance(colors[i], Image.Image):
763
- colors[i] = torch.tensor(
764
- np.array(
765
- colors[i]) / 255.0,
766
- device=self.device).float()
767
- if weights is None:
768
- weights = [1.0 for _ in range(len(colors))]
769
- textures = []
770
- cos_maps = []
771
- for color, elev, azim, weight in zip(colors, elevs, azims, weights):
772
- texture, cos_map, _ = self.back_project(
773
- color, elev, azim, camera_distance, center)
774
- cos_map = weight * (cos_map ** exp)
775
- textures.append(texture)
776
- cos_maps.append(cos_map)
777
-
778
- texture_merge, trust_map_merge = self.fast_bake_texture(
779
- textures, cos_maps)
780
- return texture_merge, trust_map_merge
781
-
782
- @torch.no_grad()
783
- def fast_bake_texture(self, textures, cos_maps):
784
-
785
- channel = textures[0].shape[-1]
786
- texture_merge = torch.zeros(
787
- self.texture_size + (channel,)).to(self.device)
788
- trust_map_merge = torch.zeros(self.texture_size + (1,)).to(self.device)
789
- for texture, cos_map in zip(textures, cos_maps):
790
- view_sum = (cos_map > 0).sum()
791
- painted_sum = ((cos_map > 0) * (trust_map_merge > 0)).sum()
792
- if painted_sum / view_sum > 0.99:
793
- continue
794
- texture_merge += texture * cos_map
795
- trust_map_merge += cos_map
796
- texture_merge = texture_merge / torch.clamp(trust_map_merge, min=1E-8)
797
-
798
- return texture_merge, trust_map_merge > 1E-8
799
-
800
- def uv_inpaint(self, texture, mask):
801
-
802
- if isinstance(texture, torch.Tensor):
803
- texture_np = texture.cpu().numpy()
804
- elif isinstance(texture, np.ndarray):
805
- texture_np = texture
806
- elif isinstance(texture, Image.Image):
807
- texture_np = np.array(texture) / 255.0
808
-
809
- vtx_pos, pos_idx, vtx_uv, uv_idx = self.get_mesh()
810
-
811
- texture_np, mask = meshVerticeInpaint(
812
- texture_np, mask, vtx_pos, vtx_uv, pos_idx, uv_idx)
813
-
814
- texture_np = cv2.inpaint(
815
- (texture_np *
816
- 255).astype(
817
- np.uint8),
818
- 255 -
819
- mask,
820
- 3,
821
- cv2.INPAINT_NS)
822
-
823
- return texture_np
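
Note: the deleted module above centers on the MeshRender class (view rendering, depth/normal/position maps, texture back-projection and baking). The following is only a hedged usage sketch, not code from the repository: the module name mesh_render, the input files, and the view images are assumptions, and it presumes the custom_rasterizer wheel and the helpers referenced above are importable.

    # Hedged sketch: module path, file names, and view images are assumptions.
    import numpy as np
    import trimesh
    from PIL import Image
    from mesh_render import MeshRender  # assumed module name for the deleted file

    renderer = MeshRender(camera_type='orth', default_resolution=1024, texture_size=1024)
    renderer.load_mesh(trimesh.load('input.glb', force='mesh'))  # mesh must already carry UVs

    # Multiview images to be baked into a single texture (hypothetical files).
    views = [Image.open(f'view_{i}.png').convert('RGB') for i in range(4)]
    elevs, azims = [0, 0, 0, 0], [0, 90, 180, 270]
    texture, trust_mask = renderer.bake_texture(views, elevs, azims, exp=6)

    # Inpaint texels no view covered, then write the texture back onto the mesh.
    mask_np = (trust_mask.squeeze(-1).cpu().numpy() * 255).astype(np.uint8)
    texture_np = renderer.uv_inpaint(texture, mask_np)
    renderer.set_texture(texture_np / 255.0)
    renderer.save_mesh().export('textured.glb')
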
mesh_utils.py DELETED
@@ -1,34 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import trimesh
16
-
17
-
18
- def load_mesh(mesh):
19
- vtx_pos = mesh.vertices if hasattr(mesh, 'vertices') else None
20
- pos_idx = mesh.faces if hasattr(mesh, 'faces') else None
21
-
22
- vtx_uv = mesh.visual.uv if hasattr(mesh.visual, 'uv') else None
23
- uv_idx = mesh.faces if hasattr(mesh, 'faces') else None
24
-
25
- texture_data = None
26
-
27
- return vtx_pos, pos_idx, vtx_uv, uv_idx, texture_data
28
-
29
-
30
- def save_mesh(mesh, texture_data):
31
- material = trimesh.visual.texture.SimpleMaterial(image=texture_data, diffuse=(255, 255, 255))
32
- texture_visuals = trimesh.visual.TextureVisuals(uv=mesh.visual.uv, image=texture_data, material=material)
33
- mesh.visual = texture_visuals
34
- return mesh
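
Note: a hedged round-trip sketch for the two helpers above (file names are hypothetical; the mesh must already carry UV coordinates for save_mesh to attach the texture).

    import trimesh
    from PIL import Image
    from mesh_utils import load_mesh, save_mesh

    mesh = trimesh.load('demo_mesh.glb', force='mesh')           # hypothetical input with UVs
    vtx_pos, pos_idx, vtx_uv, uv_idx, _ = load_mesh(mesh)        # texture_data is always None here
    textured = save_mesh(mesh, Image.open('baked_texture.png'))  # hypothetical baked texture
    textured.export('textured.glb')
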
minimal_demo.py DELETED
@@ -1,79 +0,0 @@
1
- # Open Source Model Licensed under the Apache License Version 2.0
2
- # and Other Licenses of the Third-Party Components therein:
3
- # The below Model in this distribution may have been modified by THL A29 Limited
4
- # ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2024 THL A29 Limited.
5
-
6
- # Copyright (C) 2024 THL A29 Limited, a Tencent company. All rights reserved.
7
- # The below software and/or models in this distribution may have been
8
- # modified by THL A29 Limited ("Tencent Modifications").
9
- # All Tencent Modifications are Copyright (C) THL A29 Limited.
10
-
11
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
12
- # except for the third-party components listed below.
13
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
14
- # in the respective licenses of these third-party components.
15
- # Users must comply with all terms and conditions of original licenses of these third-party
16
- # components and must ensure that the usage of the third party components adheres to
17
- # all relevant laws and regulations.
18
-
19
- # For avoidance of doubts, Hunyuan 3D means the large language models and
20
- # their software and algorithms, including trained model weights, parameters (including
21
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
- # fine-tuning enabling code and other elements of the foregoing made publicly available
23
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
-
25
- import torch
26
- from PIL import Image
27
-
28
- from hy3dgen.rembg import BackgroundRemover
29
- from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline, FaceReducer, FloaterRemover, DegenerateFaceRemover
30
- from hy3dgen.text2image import HunyuanDiTPipeline
31
-
32
-
33
- def image_to_3d(image_path='assets/demo.png'):
34
- rembg = BackgroundRemover()
35
- model_path = 'Hunyuan3D-2'
36
-
37
- image = Image.open(image_path)
38
- image = image.resize((1024, 1024))
39
-
40
- if image.mode == 'RGB':
41
- image = rembg(image)
42
-
43
- pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(model_path)
44
-
45
- mesh = pipeline(image=image, num_inference_steps=30, mc_algo='mc',
46
- generator=torch.manual_seed(2025))[0]
47
- mesh = FloaterRemover()(mesh)
48
- mesh = DegenerateFaceRemover()(mesh)
49
- mesh = FaceReducer()(mesh)
50
- mesh.export('mesh.glb')
51
-
52
- try:
53
- from hy3dgen.texgen import Hunyuan3DPaintPipeline
54
- pipeline = Hunyuan3DPaintPipeline.from_pretrained(model_path)
55
- mesh = pipeline(mesh, image=image)
56
- mesh.export('texture.glb')
57
- except Exception as e:
58
- print(e)
59
- print('Please try to install requirements by following README.md')
60
-
61
-
62
- def text_to_3d(prompt='a car'):
63
- rembg = BackgroundRemover()
64
- t2i = HunyuanDiTPipeline('Tencent-Hunyuan--HunyuanDiT-v1.1-Diffusers-Distilled')
65
- model_path = 'Hunyuan3D-2'
66
- i23d = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(model_path)
67
-
68
- image = t2i(prompt)
69
- image = rembg(image)
70
- mesh = i23d(image, num_inference_steps=30, mc_algo='mc')[0]
71
- mesh = FloaterRemover()(mesh)
72
- mesh = DegenerateFaceRemover()(mesh)
73
- mesh = FaceReducer()(mesh)
74
- mesh.export('t2i_demo.glb')
75
-
76
-
77
- if __name__ == '__main__':
78
- image_to_3d()
79
- # text_to_3d()
model.py DELETED
@@ -1,189 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import os
16
-
17
- import torch
18
- import torch.nn as nn
19
- import yaml
20
-
21
- from .attention_blocks import FourierEmbedder, Transformer, CrossAttentionDecoder
22
- from .surface_extractors import MCSurfaceExtractor, SurfaceExtractors
23
- from .volume_decoders import VanillaVolumeDecoder, FlashVDMVolumeDecoding, HierarchicalVolumeDecoding
24
- from ...utils import logger, synchronize_timer, smart_load_model
25
-
26
-
27
- class VectsetVAE(nn.Module):
28
-
29
- @classmethod
30
- @synchronize_timer('VectsetVAE Model Loading')
31
- def from_single_file(
32
- cls,
33
- ckpt_path,
34
- config_path,
35
- device='cuda',
36
- dtype=torch.float16,
37
- use_safetensors=None,
38
- **kwargs,
39
- ):
40
- # load config
41
- with open(config_path, 'r') as f:
42
- config = yaml.safe_load(f)
43
-
44
- # load ckpt
45
- if use_safetensors:
46
- ckpt_path = ckpt_path.replace('.ckpt', '.safetensors')
47
- if not os.path.exists(ckpt_path):
48
- raise FileNotFoundError(f"Model file {ckpt_path} not found")
49
-
50
- logger.info(f"Loading model from {ckpt_path}")
51
- if use_safetensors:
52
- import safetensors.torch
53
- ckpt = safetensors.torch.load_file(ckpt_path, device='cpu')
54
- else:
55
- ckpt = torch.load(ckpt_path, map_location='cpu', weights_only=True)
56
-
57
- model_kwargs = config['params']
58
- model_kwargs.update(kwargs)
59
-
60
- model = cls(**model_kwargs)
61
- model.load_state_dict(ckpt)
62
- model.to(device=device, dtype=dtype)
63
- return model
64
-
65
- @classmethod
66
- def from_pretrained(
67
- cls,
68
- model_path,
69
- device='cuda',
70
- dtype=torch.float16,
71
- use_safetensors=True,
72
- variant='fp16',
73
- subfolder='hunyuan3d-vae-v2-0',
74
- **kwargs,
75
- ):
76
- config_path, ckpt_path = smart_load_model(
77
- model_path,
78
- subfolder=subfolder,
79
- use_safetensors=use_safetensors,
80
- variant=variant
81
- )
82
-
83
- return cls.from_single_file(
84
- ckpt_path,
85
- config_path,
86
- device=device,
87
- dtype=dtype,
88
- use_safetensors=use_safetensors,
89
- **kwargs
90
- )
91
-
92
- def __init__(
93
- self,
94
- volume_decoder=None,
95
- surface_extractor=None
96
- ):
97
- super().__init__()
98
- if volume_decoder is None:
99
- volume_decoder = VanillaVolumeDecoder()
100
- if surface_extractor is None:
101
- surface_extractor = MCSurfaceExtractor()
102
- self.volume_decoder = volume_decoder
103
- self.surface_extractor = surface_extractor
104
-
105
- def latents2mesh(self, latents: torch.FloatTensor, **kwargs):
106
- with synchronize_timer('Volume decoding'):
107
- grid_logits = self.volume_decoder(latents, self.geo_decoder, **kwargs)
108
- with synchronize_timer('Surface extraction'):
109
- outputs = self.surface_extractor(grid_logits, **kwargs)
110
- return outputs
111
-
112
- def enable_flashvdm_decoder(
113
- self,
114
- enabled: bool = True,
115
- adaptive_kv_selection=True,
116
- topk_mode='mean',
117
- mc_algo='dmc',
118
- ):
119
- if enabled:
120
- if adaptive_kv_selection:
121
- self.volume_decoder = FlashVDMVolumeDecoding(topk_mode)
122
- else:
123
- self.volume_decoder = HierarchicalVolumeDecoding()
124
- if mc_algo not in SurfaceExtractors.keys():
125
- raise ValueError(f'Unsupported mc_algo {mc_algo}, available: {list(SurfaceExtractors.keys())}')
126
- self.surface_extractor = SurfaceExtractors[mc_algo]()
127
- else:
128
- self.volume_decoder = VanillaVolumeDecoder()
129
- self.surface_extractor = MCSurfaceExtractor()
130
-
131
-
132
- class ShapeVAE(VectsetVAE):
133
- def __init__(
134
- self,
135
- *,
136
- num_latents: int,
137
- embed_dim: int,
138
- width: int,
139
- heads: int,
140
- num_decoder_layers: int,
141
- geo_decoder_downsample_ratio: int = 1,
142
- geo_decoder_mlp_expand_ratio: int = 4,
143
- geo_decoder_ln_post: bool = True,
144
- num_freqs: int = 8,
145
- include_pi: bool = True,
146
- qkv_bias: bool = True,
147
- qk_norm: bool = False,
148
- label_type: str = "binary",
149
- drop_path_rate: float = 0.0,
150
- scale_factor: float = 1.0,
151
- ):
152
- super().__init__()
153
- self.geo_decoder_ln_post = geo_decoder_ln_post
154
-
155
- self.fourier_embedder = FourierEmbedder(num_freqs=num_freqs, include_pi=include_pi)
156
-
157
- self.post_kl = nn.Linear(embed_dim, width)
158
-
159
- self.transformer = Transformer(
160
- n_ctx=num_latents,
161
- width=width,
162
- layers=num_decoder_layers,
163
- heads=heads,
164
- qkv_bias=qkv_bias,
165
- qk_norm=qk_norm,
166
- drop_path_rate=drop_path_rate
167
- )
168
-
169
- self.geo_decoder = CrossAttentionDecoder(
170
- fourier_embedder=self.fourier_embedder,
171
- out_channels=1,
172
- num_latents=num_latents,
173
- mlp_expand_ratio=geo_decoder_mlp_expand_ratio,
174
- downsample_ratio=geo_decoder_downsample_ratio,
175
- enable_ln_post=self.geo_decoder_ln_post,
176
- width=width // geo_decoder_downsample_ratio,
177
- heads=heads // geo_decoder_downsample_ratio,
178
- qkv_bias=qkv_bias,
179
- qk_norm=qk_norm,
180
- label_type=label_type,
181
- )
182
-
183
- self.scale_factor = scale_factor
184
- self.latent_shape = (num_latents, embed_dim)
185
-
186
- def forward(self, latents):
187
- latents = self.post_kl(latents)
188
- latents = self.transformer(latents)
189
- return latents
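
Note: a hedged orientation sketch for the deleted VAE above; the import path, the checkpoint id, and the absence of extra decoding kwargs are assumptions, and in practice the class is driven by the shape-generation pipeline rather than called directly.

    import torch
    from hy3dgen.shapegen.models import ShapeVAE  # assumed import path for this file

    vae = ShapeVAE.from_pretrained('tencent/Hunyuan3D-2')       # assumed checkpoint id
    vae.enable_flashvdm_decoder(enabled=True, mc_algo='dmc')    # optional adaptive-KV volume decoding

    # A latent set produced by the diffusion model would normally go here.
    latents = torch.randn(1, *vae.latent_shape, dtype=torch.float16, device='cuda')
    decoded = vae(latents)               # post_kl projection + transformer
    outputs = vae.latents2mesh(decoded)  # volume decoding + surface extraction (kwargs forwarded)
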
modelviewer-template.html DELETED
@@ -1,81 +0,0 @@
1
- <!DOCTYPE html>
2
- <html>
3
-
4
- <head>
5
- <!-- Import the component -->
6
- <script src="https://ajax.googleapis.com/ajax/libs/model-viewer/3.1.1/model-viewer.min.js" type="module"></script>
7
-
8
- <script>
9
- document.addEventListener('DOMContentLoaded', () => {
10
- const modelViewers = document.querySelectorAll('model-viewer');
11
- const isSafari = /^((?!chrome|android).)*safari/i.test(navigator.userAgent);
12
-
13
- modelViewers.forEach(modelViewer => {
14
- //modelViewer.setAttribute(
15
- // "environment-image",
16
- // "/static/env_maps/gradient.jpg"
17
- //);
18
- // if (!isSafari) {
19
- // modelViewer.setAttribute(
20
- // "environment-image",
21
- // "/static/env_maps/gradient.jpg"
22
- // );
23
- // } else {
24
- // modelViewer.addEventListener('load', (event) => {
25
- // const [material] = modelViewer.model.materials;
26
- // let color = [43, 44, 46, 255];
27
- // color = color.map(x => x / 255);
28
- // material.pbrMetallicRoughness.setMetallicFactor(0.1); // metallic factor
29
- // material.pbrMetallicRoughness.setRoughnessFactor(0.7); // roughness factor
30
- // material.pbrMetallicRoughness.setBaseColorFactor(color); // CornflowerBlue in RGB
31
- // });
32
- // }
33
- modelViewer.addEventListener('load', (event) => {
34
- const [material] = modelViewer.model.materials;
35
- let color = [43, 44, 46, 255];
36
- color = color.map(x => x / 255);
37
- material.pbrMetallicRoughness.setMetallicFactor(0.1); // metallic factor
38
- material.pbrMetallicRoughness.setRoughnessFactor(0.7); // roughness factor
39
- material.pbrMetallicRoughness.setBaseColorFactor(color); // CornflowerBlue in RGB
40
- });
41
- });
42
- });
43
- </script>
44
-
45
- <style>
46
- body {
47
- margin: 0;
48
- font-family: Arial, sans-serif;
49
- }
50
-
51
- .centered-container {
52
- display: flex;
53
- justify-content: center;
54
- align-items: center;
55
- border-radius: 8px;
56
- border-color: #e5e7eb;
57
- border-style: solid;
58
- border-width: 1px;
59
- }
60
- </style>
61
- </head>
62
-
63
- <body>
64
- <div class="centered-container">
65
- <div class="column is-mobile is-centered">
66
- <model-viewer id="modelviewer" style="height: #height#px; width: #width#px;"
67
- rotation-per-second="10deg"
68
- src="#src#" disable-tap
69
- environment-image="neutral"
70
- camera-target="0m 0m 0m"
71
- camera-orbit="0deg 90deg 8m"
72
- orientation="0deg 0deg 0deg"
73
- shadow-intensity=".9"
74
- ar auto-rotate
75
- camera-controls>
76
- </model-viewer>
77
- </div>
78
- </div>
79
- </body>
80
-
81
- </html>
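
Note: this template (and the textured variant that follows) is not served as-is; the #src#, #height#, and #width# tokens are plain string placeholders. A hedged sketch of how a host application might fill them (file names and values are hypothetical):

    with open('modelviewer-template.html', 'r', encoding='utf-8') as f:
        html = f.read()
    html = (html.replace('#src#', '/static/mesh.glb')
                .replace('#height#', '596')
                .replace('#width#', '700'))
    with open('viewer.html', 'w', encoding='utf-8') as f:
        f.write(html)
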
modelviewer-textured-template.html DELETED
@@ -1,136 +0,0 @@
1
- <!DOCTYPE html>
2
- <html>
3
-
4
- <head>
5
- <!-- Import the component -->
6
- <script src="https://ajax.googleapis.com/ajax/libs/model-viewer/3.1.1/model-viewer.min.js" type="module"></script>
7
-
8
- <style>
9
- body {
10
- margin: 0;
11
- font-family: Arial, sans-serif;
12
- }
13
-
14
- .centered-container {
15
- display: flex;
16
- justify-content: center;
17
- align-items: center;
18
- }
19
-
20
- .modelviewer-panel-button {
21
- height: 30px;
22
- margin: 4px 4px;
23
- padding: 0px 14px;
24
- background: white;
25
- border-radius: 10px;
26
- box-shadow: 0px 0px 4px rgba(0, 0, 0, 0.25);
27
- font-size: 14px;
28
- font-weight: 600;
29
- display: flex;
30
- align-items: center;
31
- justify-content: center;
32
- cursor: pointer;
33
- transition: all 0.2s ease;
34
- }
35
-
36
- .modelviewer-panel-button.checked {
37
- background: #6567C9;
38
- color: white;
39
- }
40
-
41
- .modelviewer-panel-button:hover {
42
- background-color: #e2e6ea;
43
- }
44
-
45
- .modelviewer-panel-button-container {
46
- display: flex;
47
- justify-content: space-around;
48
- }
49
-
50
- .centered-container {
51
- display: flex;
52
- flex-direction: column;
53
- align-items: center;
54
- }
55
-
56
- </style>
57
- </head>
58
-
59
- <body>
60
- <div class="centered-container">
61
- <div class="centered-container">
62
- <div class="column is-mobile is-centered">
63
- <model-viewer id="modelviewer" style="height: #height#px; width: #width#px;"
64
- rotation-per-second="10deg"
65
- src="#src#" disable-tap
66
- environment-image="neutral"
67
- camera-target="0m 0m 0m"
68
- camera-orbit="0deg 90deg 12m"
69
- orientation="0deg 0deg 0deg"
70
- shadow-intensity=".9"
71
- ar auto-rotate
72
- camera-controls>
73
- </model-viewer>
74
- </div>
75
-
76
- <div class="modelviewer-panel-button-container">
77
- <div id="appearance-button" class="modelviewer-panel-button small checked" onclick="showTexture()">
78
- Appearance
79
- </div>
80
- <div id="geometry-button" class="modelviewer-panel-button small" onclick="hideTexture()">Geometry</div>
81
- </div>
82
- </div>
83
- </div>
84
-
85
- <script>
86
- document.addEventListener('DOMContentLoaded', () => {
87
- const modelViewers = document.querySelectorAll('model-viewer');
88
-
89
- modelViewers.forEach(modelViewer => {
90
- modelViewer.addEventListener('load', (event) => {
91
- const [material] = modelViewer.model.materials;
92
- material.pbrMetallicRoughness.setMetallicFactor(0.1);
93
- material.pbrMetallicRoughness.setRoughnessFactor(0.5);
94
- });
95
- });
96
- });
97
-
98
- var window_state = {};
99
-
100
- function hideTexture() {
101
- let appearanceButton = document.getElementById('appearance-button');
102
- let geometryButton = document.getElementById('geometry-button');
103
- appearanceButton.classList.remove('checked');
104
- geometryButton.classList.add('checked');
105
- let modelViewer = document.getElementById('modelviewer');
106
- if (modelViewer.model.materials[0].pbrMetallicRoughness.baseColorTexture.texture === null) return;
107
- window_state.textures = [];
108
- for (let i = 0; i < modelViewer.model.materials.length; i++) {
109
- window_state.textures.push(modelViewer.model.materials[i].pbrMetallicRoughness.baseColorTexture.texture);
110
- }
111
- window_state.exposure = modelViewer.exposure;
112
- modelViewer.environmentImage = '/static/env_maps/gradient.jpg';
113
- for (let i = 0; i < modelViewer.model.materials.length; i++) {
114
- modelViewer.model.materials[i].pbrMetallicRoughness.baseColorTexture.setTexture(null);
115
- }
116
- modelViewer.exposure = 4;
117
- }
118
-
119
- function showTexture() {
120
- let appearanceButton = document.getElementById('appearance-button');
121
- let geometryButton = document.getElementById('geometry-button');
122
- appearanceButton.classList.add('checked');
123
- geometryButton.classList.remove('checked');
124
- let modelViewer = document.getElementById('modelviewer');
125
- if (modelViewer.model.materials[0].pbrMetallicRoughness.baseColorTexture.texture !== null) return;
126
- modelViewer.environmentImage = '/static/env_maps/white.jpg';
127
- for (let i = 0; i < modelViewer.model.materials.length; i++) {
128
- modelViewer.model.materials[i].pbrMetallicRoughness.baseColorTexture.setTexture(window_state.textures[i]);
129
- }
130
- modelViewer.exposure = window_state.exposure;
131
- }
132
-
133
- </script>
134
- </body>
135
-
136
- </html>
modules.py DELETED
@@ -1,429 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import copy
16
- import json
17
- import os
18
- from typing import Any, Dict, Optional
19
-
20
- import torch
21
- import torch.nn as nn
22
- from diffusers.models import UNet2DConditionModel
23
- from diffusers.models.attention_processor import Attention
24
- from diffusers.models.transformers.transformer_2d import BasicTransformerBlock
25
- from einops import rearrange
26
-
27
-
28
- def _chunked_feed_forward(ff: nn.Module, hidden_states: torch.Tensor, chunk_dim: int, chunk_size: int):
29
- # "feed_forward_chunk_size" can be used to save memory
30
- if hidden_states.shape[chunk_dim] % chunk_size != 0:
31
- raise ValueError(
32
- f"`hidden_states` dimension to be chunked: {hidden_states.shape[chunk_dim]} has to be divisible by chunk size: {chunk_size}. Make sure to set an appropriate `chunk_size` when calling `unet.enable_forward_chunking`."
33
- )
34
-
35
- num_chunks = hidden_states.shape[chunk_dim] // chunk_size
36
- ff_output = torch.cat(
37
- [ff(hid_slice) for hid_slice in hidden_states.chunk(num_chunks, dim=chunk_dim)],
38
- dim=chunk_dim,
39
- )
40
- return ff_output
41
-
42
-
43
- class Basic2p5DTransformerBlock(torch.nn.Module):
44
- def __init__(self, transformer: BasicTransformerBlock, layer_name, use_ma=True, use_ra=True) -> None:
45
- super().__init__()
46
- self.transformer = transformer
47
- self.layer_name = layer_name
48
- self.use_ma = use_ma
49
- self.use_ra = use_ra
50
-
51
- # multiview attn
52
- if self.use_ma:
53
- self.attn_multiview = Attention(
54
- query_dim=self.dim,
55
- heads=self.num_attention_heads,
56
- dim_head=self.attention_head_dim,
57
- dropout=self.dropout,
58
- bias=self.attention_bias,
59
- cross_attention_dim=None,
60
- upcast_attention=self.attn1.upcast_attention,
61
- out_bias=True,
62
- )
63
-
64
- # ref attn
65
- if self.use_ra:
66
- self.attn_refview = Attention(
67
- query_dim=self.dim,
68
- heads=self.num_attention_heads,
69
- dim_head=self.attention_head_dim,
70
- dropout=self.dropout,
71
- bias=self.attention_bias,
72
- cross_attention_dim=None,
73
- upcast_attention=self.attn1.upcast_attention,
74
- out_bias=True,
75
- )
76
-
77
- def __getattr__(self, name: str):
78
- try:
79
- return super().__getattr__(name)
80
- except AttributeError:
81
- return getattr(self.transformer, name)
82
-
83
- def forward(
84
- self,
85
- hidden_states: torch.Tensor,
86
- attention_mask: Optional[torch.Tensor] = None,
87
- encoder_hidden_states: Optional[torch.Tensor] = None,
88
- encoder_attention_mask: Optional[torch.Tensor] = None,
89
- timestep: Optional[torch.LongTensor] = None,
90
- cross_attention_kwargs: Dict[str, Any] = None,
91
- class_labels: Optional[torch.LongTensor] = None,
92
- added_cond_kwargs: Optional[Dict[str, torch.Tensor]] = None,
93
- ) -> torch.Tensor:
94
-
95
- # Notice that normalization is always applied before the real computation in the following blocks.
96
- # 0. Self-Attention
97
- batch_size = hidden_states.shape[0]
98
-
99
- cross_attention_kwargs = cross_attention_kwargs.copy() if cross_attention_kwargs is not None else {}
100
- num_in_batch = cross_attention_kwargs.pop('num_in_batch', 1)
101
- mode = cross_attention_kwargs.pop('mode', None)
102
- mva_scale = cross_attention_kwargs.pop('mva_scale', 1.0)
103
- ref_scale = cross_attention_kwargs.pop('ref_scale', 1.0)
104
- condition_embed_dict = cross_attention_kwargs.pop("condition_embed_dict", None)
105
-
106
- if self.norm_type == "ada_norm":
107
- norm_hidden_states = self.norm1(hidden_states, timestep)
108
- elif self.norm_type == "ada_norm_zero":
109
- norm_hidden_states, gate_msa, shift_mlp, scale_mlp, gate_mlp = self.norm1(
110
- hidden_states, timestep, class_labels, hidden_dtype=hidden_states.dtype
111
- )
112
- elif self.norm_type in ["layer_norm", "layer_norm_i2vgen"]:
113
- norm_hidden_states = self.norm1(hidden_states)
114
- elif self.norm_type == "ada_norm_continuous":
115
- norm_hidden_states = self.norm1(hidden_states, added_cond_kwargs["pooled_text_emb"])
116
- elif self.norm_type == "ada_norm_single":
117
- shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (
118
- self.scale_shift_table[None] + timestep.reshape(batch_size, 6, -1)
119
- ).chunk(6, dim=1)
120
- norm_hidden_states = self.norm1(hidden_states)
121
- norm_hidden_states = norm_hidden_states * (1 + scale_msa) + shift_msa
122
- else:
123
- raise ValueError("Incorrect norm used")
124
-
125
- if self.pos_embed is not None:
126
- norm_hidden_states = self.pos_embed(norm_hidden_states)
127
-
128
- # 1. Prepare GLIGEN inputs
129
- cross_attention_kwargs = cross_attention_kwargs.copy() if cross_attention_kwargs is not None else {}
130
- gligen_kwargs = cross_attention_kwargs.pop("gligen", None)
131
-
132
- attn_output = self.attn1(
133
- norm_hidden_states,
134
- encoder_hidden_states=encoder_hidden_states if self.only_cross_attention else None,
135
- attention_mask=attention_mask,
136
- **cross_attention_kwargs,
137
- )
138
-
139
- if self.norm_type == "ada_norm_zero":
140
- attn_output = gate_msa.unsqueeze(1) * attn_output
141
- elif self.norm_type == "ada_norm_single":
142
- attn_output = gate_msa * attn_output
143
-
144
- hidden_states = attn_output + hidden_states
145
- if hidden_states.ndim == 4:
146
- hidden_states = hidden_states.squeeze(1)
147
-
148
- # 1.2 Reference Attention
149
- if 'w' in mode:
150
- condition_embed_dict[self.layer_name] = rearrange(norm_hidden_states, '(b n) l c -> b (n l) c',
151
- n=num_in_batch) # B, (N L), C
152
-
153
- if 'r' in mode and self.use_ra:
154
- condition_embed = condition_embed_dict[self.layer_name].unsqueeze(1).repeat(1, num_in_batch, 1,
155
- 1) # B N L C
156
- condition_embed = rearrange(condition_embed, 'b n l c -> (b n) l c')
157
-
158
- attn_output = self.attn_refview(
159
- norm_hidden_states,
160
- encoder_hidden_states=condition_embed,
161
- attention_mask=None,
162
- **cross_attention_kwargs
163
- )
164
- ref_scale_timing = ref_scale
165
- if isinstance(ref_scale, torch.Tensor):
166
- ref_scale_timing = ref_scale.unsqueeze(1).repeat(1, num_in_batch).view(-1)
167
- for _ in range(attn_output.ndim - 1):
168
- ref_scale_timing = ref_scale_timing.unsqueeze(-1)
169
- hidden_states = ref_scale_timing * attn_output + hidden_states
170
- if hidden_states.ndim == 4:
171
- hidden_states = hidden_states.squeeze(1)
172
-
173
- # 1.3 Multiview Attention
174
- if num_in_batch > 1 and self.use_ma:
175
- multivew_hidden_states = rearrange(norm_hidden_states, '(b n) l c -> b (n l) c', n=num_in_batch)
176
-
177
- attn_output = self.attn_multiview(
178
- multivew_hidden_states,
179
- encoder_hidden_states=multivew_hidden_states,
180
- **cross_attention_kwargs
181
- )
182
-
183
- attn_output = rearrange(attn_output, 'b (n l) c -> (b n) l c', n=num_in_batch)
184
-
185
- hidden_states = mva_scale * attn_output + hidden_states
186
- if hidden_states.ndim == 4:
187
- hidden_states = hidden_states.squeeze(1)
188
-
189
- # 1.2 GLIGEN Control
190
- if gligen_kwargs is not None:
191
- hidden_states = self.fuser(hidden_states, gligen_kwargs["objs"])
192
-
193
- # 3. Cross-Attention
194
- if self.attn2 is not None:
195
- if self.norm_type == "ada_norm":
196
- norm_hidden_states = self.norm2(hidden_states, timestep)
197
- elif self.norm_type in ["ada_norm_zero", "layer_norm", "layer_norm_i2vgen"]:
198
- norm_hidden_states = self.norm2(hidden_states)
199
- elif self.norm_type == "ada_norm_single":
200
- # For PixArt norm2 isn't applied here:
201
- # https://github.com/PixArt-alpha/PixArt-alpha/blob/0f55e922376d8b797edd44d25d0e7464b260dcab/diffusion/model/nets/PixArtMS.py#L70C1-L76C103
202
- norm_hidden_states = hidden_states
203
- elif self.norm_type == "ada_norm_continuous":
204
- norm_hidden_states = self.norm2(hidden_states, added_cond_kwargs["pooled_text_emb"])
205
- else:
206
- raise ValueError("Incorrect norm")
207
-
208
- if self.pos_embed is not None and self.norm_type != "ada_norm_single":
209
- norm_hidden_states = self.pos_embed(norm_hidden_states)
210
-
211
- attn_output = self.attn2(
212
- norm_hidden_states,
213
- encoder_hidden_states=encoder_hidden_states,
214
- attention_mask=encoder_attention_mask,
215
- **cross_attention_kwargs,
216
- )
217
-
218
- hidden_states = attn_output + hidden_states
219
-
220
- # 4. Feed-forward
221
- # i2vgen doesn't have this norm 🤷‍♂️
222
- if self.norm_type == "ada_norm_continuous":
223
- norm_hidden_states = self.norm3(hidden_states, added_cond_kwargs["pooled_text_emb"])
224
- elif not self.norm_type == "ada_norm_single":
225
- norm_hidden_states = self.norm3(hidden_states)
226
-
227
- if self.norm_type == "ada_norm_zero":
228
- norm_hidden_states = norm_hidden_states * (1 + scale_mlp[:, None]) + shift_mlp[:, None]
229
-
230
- if self.norm_type == "ada_norm_single":
231
- norm_hidden_states = self.norm2(hidden_states)
232
- norm_hidden_states = norm_hidden_states * (1 + scale_mlp) + shift_mlp
233
-
234
- if self._chunk_size is not None:
235
- # "feed_forward_chunk_size" can be used to save memory
236
- ff_output = _chunked_feed_forward(self.ff, norm_hidden_states, self._chunk_dim, self._chunk_size)
237
- else:
238
- ff_output = self.ff(norm_hidden_states)
239
-
240
- if self.norm_type == "ada_norm_zero":
241
- ff_output = gate_mlp.unsqueeze(1) * ff_output
242
- elif self.norm_type == "ada_norm_single":
243
- ff_output = gate_mlp * ff_output
244
-
245
- hidden_states = ff_output + hidden_states
246
- if hidden_states.ndim == 4:
247
- hidden_states = hidden_states.squeeze(1)
248
-
249
- return hidden_states
250
-
251
-
252
- class UNet2p5DConditionModel(torch.nn.Module):
253
- def __init__(self, unet: UNet2DConditionModel) -> None:
254
- super().__init__()
255
- self.unet = unet
256
-
257
- self.use_ma = True
258
- self.use_ra = True
259
- self.use_camera_embedding = True
260
- self.use_dual_stream = True
261
-
262
- if self.use_dual_stream:
263
- self.unet_dual = copy.deepcopy(unet)
264
- self.init_attention(self.unet_dual)
265
- self.init_attention(self.unet, use_ma=self.use_ma, use_ra=self.use_ra)
266
- self.init_condition()
267
- self.init_camera_embedding()
268
-
269
- @staticmethod
270
- def from_pretrained(pretrained_model_name_or_path, **kwargs):
271
- torch_dtype = kwargs.pop('torch_dtype', torch.float32)
272
- config_path = os.path.join(pretrained_model_name_or_path, 'config.json')
273
- unet_ckpt_path = os.path.join(pretrained_model_name_or_path, 'diffusion_pytorch_model.bin')
274
- with open(config_path, 'r', encoding='utf-8') as file:
275
- config = json.load(file)
276
- unet = UNet2DConditionModel(**config)
277
- unet = UNet2p5DConditionModel(unet)
278
- unet_ckpt = torch.load(unet_ckpt_path, map_location='cpu', weights_only=True)
279
- unet.load_state_dict(unet_ckpt, strict=True)
280
- unet = unet.to(torch_dtype)
281
- return unet
282
-
283
- def init_condition(self):
284
- self.unet.conv_in = torch.nn.Conv2d(
285
- 12,
286
- self.unet.conv_in.out_channels,
287
- kernel_size=self.unet.conv_in.kernel_size,
288
- stride=self.unet.conv_in.stride,
289
- padding=self.unet.conv_in.padding,
290
- dilation=self.unet.conv_in.dilation,
291
- groups=self.unet.conv_in.groups,
292
- bias=self.unet.conv_in.bias is not None)
293
-
294
- self.unet.learned_text_clip_gen = nn.Parameter(torch.randn(1, 77, 1024))
295
- self.unet.learned_text_clip_ref = nn.Parameter(torch.randn(1, 77, 1024))
296
-
297
- def init_camera_embedding(self):
298
-
299
- if self.use_camera_embedding:
300
- time_embed_dim = 1280
301
- self.max_num_ref_image = 5
302
- self.max_num_gen_image = 12 * 3 + 4 * 2
303
- self.unet.class_embedding = nn.Embedding(self.max_num_ref_image + self.max_num_gen_image, time_embed_dim)
304
-
305
- def init_attention(self, unet, use_ma=False, use_ra=False):
306
-
307
- for down_block_i, down_block in enumerate(unet.down_blocks):
308
- if hasattr(down_block, "has_cross_attention") and down_block.has_cross_attention:
309
- for attn_i, attn in enumerate(down_block.attentions):
310
- for transformer_i, transformer in enumerate(attn.transformer_blocks):
311
- if isinstance(transformer, BasicTransformerBlock):
312
- attn.transformer_blocks[transformer_i] = Basic2p5DTransformerBlock(transformer,
313
- f'down_{down_block_i}_{attn_i}_{transformer_i}',
314
- use_ma, use_ra)
315
-
316
- if hasattr(unet.mid_block, "has_cross_attention") and unet.mid_block.has_cross_attention:
317
- for attn_i, attn in enumerate(unet.mid_block.attentions):
318
- for transformer_i, transformer in enumerate(attn.transformer_blocks):
319
- if isinstance(transformer, BasicTransformerBlock):
320
- attn.transformer_blocks[transformer_i] = Basic2p5DTransformerBlock(transformer,
321
- f'mid_{attn_i}_{transformer_i}',
322
- use_ma, use_ra)
323
-
324
- for up_block_i, up_block in enumerate(unet.up_blocks):
325
- if hasattr(up_block, "has_cross_attention") and up_block.has_cross_attention:
326
- for attn_i, attn in enumerate(up_block.attentions):
327
- for transformer_i, transformer in enumerate(attn.transformer_blocks):
328
- if isinstance(transformer, BasicTransformerBlock):
329
- attn.transformer_blocks[transformer_i] = Basic2p5DTransformerBlock(transformer,
330
- f'up_{up_block_i}_{attn_i}_{transformer_i}',
331
- use_ma, use_ra)
332
-
333
- def __getattr__(self, name: str):
334
- try:
335
- return super().__getattr__(name)
336
- except AttributeError:
337
- return getattr(self.unet, name)
338
-
339
- def forward(
340
- self, sample, timestep, encoder_hidden_states,
341
- *args, down_intrablock_additional_residuals=None,
342
- down_block_res_samples=None, mid_block_res_sample=None,
343
- **cached_condition,
344
- ):
345
- B, N_gen, _, H, W = sample.shape
346
- assert H == W
347
-
348
- if self.use_camera_embedding:
349
- camera_info_gen = cached_condition['camera_info_gen'] + self.max_num_ref_image
350
- camera_info_gen = rearrange(camera_info_gen, 'b n -> (b n)')
351
- else:
352
- camera_info_gen = None
353
-
354
- sample = [sample]
355
- if 'normal_imgs' in cached_condition:
356
- sample.append(cached_condition["normal_imgs"])
357
- if 'position_imgs' in cached_condition:
358
- sample.append(cached_condition["position_imgs"])
359
- sample = torch.cat(sample, dim=2)
360
-
361
- sample = rearrange(sample, 'b n c h w -> (b n) c h w')
362
-
363
- encoder_hidden_states_gen = encoder_hidden_states.unsqueeze(1).repeat(1, N_gen, 1, 1)
364
- encoder_hidden_states_gen = rearrange(encoder_hidden_states_gen, 'b n l c -> (b n) l c')
365
-
366
- if self.use_ra:
367
- if 'condition_embed_dict' in cached_condition:
368
- condition_embed_dict = cached_condition['condition_embed_dict']
369
- else:
370
- condition_embed_dict = {}
371
- ref_latents = cached_condition['ref_latents']
372
- N_ref = ref_latents.shape[1]
373
- if self.use_camera_embedding:
374
- camera_info_ref = cached_condition['camera_info_ref']
375
- camera_info_ref = rearrange(camera_info_ref, 'b n -> (b n)')
376
- else:
377
- camera_info_ref = None
378
-
379
- ref_latents = rearrange(ref_latents, 'b n c h w -> (b n) c h w')
380
-
381
- encoder_hidden_states_ref = self.unet.learned_text_clip_ref.unsqueeze(1).repeat(B, N_ref, 1, 1)
382
- encoder_hidden_states_ref = rearrange(encoder_hidden_states_ref, 'b n l c -> (b n) l c')
383
-
384
- noisy_ref_latents = ref_latents
385
- timestep_ref = 0
386
-
387
- if self.use_dual_stream:
388
- unet_ref = self.unet_dual
389
- else:
390
- unet_ref = self.unet
391
- unet_ref(
392
- noisy_ref_latents, timestep_ref,
393
- encoder_hidden_states=encoder_hidden_states_ref,
394
- class_labels=camera_info_ref,
395
- # **kwargs
396
- return_dict=False,
397
- cross_attention_kwargs={
398
- 'mode': 'w', 'num_in_batch': N_ref,
399
- 'condition_embed_dict': condition_embed_dict},
400
- )
401
- cached_condition['condition_embed_dict'] = condition_embed_dict
402
- else:
403
- condition_embed_dict = None
404
-
405
- mva_scale = cached_condition.get('mva_scale', 1.0)
406
- ref_scale = cached_condition.get('ref_scale', 1.0)
407
-
408
- return self.unet(
409
- sample, timestep,
410
- encoder_hidden_states_gen, *args,
411
- class_labels=camera_info_gen,
412
- down_intrablock_additional_residuals=[
413
- sample.to(dtype=self.unet.dtype) for sample in down_intrablock_additional_residuals
414
- ] if down_intrablock_additional_residuals is not None else None,
415
- down_block_additional_residuals=[
416
- sample.to(dtype=self.unet.dtype) for sample in down_block_res_samples
417
- ] if down_block_res_samples is not None else None,
418
- mid_block_additional_residual=(
419
- mid_block_res_sample.to(dtype=self.unet.dtype)
420
- if mid_block_res_sample is not None else None
421
- ),
422
- return_dict=False,
423
- cross_attention_kwargs={
424
- 'mode': 'r', 'num_in_batch': N_gen,
425
- 'condition_embed_dict': condition_embed_dict,
426
- 'mva_scale': mva_scale,
427
- 'ref_scale': ref_scale,
428
- },
429
- )
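
The deleted UNet2p5DConditionModel above is a thin wrapper around a stock diffusers UNet2DConditionModel: it rebuilds conv_in with 12 input channels so noisy latents can be concatenated with normal- and position-map latents, swaps every BasicTransformerBlock for a Basic2p5DTransformerBlock, and forwards any attribute it does not own to the wrapped UNet via __getattr__. The stand-alone sketch below reproduces only that wrap-and-delegate pattern with hypothetical names (InnerNet, Wrapper); it is an illustration, not code from this repository.

    import torch
    import torch.nn as nn

    class InnerNet(nn.Module):
        # hypothetical stand-in for UNet2DConditionModel
        def __init__(self):
            super().__init__()
            self.conv_in = nn.Conv2d(4, 8, kernel_size=3, padding=1)

        def forward(self, x):
            return self.conv_in(x)

    class Wrapper(nn.Module):
        # same wrap-and-delegate pattern as UNet2p5DConditionModel
        def __init__(self, net: nn.Module):
            super().__init__()
            self.net = net
            # widen conv_in in place, mirroring init_condition() above (12 input channels)
            self.net.conv_in = nn.Conv2d(
                12, self.net.conv_in.out_channels,
                kernel_size=self.net.conv_in.kernel_size,
                padding=self.net.conv_in.padding,
            )

        def forward(self, x):
            return self.net(x)

        def __getattr__(self, name):
            try:
                return super().__getattr__(name)   # parameters, buffers, submodules
            except AttributeError:
                return getattr(self.net, name)     # anything else resolves on the inner net

    wrapped = Wrapper(InnerNet())
    out = wrapped(torch.randn(1, 12, 16, 16))
    print(out.shape, wrapped.conv_in.in_channels)  # torch.Size([1, 8, 16, 16]) 12
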
multiview_utils.py DELETED
@@ -1,76 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import os
16
- import random
17
-
18
- import numpy as np
19
- import torch
20
- from diffusers import DiffusionPipeline
21
- from diffusers import EulerAncestralDiscreteScheduler
22
-
23
-
24
- class Multiview_Diffusion_Net():
25
- def __init__(self, config) -> None:
26
- self.device = config.device
27
- self.view_size = 512
28
- multiview_ckpt_path = config.multiview_ckpt_path
29
-
30
- current_file_path = os.path.abspath(__file__)
31
- custom_pipeline_path = os.path.join(os.path.dirname(current_file_path), '..', 'hunyuanpaint')
32
-
33
- pipeline = DiffusionPipeline.from_pretrained(
34
- multiview_ckpt_path,
35
- custom_pipeline=custom_pipeline_path, torch_dtype=torch.float16)
36
-
37
- pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config,
38
- timestep_spacing='trailing')
39
-
40
- pipeline.set_progress_bar_config(disable=True)
41
- self.pipeline = pipeline.to(self.device)
42
-
43
- def seed_everything(self, seed):
44
- random.seed(seed)
45
- np.random.seed(seed)
46
- torch.manual_seed(seed)
47
- os.environ["PL_GLOBAL_SEED"] = str(seed)
48
-
49
- def __call__(self, input_image, control_images, camera_info):
50
-
51
- self.seed_everything(0)
52
-
53
- input_image = input_image.resize((self.view_size, self.view_size))
54
- for i in range(len(control_images)):
55
- control_images[i] = control_images[i].resize((self.view_size, self.view_size))
56
- if control_images[i].mode == 'L':
57
- control_images[i] = control_images[i].point(lambda x: 255 if x > 1 else 0, mode='1')
58
-
59
- kwargs = dict(generator=torch.Generator(device=self.pipeline.device).manual_seed(0))
60
-
61
- num_view = len(control_images) // 2
62
- normal_image = [[control_images[i] for i in range(num_view)]]
63
- position_image = [[control_images[i + num_view] for i in range(num_view)]]
64
-
65
- camera_info_gen = [camera_info]
66
- camera_info_ref = [[0]]
67
- kwargs['width'] = self.view_size
68
- kwargs['height'] = self.view_size
69
- kwargs['num_in_batch'] = num_view
70
- kwargs['camera_info_gen'] = camera_info_gen
71
- kwargs['camera_info_ref'] = camera_info_ref
72
- kwargs["normal_imgs"] = normal_image
73
- kwargs["position_imgs"] = position_image
74
-
75
- mvd_image = self.pipeline(input_image, num_inference_steps=30, **kwargs).images
76
- return mvd_image
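
Multiview_Diffusion_Net above is essentially a loader: it points diffusers' DiffusionPipeline.from_pretrained at the hunyuanpaint custom-pipeline directory, replaces the scheduler with EulerAncestralDiscreteScheduler configured for 'trailing' timestep spacing, and seeds the RNGs before every call. A hedged, minimal sketch of the same loading pattern follows; the checkpoint path and custom-pipeline directory are placeholders, not paths taken from this diff.

    import torch
    from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

    ckpt_path = "path/to/multiview-checkpoint"    # placeholder checkpoint directory
    custom_pipeline_dir = "path/to/hunyuanpaint"  # placeholder: folder containing the custom pipeline.py

    pipe = DiffusionPipeline.from_pretrained(
        ckpt_path,
        custom_pipeline=custom_pipeline_dir,      # diffusers imports the pipeline class from this folder
        torch_dtype=torch.float16,
    )
    # 'trailing' spacing mirrors the scheduler swap in Multiview_Diffusion_Net.__init__
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(
        pipe.scheduler.config, timestep_spacing="trailing"
    )
    pipe.set_progress_bar_config(disable=True)
    pipe = pipe.to("cuda")

The custom_pipeline argument is what allows a pipeline class shipped as a plain folder, such as the deleted pipeline.py below, to be discovered without installing it as a package.
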
pipeline.py DELETED
@@ -1,546 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the repsective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- from typing import Any, Callable, Dict, List, Optional, Union
16
-
17
- import numpy
18
- import numpy as np
19
- import torch
20
- import torch.distributed
21
- import torch.utils.checkpoint
22
- from PIL import Image
23
- from diffusers import (
24
- AutoencoderKL,
25
- DiffusionPipeline,
26
- ImagePipelineOutput
27
- )
28
- from diffusers.callbacks import MultiPipelineCallbacks, PipelineCallback
29
- from diffusers.image_processor import PipelineImageInput
30
- from diffusers.image_processor import VaeImageProcessor
31
- from diffusers.pipelines.stable_diffusion.pipeline_output import StableDiffusionPipelineOutput
32
- from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion import StableDiffusionPipeline, retrieve_timesteps, \
33
- rescale_noise_cfg
34
- from diffusers.schedulers import KarrasDiffusionSchedulers
35
- from diffusers.utils import deprecate
36
- from einops import rearrange
37
- from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer
38
-
39
- from .unet.modules import UNet2p5DConditionModel
40
-
41
-
42
- def to_rgb_image(maybe_rgba: Image.Image):
43
- if maybe_rgba.mode == 'RGB':
44
- return maybe_rgba
45
- elif maybe_rgba.mode == 'RGBA':
46
- rgba = maybe_rgba
47
- img = numpy.random.randint(127, 128, size=[rgba.size[1], rgba.size[0], 3], dtype=numpy.uint8)
48
- img = Image.fromarray(img, 'RGB')
49
- img.paste(rgba, mask=rgba.getchannel('A'))
50
- return img
51
- else:
52
- raise ValueError("Unsupported image type.", maybe_rgba.mode)
53
-
54
-
55
- class HunyuanPaintPipeline(StableDiffusionPipeline):
56
-
57
- def __init__(
58
- self,
59
- vae: AutoencoderKL,
60
- text_encoder: CLIPTextModel,
61
- tokenizer: CLIPTokenizer,
62
- unet: UNet2p5DConditionModel,
63
- scheduler: KarrasDiffusionSchedulers,
64
- feature_extractor: CLIPImageProcessor,
65
- safety_checker=None,
66
- use_torch_compile=False,
67
- ):
68
- DiffusionPipeline.__init__(self)
69
-
70
- safety_checker = None
71
- self.register_modules(
72
- vae=torch.compile(vae) if use_torch_compile else vae,
73
- text_encoder=text_encoder,
74
- tokenizer=tokenizer,
75
- unet=unet,
76
- scheduler=scheduler,
77
- safety_checker=safety_checker,
78
- feature_extractor=torch.compile(feature_extractor) if use_torch_compile else feature_extractor,
79
- )
80
- self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)
81
- self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
82
-
83
- @torch.no_grad()
84
- def encode_images(self, images):
85
- B = images.shape[0]
86
- images = rearrange(images, 'b n c h w -> (b n) c h w')
87
-
88
- dtype = next(self.vae.parameters()).dtype
89
- images = (images - 0.5) * 2.0
90
- posterior = self.vae.encode(images.to(dtype)).latent_dist
91
- latents = posterior.sample() * self.vae.config.scaling_factor
92
-
93
- latents = rearrange(latents, '(b n) c h w -> b n c h w', b=B)
94
- return latents
95
-
96
- @torch.no_grad()
97
- def __call__(
98
- self,
99
- image: Image.Image = None,
100
- prompt=None,
101
- negative_prompt='watermark, ugly, deformed, noisy, blurry, low contrast',
102
- *args,
103
- num_images_per_prompt: Optional[int] = 1,
104
- guidance_scale=2.0,
105
- output_type: Optional[str] = "pil",
106
- width=512,
107
- height=512,
108
- num_inference_steps=28,
109
- return_dict=True,
110
- **cached_condition,
111
- ):
112
- device = self._execution_device
113
-
114
- if image is None:
115
- raise ValueError("Inputting embeddings not supported for this pipeline. Please pass an image.")
116
- assert not isinstance(image, torch.Tensor)
117
-
118
- image = to_rgb_image(image)
119
-
120
- image_vae = torch.tensor(np.array(image) / 255.0)
121
- image_vae = image_vae.unsqueeze(0).permute(0, 3, 1, 2).unsqueeze(0)
122
- image_vae = image_vae.to(device=device, dtype=self.vae.dtype)
123
-
124
- batch_size = image_vae.shape[0]
125
- assert batch_size == 1
126
- assert num_images_per_prompt == 1
127
-
128
- ref_latents = self.encode_images(image_vae)
129
-
130
- def convert_pil_list_to_tensor(images):
131
- bg_c = [1., 1., 1.]
132
- images_tensor = []
133
- for batch_imgs in images:
134
- view_imgs = []
135
- for pil_img in batch_imgs:
136
- img = numpy.asarray(pil_img, dtype=numpy.float32) / 255.
137
- if img.shape[2] > 3:
138
- alpha = img[:, :, 3:]
139
- img = img[:, :, :3] * alpha + bg_c * (1 - alpha)
140
- img = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).contiguous().half().to("cuda")
141
- view_imgs.append(img)
142
- view_imgs = torch.cat(view_imgs, dim=0)
143
- images_tensor.append(view_imgs.unsqueeze(0))
144
-
145
- images_tensor = torch.cat(images_tensor, dim=0)
146
- return images_tensor
147
-
148
- if "normal_imgs" in cached_condition:
149
-
150
- if isinstance(cached_condition["normal_imgs"], List):
151
- cached_condition["normal_imgs"] = convert_pil_list_to_tensor(cached_condition["normal_imgs"])
152
-
153
- cached_condition['normal_imgs'] = self.encode_images(cached_condition["normal_imgs"])
154
-
155
- if "position_imgs" in cached_condition:
156
-
157
- if isinstance(cached_condition["position_imgs"], List):
158
- cached_condition["position_imgs"] = convert_pil_list_to_tensor(cached_condition["position_imgs"])
159
-
160
- cached_condition["position_imgs"] = self.encode_images(cached_condition["position_imgs"])
161
-
162
- if 'camera_info_gen' in cached_condition:
163
- camera_info = cached_condition['camera_info_gen'] # B,N
164
- if isinstance(camera_info, List):
165
- camera_info = torch.tensor(camera_info)
166
- camera_info = camera_info.to(device).to(torch.int64)
167
- cached_condition['camera_info_gen'] = camera_info
168
- if 'camera_info_ref' in cached_condition:
169
- camera_info = cached_condition['camera_info_ref'] # B,N
170
- if isinstance(camera_info, List):
171
- camera_info = torch.tensor(camera_info)
172
- camera_info = camera_info.to(device).to(torch.int64)
173
- cached_condition['camera_info_ref'] = camera_info
174
-
175
- cached_condition['ref_latents'] = ref_latents
176
-
177
- if guidance_scale > 1:
178
- negative_ref_latents = torch.zeros_like(cached_condition['ref_latents'])
179
- cached_condition['ref_latents'] = torch.cat([negative_ref_latents, cached_condition['ref_latents']])
180
- cached_condition['ref_scale'] = torch.as_tensor([0.0, 1.0]).to(cached_condition['ref_latents'])
181
- if "normal_imgs" in cached_condition:
182
- cached_condition['normal_imgs'] = torch.cat(
183
- (cached_condition['normal_imgs'], cached_condition['normal_imgs']))
184
-
185
- if "position_imgs" in cached_condition:
186
- cached_condition['position_imgs'] = torch.cat(
187
- (cached_condition['position_imgs'], cached_condition['position_imgs']))
188
-
189
- if 'position_maps' in cached_condition:
190
- cached_condition['position_maps'] = torch.cat(
191
- (cached_condition['position_maps'], cached_condition['position_maps']))
192
-
193
- if 'camera_info_gen' in cached_condition:
194
- cached_condition['camera_info_gen'] = torch.cat(
195
- (cached_condition['camera_info_gen'], cached_condition['camera_info_gen']))
196
- if 'camera_info_ref' in cached_condition:
197
- cached_condition['camera_info_ref'] = torch.cat(
198
- (cached_condition['camera_info_ref'], cached_condition['camera_info_ref']))
199
-
200
- prompt_embeds = self.unet.learned_text_clip_gen.repeat(num_images_per_prompt, 1, 1)
201
- negative_prompt_embeds = torch.zeros_like(prompt_embeds)
202
-
203
- latents: torch.Tensor = self.denoise(
204
- None,
205
- *args,
206
- cross_attention_kwargs=None,
207
- guidance_scale=guidance_scale,
208
- num_images_per_prompt=num_images_per_prompt,
209
- prompt_embeds=prompt_embeds,
210
- negative_prompt_embeds=negative_prompt_embeds,
211
- num_inference_steps=num_inference_steps,
212
- output_type='latent',
213
- width=width,
214
- height=height,
215
- **cached_condition
216
- ).images
217
-
218
- if not output_type == "latent":
219
- image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]
220
- else:
221
- image = latents
222
-
223
- image = self.image_processor.postprocess(image, output_type=output_type)
224
- if not return_dict:
225
- return (image,)
226
-
227
- return ImagePipelineOutput(images=image)
228
-
229
- def denoise(
230
- self,
231
- prompt: Union[str, List[str]] = None,
232
- height: Optional[int] = None,
233
- width: Optional[int] = None,
234
- num_inference_steps: int = 50,
235
- timesteps: List[int] = None,
236
- sigmas: List[float] = None,
237
- guidance_scale: float = 7.5,
238
- negative_prompt: Optional[Union[str, List[str]]] = None,
239
- num_images_per_prompt: Optional[int] = 1,
240
- eta: float = 0.0,
241
- generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
242
- latents: Optional[torch.Tensor] = None,
243
- prompt_embeds: Optional[torch.Tensor] = None,
244
- negative_prompt_embeds: Optional[torch.Tensor] = None,
245
- ip_adapter_image: Optional[PipelineImageInput] = None,
246
- ip_adapter_image_embeds: Optional[List[torch.Tensor]] = None,
247
- output_type: Optional[str] = "pil",
248
- return_dict: bool = True,
249
- cross_attention_kwargs: Optional[Dict[str, Any]] = None,
250
- guidance_rescale: float = 0.0,
251
- clip_skip: Optional[int] = None,
252
- callback_on_step_end: Optional[
253
- Union[Callable[[int, int, Dict], None], PipelineCallback, MultiPipelineCallbacks]
254
- ] = None,
255
- callback_on_step_end_tensor_inputs: List[str] = ["latents"],
256
- **kwargs,
257
- ):
258
- r"""
259
- The call function to the pipeline for generation.
260
-
261
- Args:
262
- prompt (`str` or `List[str]`, *optional*):
263
- The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
264
- height (`int`, *optional*, defaults to `self.unet.config.sample_size * self.vae_scale_factor`):
265
- The height in pixels of the generated image.
266
- width (`int`, *optional*, defaults to `self.unet.config.sample_size * self.vae_scale_factor`):
267
- The width in pixels of the generated image.
268
- num_inference_steps (`int`, *optional*, defaults to 50):
269
- The number of denoising steps. More denoising steps usually lead to a higher quality image at the
270
- expense of slower inference.
271
- timesteps (`List[int]`, *optional*):
272
- Custom timesteps to use for the denoising process with schedulers which support a `timesteps` argument
273
- in their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is
274
- passed will be used. Must be in descending order.
275
- sigmas (`List[float]`, *optional*):
276
- Custom sigmas to use for the denoising process with schedulers which support a `sigmas` argument in
277
- their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
278
- will be used.
279
- guidance_scale (`float`, *optional*, defaults to 7.5):
280
- A higher guidance scale value encourages the model to generate images closely linked to the text
281
- `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
282
- negative_prompt (`str` or `List[str]`, *optional*):
283
- The prompt or prompts to guide what to not include in image generation. If not defined, you need to
284
- pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).
285
- num_images_per_prompt (`int`, *optional*, defaults to 1):
286
- The number of images to generate per prompt.
287
- eta (`float`, *optional*, defaults to 0.0):
288
- Corresponds to parameter eta (η) from the [DDIM](https://arxiv.org/abs/2010.02502) paper. Only applies
289
- to the [`~schedulers.DDIMScheduler`], and is ignored in other schedulers.
290
- generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
291
- A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
292
- generation deterministic.
293
- latents (`torch.Tensor`, *optional*):
294
- Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image
295
- generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
296
- tensor is generated by sampling using the supplied random `generator`.
297
- prompt_embeds (`torch.Tensor`, *optional*):
298
- Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
299
- provided, text embeddings are generated from the `prompt` input argument.
300
- negative_prompt_embeds (`torch.Tensor`, *optional*):
301
- Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
302
- not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
303
- ip_adapter_image: (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
304
- ip_adapter_image_embeds (`List[torch.Tensor]`, *optional*):
305
- Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of
306
- IP-adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It should
307
- contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
308
- provided, embeddings are computed from the `ip_adapter_image` input argument.
309
- output_type (`str`, *optional*, defaults to `"pil"`):
310
- The output format of the generated image. Choose between `PIL.Image` or `np.array`.
311
- return_dict (`bool`, *optional*, defaults to `True`):
312
- Whether or not to return a [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] instead of a
313
- plain tuple.
314
- cross_attention_kwargs (`dict`, *optional*):
315
- A kwargs dictionary that if specified is passed along to the [`AttentionProcessor`] as defined in
316
- [`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
317
- guidance_rescale (`float`, *optional*, defaults to 0.0):
318
- Guidance rescale factor from [Common Diffusion Noise Schedules and Sample Steps are
319
- Flawed](https://arxiv.org/pdf/2305.08891.pdf). Guidance rescale factor should fix overexposure when
320
- using zero terminal SNR.
321
- clip_skip (`int`, *optional*):
322
- Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that
323
- the output of the pre-final layer will be used for computing the prompt embeddings.
324
- callback_on_step_end (`Callable`, `PipelineCallback`, `MultiPipelineCallbacks`, *optional*):
325
- A function or a subclass of `PipelineCallback` or `MultiPipelineCallbacks` that is called at the end of
326
- each denoising step during inference, with the following arguments: `callback_on_step_end(self:
327
- DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict)`. `callback_kwargs` will include a
328
- list of all tensors as specified by `callback_on_step_end_tensor_inputs`.
329
- callback_on_step_end_tensor_inputs (`List`, *optional*):
330
- The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list
331
- will be passed as `callback_kwargs` argument. You will only be able to include variables listed in the
332
- `._callback_tensor_inputs` attribute of your pipeline class.
333
-
334
- Examples:
335
-
336
- Returns:
337
- [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] or `tuple`:
338
- If `return_dict` is `True`, [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] is returned,
339
- otherwise a `tuple` is returned where the first element is a list with the generated images and the
340
- second element is a list of `bool`s indicating whether the corresponding generated image contains
341
- "not-safe-for-work" (nsfw) content.
342
- """
343
-
344
- callback = kwargs.pop("callback", None)
345
- callback_steps = kwargs.pop("callback_steps", None)
346
-
347
- if callback is not None:
348
- deprecate(
349
- "callback",
350
- "1.0.0",
351
- "Passing `callback` as an input argument to `__call__` is deprecated, consider using `callback_on_step_end`",
352
- )
353
- if callback_steps is not None:
354
- deprecate(
355
- "callback_steps",
356
- "1.0.0",
357
- "Passing `callback_steps` as an input argument to `__call__` is deprecated, consider using `callback_on_step_end`",
358
- )
359
-
360
- if isinstance(callback_on_step_end, (PipelineCallback, MultiPipelineCallbacks)):
361
- callback_on_step_end_tensor_inputs = callback_on_step_end.tensor_inputs
362
-
363
- # 0. Default height and width to unet
364
- height = height or self.unet.config.sample_size * self.vae_scale_factor
365
- width = width or self.unet.config.sample_size * self.vae_scale_factor
366
- # to deal with lora scaling and other possible forward hooks
367
-
368
- # 1. Check inputs. Raise error if not correct
369
- self.check_inputs(
370
- prompt,
371
- height,
372
- width,
373
- callback_steps,
374
- negative_prompt,
375
- prompt_embeds,
376
- negative_prompt_embeds,
377
- ip_adapter_image,
378
- ip_adapter_image_embeds,
379
- callback_on_step_end_tensor_inputs,
380
- )
381
-
382
- self._guidance_scale = guidance_scale
383
- self._guidance_rescale = guidance_rescale
384
- self._clip_skip = clip_skip
385
- self._cross_attention_kwargs = cross_attention_kwargs
386
- self._interrupt = False
387
-
388
- # 2. Define call parameters
389
- if prompt is not None and isinstance(prompt, str):
390
- batch_size = 1
391
- elif prompt is not None and isinstance(prompt, list):
392
- batch_size = len(prompt)
393
- else:
394
- batch_size = prompt_embeds.shape[0]
395
-
396
- device = self._execution_device
397
-
398
- # 3. Encode input prompt
399
- lora_scale = (
400
- self.cross_attention_kwargs.get("scale", None) if self.cross_attention_kwargs is not None else None
401
- )
402
-
403
- prompt_embeds, negative_prompt_embeds = self.encode_prompt(
404
- prompt,
405
- device,
406
- num_images_per_prompt,
407
- self.do_classifier_free_guidance,
408
- negative_prompt,
409
- prompt_embeds=prompt_embeds,
410
- negative_prompt_embeds=negative_prompt_embeds,
411
- lora_scale=lora_scale,
412
- clip_skip=self.clip_skip,
413
- )
414
-
415
- # For classifier free guidance, we need to do two forward passes.
416
- # Here we concatenate the unconditional and text embeddings into a single batch
417
- # to avoid doing two forward passes
418
- if self.do_classifier_free_guidance:
419
- prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
420
-
421
- if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
422
- image_embeds = self.prepare_ip_adapter_image_embeds(
423
- ip_adapter_image,
424
- ip_adapter_image_embeds,
425
- device,
426
- batch_size * num_images_per_prompt,
427
- self.do_classifier_free_guidance,
428
- )
429
-
430
- # 4. Prepare timesteps
431
- timesteps, num_inference_steps = retrieve_timesteps(
432
- self.scheduler, num_inference_steps, device, timesteps, sigmas
433
- )
434
- assert num_images_per_prompt == 1
435
- # 5. Prepare latent variables
436
- num_channels_latents = self.unet.config.in_channels
437
- latents = self.prepare_latents(
438
- batch_size * kwargs['num_in_batch'], # num_images_per_prompt,
439
- num_channels_latents,
440
- height,
441
- width,
442
- prompt_embeds.dtype,
443
- device,
444
- generator,
445
- latents,
446
- )
447
-
448
- # 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
449
- extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
450
-
451
- # 6.1 Add image embeds for IP-Adapter
452
- added_cond_kwargs = (
453
- {"image_embeds": image_embeds}
454
- if (ip_adapter_image is not None or ip_adapter_image_embeds is not None)
455
- else None
456
- )
457
-
458
- # 6.2 Optionally get Guidance Scale Embedding
459
- timestep_cond = None
460
- if self.unet.config.time_cond_proj_dim is not None:
461
- guidance_scale_tensor = torch.tensor(self.guidance_scale - 1).repeat(batch_size * num_images_per_prompt)
462
- timestep_cond = self.get_guidance_scale_embedding(
463
- guidance_scale_tensor, embedding_dim=self.unet.config.time_cond_proj_dim
464
- ).to(device=device, dtype=latents.dtype)
465
-
466
- # 7. Denoising loop
467
- num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
468
- self._num_timesteps = len(timesteps)
469
- with self.progress_bar(total=num_inference_steps) as progress_bar:
470
- for i, t in enumerate(timesteps):
471
- if self.interrupt:
472
- continue
473
-
474
- # expand the latents if we are doing classifier free guidance
475
- latents = rearrange(latents, '(b n) c h w -> b n c h w', n=kwargs['num_in_batch'])
476
- latent_model_input = torch.cat([latents] * 2) if self.do_classifier_free_guidance else latents
477
- latent_model_input = rearrange(latent_model_input, 'b n c h w -> (b n) c h w')
478
- latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
479
- latent_model_input = rearrange(latent_model_input, '(b n) c h w ->b n c h w', n=kwargs['num_in_batch'])
480
-
481
- # predict the noise residual
482
-
483
- noise_pred = self.unet(
484
- latent_model_input,
485
- t,
486
- encoder_hidden_states=prompt_embeds,
487
- timestep_cond=timestep_cond,
488
- cross_attention_kwargs=self.cross_attention_kwargs,
489
- added_cond_kwargs=added_cond_kwargs,
490
- return_dict=False, **kwargs
491
- )[0]
492
- latents = rearrange(latents, 'b n c h w -> (b n) c h w')
493
- # perform guidance
494
- if self.do_classifier_free_guidance:
495
- noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
496
- noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)
497
-
498
- if self.do_classifier_free_guidance and self.guidance_rescale > 0.0:
499
- # Based on 3.4. in https://arxiv.org/pdf/2305.08891.pdf
500
- noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance_rescale=self.guidance_rescale)
501
-
502
- # compute the previous noisy sample x_t -> x_t-1
503
- latents = \
504
- self.scheduler.step(noise_pred, t, latents[:, :num_channels_latents, :, :], **extra_step_kwargs,
505
- return_dict=False)[0]
506
-
507
- if callback_on_step_end is not None:
508
- callback_kwargs = {}
509
- for k in callback_on_step_end_tensor_inputs:
510
- callback_kwargs[k] = locals()[k]
511
- callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)
512
-
513
- latents = callback_outputs.pop("latents", latents)
514
- prompt_embeds = callback_outputs.pop("prompt_embeds", prompt_embeds)
515
- negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", negative_prompt_embeds)
516
-
517
- # call the callback, if provided
518
- if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
519
- progress_bar.update()
520
- if callback is not None and i % callback_steps == 0:
521
- step_idx = i // getattr(self.scheduler, "order", 1)
522
- callback(step_idx, t, latents)
523
-
524
- if not output_type == "latent":
525
- image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False, generator=generator)[
526
- 0
527
- ]
528
- image, has_nsfw_concept = self.run_safety_checker(image, device, prompt_embeds.dtype)
529
- else:
530
- image = latents
531
- has_nsfw_concept = None
532
-
533
- if has_nsfw_concept is None:
534
- do_denormalize = [True] * image.shape[0]
535
- else:
536
- do_denormalize = [not has_nsfw for has_nsfw in has_nsfw_concept]
537
-
538
- image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)
539
-
540
- # Offload all models
541
- self.maybe_free_model_hooks()
542
-
543
- if not return_dict:
544
- return (image, has_nsfw_concept)
545
-
546
- return StableDiffusionPipelineOutput(images=image, nsfw_content_detected=has_nsfw_concept)
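
The denoise loop of HunyuanPaintPipeline above follows the standard classifier-free-guidance recipe: the latent batch is doubled (unconditional and conditional halves), a single UNet pass predicts noise for both, and the two estimates are recombined as uncond + guidance_scale * (cond - uncond) before the scheduler step, with an optional rescale per arXiv:2305.08891. A self-contained numeric sketch of that combine step, using illustrative shapes and the pipeline's default guidance_scale of 2.0:

    import torch

    guidance_scale = 2.0                                 # default used by HunyuanPaintPipeline.__call__
    num_views = 6                                        # illustrative multiview batch size
    noise_pred = torch.randn(2 * num_views, 4, 64, 64)   # doubled batch from the CFG forward pass

    noise_uncond, noise_cond = noise_pred.chunk(2)       # split the unconditional / conditional halves
    guided = noise_uncond + guidance_scale * (noise_cond - noise_uncond)

    print(guided.shape)                                  # torch.Size([6, 4, 64, 64])
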
pipelines (1).py DELETED
@@ -1,765 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the repsective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the large language models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
- import copy
16
- import importlib
17
- import inspect
18
- import os
19
- from typing import List, Optional, Union
20
-
21
- import numpy as np
22
- import torch
23
- import trimesh
24
- import yaml
25
- from PIL import Image
26
- from diffusers.utils.torch_utils import randn_tensor
27
- from diffusers.utils.import_utils import is_accelerate_version, is_accelerate_available
28
- from tqdm import tqdm
29
-
30
- from .models.autoencoders import ShapeVAE
31
- from .models.autoencoders import SurfaceExtractors
32
- from .utils import logger, synchronize_timer, smart_load_model
33
-
34
-
35
- def retrieve_timesteps(
36
- scheduler,
37
- num_inference_steps: Optional[int] = None,
38
- device: Optional[Union[str, torch.device]] = None,
39
- timesteps: Optional[List[int]] = None,
40
- sigmas: Optional[List[float]] = None,
41
- **kwargs,
42
- ):
43
- """
44
- Calls the scheduler's `set_timesteps` method and retrieves timesteps from the scheduler after the call. Handles
45
- custom timesteps. Any kwargs will be supplied to `scheduler.set_timesteps`.
46
-
47
- Args:
48
- scheduler (`SchedulerMixin`):
49
- The scheduler to get timesteps from.
50
- num_inference_steps (`int`):
51
- The number of diffusion steps used when generating samples with a pre-trained model. If used, `timesteps`
52
- must be `None`.
53
- device (`str` or `torch.device`, *optional*):
54
- The device to which the timesteps should be moved to. If `None`, the timesteps are not moved.
55
- timesteps (`List[int]`, *optional*):
56
- Custom timesteps used to override the timestep spacing strategy of the scheduler. If `timesteps` is passed,
57
- `num_inference_steps` and `sigmas` must be `None`.
58
- sigmas (`List[float]`, *optional*):
59
- Custom sigmas used to override the timestep spacing strategy of the scheduler. If `sigmas` is passed,
60
- `num_inference_steps` and `timesteps` must be `None`.
61
-
62
- Returns:
63
- `Tuple[torch.Tensor, int]`: A tuple where the first element is the timestep schedule from the scheduler and the
64
- second element is the number of inference steps.
65
- """
66
- if timesteps is not None and sigmas is not None:
67
- raise ValueError("Only one of `timesteps` or `sigmas` can be passed. Please choose one to set custom values")
68
- if timesteps is not None:
69
- accepts_timesteps = "timesteps" in set(inspect.signature(scheduler.set_timesteps).parameters.keys())
70
- if not accepts_timesteps:
71
- raise ValueError(
72
- f"The current scheduler class {scheduler.__class__}'s `set_timesteps` does not support custom"
73
- f" timestep schedules. Please check whether you are using the correct scheduler."
74
- )
75
- scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
76
- timesteps = scheduler.timesteps
77
- num_inference_steps = len(timesteps)
78
- elif sigmas is not None:
79
- accept_sigmas = "sigmas" in set(inspect.signature(scheduler.set_timesteps).parameters.keys())
80
- if not accept_sigmas:
81
- raise ValueError(
82
- f"The current scheduler class {scheduler.__class__}'s `set_timesteps` does not support custom"
83
- f" sigmas schedules. Please check whether you are using the correct scheduler."
84
- )
85
- scheduler.set_timesteps(sigmas=sigmas, device=device, **kwargs)
86
- timesteps = scheduler.timesteps
87
- num_inference_steps = len(timesteps)
88
- else:
89
- scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
90
- timesteps = scheduler.timesteps
91
- return timesteps, num_inference_steps
92
-
93
-
94
- @synchronize_timer('Export to trimesh')
95
- def export_to_trimesh(mesh_output):
96
- if isinstance(mesh_output, list):
97
- outputs = []
98
- for mesh in mesh_output:
99
- if mesh is None:
100
- outputs.append(None)
101
- else:
102
- mesh.mesh_f = mesh.mesh_f[:, ::-1]
103
- mesh_output = trimesh.Trimesh(mesh.mesh_v, mesh.mesh_f)
104
- outputs.append(mesh_output)
105
- return outputs
106
- else:
107
- mesh_output.mesh_f = mesh_output.mesh_f[:, ::-1]
108
- mesh_output = trimesh.Trimesh(mesh_output.mesh_v, mesh_output.mesh_f)
109
- return mesh_output
110
-
111
-
112
- def get_obj_from_str(string, reload=False):
113
- module, cls = string.rsplit(".", 1)
114
- if reload:
115
- module_imp = importlib.import_module(module)
116
- importlib.reload(module_imp)
117
- return getattr(importlib.import_module(module, package=None), cls)
118
-
119
-
120
- def instantiate_from_config(config, **kwargs):
121
- if "target" not in config:
122
- raise KeyError("Expected key `target` to instantiate.")
123
- cls = get_obj_from_str(config["target"])
124
- params = config.get("params", dict())
125
- kwargs.update(params)
126
- instance = cls(**kwargs)
127
- return instance
128
-
129
-
130
- class Hunyuan3DDiTPipeline:
131
- model_cpu_offload_seq = "conditioner->model->vae"
132
- _exclude_from_cpu_offload = []
133
-
134
- @classmethod
135
- @synchronize_timer('Hunyuan3DDiTPipeline Model Loading')
136
- def from_single_file(
137
- cls,
138
- ckpt_path,
139
- config_path,
140
- device='cuda',
141
- dtype=torch.float16,
142
- use_safetensors=None,
143
- **kwargs,
144
- ):
145
- # load config
146
- with open(config_path, 'r') as f:
147
- config = yaml.safe_load(f)
148
-
149
- # load ckpt
150
- if use_safetensors:
151
- ckpt_path = ckpt_path.replace('.ckpt', '.safetensors')
152
- if not os.path.exists(ckpt_path):
153
- raise FileNotFoundError(f"Model file {ckpt_path} not found")
154
- logger.info(f"Loading model from {ckpt_path}")
155
-
156
- if use_safetensors:
157
- # parse safetensors
158
- import safetensors.torch
159
- safetensors_ckpt = safetensors.torch.load_file(ckpt_path, device='cpu')
160
- ckpt = {}
161
- for key, value in safetensors_ckpt.items():
162
- model_name = key.split('.')[0]
163
- new_key = key[len(model_name) + 1:]
164
- if model_name not in ckpt:
165
- ckpt[model_name] = {}
166
- ckpt[model_name][new_key] = value
167
- else:
168
- ckpt = torch.load(ckpt_path, map_location='cpu', weights_only=True)
169
- # load model
170
- model = instantiate_from_config(config['model'])
171
- model.load_state_dict(ckpt['model'])
172
- vae = instantiate_from_config(config['vae'])
173
- vae.load_state_dict(ckpt['vae'])
174
- conditioner = instantiate_from_config(config['conditioner'])
175
- if 'conditioner' in ckpt:
176
- conditioner.load_state_dict(ckpt['conditioner'])
177
- image_processor = instantiate_from_config(config['image_processor'])
178
- scheduler = instantiate_from_config(config['scheduler'])
179
-
180
- model_kwargs = dict(
181
- vae=vae,
182
- model=model,
183
- scheduler=scheduler,
184
- conditioner=conditioner,
185
- image_processor=image_processor,
186
- device=device,
187
- dtype=dtype,
188
- )
189
- model_kwargs.update(kwargs)
190
-
191
- return cls(
192
- **model_kwargs
193
- )
194
-
195
- @classmethod
196
- def from_pretrained(
197
- cls,
198
- model_path,
199
- device='cuda',
200
- dtype=torch.float16,
201
- use_safetensors=True,
202
- variant='fp16',
203
- subfolder='hunyuan3d-dit-v2-0',
204
- **kwargs,
205
- ):
206
- kwargs['from_pretrained_kwargs'] = dict(
207
- model_path=model_path,
208
- subfolder=subfolder,
209
- use_safetensors=use_safetensors,
210
- variant=variant,
211
- dtype=dtype,
212
- device=device,
213
- )
214
- config_path, ckpt_path = smart_load_model(
215
- model_path,
216
- subfolder=subfolder,
217
- use_safetensors=use_safetensors,
218
- variant=variant
219
- )
220
- return cls.from_single_file(
221
- ckpt_path,
222
- config_path,
223
- device=device,
224
- dtype=dtype,
225
- use_safetensors=use_safetensors,
226
- **kwargs
227
- )
228
-
229
- def __init__(
230
- self,
231
- vae,
232
- model,
233
- scheduler,
234
- conditioner,
235
- image_processor,
236
- device='cuda',
237
- dtype=torch.float16,
238
- **kwargs
239
- ):
240
- self.vae = vae
241
- self.model = model
242
- self.scheduler = scheduler
243
- self.conditioner = conditioner
244
- self.image_processor = image_processor
245
- self.kwargs = kwargs
246
- self.to(device, dtype)
247
-
248
- def compile(self):
249
- self.vae = torch.compile(self.vae)
250
- self.model = torch.compile(self.model)
251
- self.conditioner = torch.compile(self.conditioner)
252
-
253
- def enable_flashvdm(
254
- self,
255
- enabled: bool = True,
256
- adaptive_kv_selection=True,
257
- topk_mode='mean',
258
- mc_algo='dmc',
259
- replace_vae=True,
260
- ):
261
- if enabled:
262
- model_path = self.kwargs['from_pretrained_kwargs']['model_path']
263
- turbo_vae_mapping = {
264
- 'Hunyuan3D-2': ('tencent/Hunyuan3D-2', 'hunyuan3d-vae-v2-0-turbo'),
265
- 'Hunyuan3D-2mv': ('tencent/Hunyuan3D-2', 'hunyuan3d-vae-v2-0-turbo'),
266
- 'Hunyuan3D-2mini': ('tencent/Hunyuan3D-2mini', 'hunyuan3d-vae-v2-mini-turbo'),
267
- }
268
- model_name = model_path.split('/')[-1]
269
- if replace_vae and model_name in turbo_vae_mapping:
270
- model_path, subfolder = turbo_vae_mapping[model_name]
271
- self.vae = ShapeVAE.from_pretrained(
272
- model_path, subfolder=subfolder,
273
- use_safetensors=self.kwargs['from_pretrained_kwargs']['use_safetensors'],
274
- device=self.device,
275
- )
276
- self.vae.enable_flashvdm_decoder(
277
- enabled=enabled,
278
- adaptive_kv_selection=adaptive_kv_selection,
279
- topk_mode=topk_mode,
280
- mc_algo=mc_algo
281
- )
282
- else:
283
- model_path = self.kwargs['from_pretrained_kwargs']['model_path']
284
- vae_mapping = {
285
- 'Hunyuan3D-2': ('tencent/Hunyuan3D-2', 'hunyuan3d-vae-v2-0'),
286
- 'Hunyuan3D-2mv': ('tencent/Hunyuan3D-2', 'hunyuan3d-vae-v2-0'),
287
- 'Hunyuan3D-2mini': ('tencent/Hunyuan3D-2mini', 'hunyuan3d-vae-v2-mini'),
288
- }
289
- model_name = model_path.split('/')[-1]
290
- if model_name in vae_mapping:
291
- model_path, subfolder = vae_mapping[model_name]
292
- self.vae = ShapeVAE.from_pretrained(model_path, subfolder=subfolder)
293
- self.vae.enable_flashvdm_decoder(enabled=False)
294
-
295
- def to(self, device=None, dtype=None):
296
- if dtype is not None:
297
- self.dtype = dtype
298
- self.vae.to(dtype=dtype)
299
- self.model.to(dtype=dtype)
300
- self.conditioner.to(dtype=dtype)
301
- if device is not None:
302
- self.device = torch.device(device)
303
- self.vae.to(device)
304
- self.model.to(device)
305
- self.conditioner.to(device)
306
-
307
- @property
308
- def _execution_device(self):
309
- r"""
310
- Returns the device on which the pipeline's models will be executed. After calling
311
- [`~DiffusionPipeline.enable_sequential_cpu_offload`] the execution device can only be inferred from
312
- Accelerate's module hooks.
313
- """
314
- for name, model in self.components.items():
315
- if not isinstance(model, torch.nn.Module) or name in self._exclude_from_cpu_offload:
316
- continue
317
-
318
- if not hasattr(model, "_hf_hook"):
319
- return self.device
320
- for module in model.modules():
321
- if (
322
- hasattr(module, "_hf_hook")
323
- and hasattr(module._hf_hook, "execution_device")
324
- and module._hf_hook.execution_device is not None
325
- ):
326
- return torch.device(module._hf_hook.execution_device)
327
- return self.device
328
-
329
- def enable_model_cpu_offload(self, gpu_id: Optional[int] = None, device: Union[torch.device, str] = "cuda"):
330
- r"""
331
- Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared
332
- to `enable_sequential_cpu_offload`, this method moves one whole model at a time to the GPU when its `forward`
333
- method is called, and the model remains in GPU until the next model runs. Memory savings are lower than with
334
- `enable_sequential_cpu_offload`, but performance is much better due to the iterative execution of the `unet`.
335
-
336
- Arguments:
337
- gpu_id (`int`, *optional*):
338
- The ID of the accelerator that shall be used in inference. If not specified, it will default to 0.
339
- device (`torch.Device` or `str`, *optional*, defaults to "cuda"):
340
- The PyTorch device type of the accelerator that shall be used in inference. If not specified, it will
341
- default to "cuda".
342
- """
343
- if self.model_cpu_offload_seq is None:
344
- raise ValueError(
345
- "Model CPU offload cannot be enabled because no `model_cpu_offload_seq` class attribute is set."
346
- )
347
-
348
- if is_accelerate_available() and is_accelerate_version(">=", "0.17.0.dev0"):
349
- from accelerate import cpu_offload_with_hook
350
- else:
351
- raise ImportError("`enable_model_cpu_offload` requires `accelerate v0.17.0` or higher.")
352
-
353
- torch_device = torch.device(device)
354
- device_index = torch_device.index
355
-
356
- if gpu_id is not None and device_index is not None:
357
- raise ValueError(
358
- f"You have passed both `gpu_id`={gpu_id} and an index as part of the passed device `device`={device}"
359
- f"Cannot pass both. Please make sure to either not define `gpu_id` or not pass the index as part of the device: `device`={torch_device.type}"
360
- )
361
-
362
- # _offload_gpu_id should be set to passed gpu_id (or id in passed `device`) or default to previously set id or default to 0
363
- self._offload_gpu_id = gpu_id or torch_device.index or getattr(self, "_offload_gpu_id", 0)
364
-
365
- device_type = torch_device.type
366
- device = torch.device(f"{device_type}:{self._offload_gpu_id}")
367
-
368
- if self.device.type != "cpu":
369
- self.to("cpu")
370
- device_mod = getattr(torch, self.device.type, None)
371
- if hasattr(device_mod, "empty_cache") and device_mod.is_available():
372
- device_mod.empty_cache() # otherwise we don't see the memory savings (but they probably exist)
373
-
374
- all_model_components = {k: v for k, v in self.components.items() if isinstance(v, torch.nn.Module)}
375
-
376
- self._all_hooks = []
377
- hook = None
378
- for model_str in self.model_cpu_offload_seq.split("->"):
379
- model = all_model_components.pop(model_str, None)
380
- if not isinstance(model, torch.nn.Module):
381
- continue
382
-
383
- _, hook = cpu_offload_with_hook(model, device, prev_module_hook=hook)
384
- self._all_hooks.append(hook)
385
-
386
- # CPU offload models that are not in the seq chain unless they are explicitly excluded
387
- # these models will stay on CPU until maybe_free_model_hooks is called
388
- # some models cannot be in the seq chain because they are iteratively called, such as controlnet
389
- for name, model in all_model_components.items():
390
- if not isinstance(model, torch.nn.Module):
391
- continue
392
-
393
- if name in self._exclude_from_cpu_offload:
394
- model.to(device)
395
- else:
396
- _, hook = cpu_offload_with_hook(model, device)
397
- self._all_hooks.append(hook)
398
-
399
- def maybe_free_model_hooks(self):
400
- r"""
401
- Function that offloads all components, removes all model hooks that were added when using
402
- `enable_model_cpu_offload` and then applies them again. In case the model has not been offloaded this function
403
- is a no-op. Make sure to add this function to the end of the `__call__` function of your pipeline so that it
404
- functions correctly when applying enable_model_cpu_offload.
405
- """
406
- if not hasattr(self, "_all_hooks") or len(self._all_hooks) == 0:
407
- # `enable_model_cpu_offload` has not been called, so silently do nothing
408
- return
409
-
410
- for hook in self._all_hooks:
411
- # offload model and remove hook from model
412
- hook.offload()
413
- hook.remove()
414
-
415
- # make sure the model is in the same state as before calling it
416
- self.enable_model_cpu_offload()
417
-
418
- @synchronize_timer('Encode cond')
419
- def encode_cond(self, image, additional_cond_inputs, do_classifier_free_guidance, dual_guidance):
420
- bsz = image.shape[0]
421
- cond = self.conditioner(image=image, **additional_cond_inputs)
422
-
423
- if do_classifier_free_guidance:
424
- un_cond = self.conditioner.unconditional_embedding(bsz, **additional_cond_inputs)
425
-
426
- if dual_guidance:
427
- un_cond_drop_main = copy.deepcopy(un_cond)
428
- un_cond_drop_main['additional'] = cond['additional']
429
-
430
- def cat_recursive(a, b, c):
431
- if isinstance(a, torch.Tensor):
432
- return torch.cat([a, b, c], dim=0).to(self.dtype)
433
- out = {}
434
- for k in a.keys():
435
- out[k] = cat_recursive(a[k], b[k], c[k])
436
- return out
437
-
438
- cond = cat_recursive(cond, un_cond_drop_main, un_cond)
439
- else:
440
- def cat_recursive(a, b):
441
- if isinstance(a, torch.Tensor):
442
- return torch.cat([a, b], dim=0).to(self.dtype)
443
- out = {}
444
- for k in a.keys():
445
- out[k] = cat_recursive(a[k], b[k])
446
- return out
447
-
448
- cond = cat_recursive(cond, un_cond)
449
- return cond
450
-
451
- def prepare_extra_step_kwargs(self, generator, eta):
452
- # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
453
- # eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
454
- # eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
455
- # and should be between [0, 1]
456
-
457
- accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())
458
- extra_step_kwargs = {}
459
- if accepts_eta:
460
- extra_step_kwargs["eta"] = eta
461
-
462
- # check if the scheduler accepts generator
463
- accepts_generator = "generator" in set(inspect.signature(self.scheduler.step).parameters.keys())
464
- if accepts_generator:
465
- extra_step_kwargs["generator"] = generator
466
- return extra_step_kwargs
467
-
468
- def prepare_latents(self, batch_size, dtype, device, generator, latents=None):
469
- shape = (batch_size, *self.vae.latent_shape)
470
- if isinstance(generator, list) and len(generator) != batch_size:
471
- raise ValueError(
472
- f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
473
- f" size of {batch_size}. Make sure the batch size matches the length of the generators."
474
- )
475
-
476
- if latents is None:
477
- latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
478
- else:
479
- latents = latents.to(device)
480
-
481
- # scale the initial noise by the standard deviation required by the scheduler
482
- latents = latents * getattr(self.scheduler, 'init_noise_sigma', 1.0)
483
- return latents
484
-
485
- def prepare_image(self, image) -> dict:
486
- if isinstance(image, str) and not os.path.exists(image):
487
- raise FileNotFoundError(f"Couldn't find image at path {image}")
488
-
489
- if not isinstance(image, list):
490
- image = [image]
491
-
492
- outputs = []
493
- for img in image:
494
- output = self.image_processor(img)
495
- outputs.append(output)
496
-
497
- cond_input = {k: [] for k in outputs[0].keys()}
498
- for output in outputs:
499
- for key, value in output.items():
500
- cond_input[key].append(value)
501
- for key, value in cond_input.items():
502
- if isinstance(value[0], torch.Tensor):
503
- cond_input[key] = torch.cat(value, dim=0)
504
-
505
- return cond_input
506
-
507
- def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
508
- """
509
- See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
510
-
511
- Args:
512
- timesteps (`torch.Tensor`):
513
- generate embedding vectors at these timesteps
514
- embedding_dim (`int`, *optional*, defaults to 512):
515
- dimension of the embeddings to generate
516
- dtype:
517
- data type of the generated embeddings
518
-
519
- Returns:
520
- `torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
521
- """
522
- assert len(w.shape) == 1
523
- w = w * 1000.0
524
-
525
- half_dim = embedding_dim // 2
526
- emb = torch.log(torch.tensor(10000.0)) / (half_dim - 1)
527
- emb = torch.exp(torch.arange(half_dim, dtype=dtype) * -emb)
528
- emb = w.to(dtype)[:, None] * emb[None, :]
529
- emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=1)
530
- if embedding_dim % 2 == 1: # zero pad
531
- emb = torch.nn.functional.pad(emb, (0, 1))
532
- assert emb.shape == (w.shape[0], embedding_dim)
533
- return emb
534
-
535
- def set_surface_extractor(self, mc_algo):
536
- if mc_algo is None:
537
- return
538
- logger.info('The parameter `mc_algo` is deprecated, and will be removed in future versions.\n'
539
- 'Please use: \n'
540
- 'from hy3dgen.shapegen.models.autoencoders import SurfaceExtractors\n'
541
- 'pipeline.vae.surface_extractor = SurfaceExtractors[mc_algo]() instead\n')
542
- if mc_algo not in SurfaceExtractors.keys():
543
- raise ValueError(f"Unknown mc_algo {mc_algo}")
544
- self.vae.surface_extractor = SurfaceExtractors[mc_algo]()
545
-
546
- @torch.no_grad()
547
- def __call__(
548
- self,
549
- image: Union[str, List[str], Image.Image] = None,
550
- num_inference_steps: int = 50,
551
- timesteps: List[int] = None,
552
- sigmas: List[float] = None,
553
- eta: float = 0.0,
554
- guidance_scale: float = 7.5,
555
- dual_guidance_scale: float = 10.5,
556
- dual_guidance: bool = True,
557
- generator=None,
558
- box_v=1.01,
559
- octree_resolution=384,
560
- mc_level=-1 / 512,
561
- num_chunks=8000,
562
- mc_algo=None,
563
- output_type: Optional[str] = "trimesh",
564
- enable_pbar=True,
565
- **kwargs,
566
- ) -> List[List[trimesh.Trimesh]]:
567
- callback = kwargs.pop("callback", None)
568
- callback_steps = kwargs.pop("callback_steps", None)
569
-
570
- self.set_surface_extractor(mc_algo)
571
-
572
- device = self.device
573
- dtype = self.dtype
574
- do_classifier_free_guidance = guidance_scale >= 0 and \
575
- getattr(self.model, 'guidance_cond_proj_dim', None) is None
576
- dual_guidance = dual_guidance_scale >= 0 and dual_guidance
577
-
578
- cond_inputs = self.prepare_image(image)
579
- image = cond_inputs.pop('image')
580
- cond = self.encode_cond(
581
- image=image,
582
- additional_cond_inputs=cond_inputs,
583
- do_classifier_free_guidance=do_classifier_free_guidance,
584
- dual_guidance=False,
585
- )
586
- batch_size = image.shape[0]
587
-
588
- t_dtype = torch.long
589
- timesteps, num_inference_steps = retrieve_timesteps(
590
- self.scheduler, num_inference_steps, device, timesteps, sigmas)
591
-
592
- latents = self.prepare_latents(batch_size, dtype, device, generator)
593
- extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
594
-
595
- guidance_cond = None
596
- if getattr(self.model, 'guidance_cond_proj_dim', None) is not None:
597
- logger.info('Using lcm guidance scale')
598
- guidance_scale_tensor = torch.tensor(guidance_scale - 1).repeat(batch_size)
599
- guidance_cond = self.get_guidance_scale_embedding(
600
- guidance_scale_tensor, embedding_dim=self.model.guidance_cond_proj_dim
601
- ).to(device=device, dtype=latents.dtype)
602
- with synchronize_timer('Diffusion Sampling'):
603
- for i, t in enumerate(tqdm(timesteps, disable=not enable_pbar, desc="Diffusion Sampling:", leave=False)):
604
- # expand the latents if we are doing classifier free guidance
605
- if do_classifier_free_guidance:
606
- latent_model_input = torch.cat([latents] * (3 if dual_guidance else 2))
607
- else:
608
- latent_model_input = latents
609
- latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
610
-
611
- # predict the noise residual
612
- timestep_tensor = torch.tensor([t], dtype=t_dtype, device=device)
613
- timestep_tensor = timestep_tensor.expand(latent_model_input.shape[0])
614
- noise_pred = self.model(latent_model_input, timestep_tensor, cond, guidance_cond=guidance_cond)
615
-
616
- # no drop, drop clip, all drop
617
- if do_classifier_free_guidance:
618
- if dual_guidance:
619
- noise_pred_clip, noise_pred_dino, noise_pred_uncond = noise_pred.chunk(3)
620
- noise_pred = (
621
- noise_pred_uncond
622
- + guidance_scale * (noise_pred_clip - noise_pred_dino)
623
- + dual_guidance_scale * (noise_pred_dino - noise_pred_uncond)
624
- )
625
- else:
626
- noise_pred_cond, noise_pred_uncond = noise_pred.chunk(2)
627
- noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)
628
-
629
- # compute the previous noisy sample x_t -> x_t-1
630
- outputs = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs)
631
- latents = outputs.prev_sample
632
-
633
- if callback is not None and i % callback_steps == 0:
634
- step_idx = i // getattr(self.scheduler, "order", 1)
635
- callback(step_idx, t, outputs)
636
-
637
- return self._export(
638
- latents,
639
- output_type,
640
- box_v, mc_level, num_chunks, octree_resolution, mc_algo,
641
- )
642
-
643
- def _export(
644
- self,
645
- latents,
646
- output_type='trimesh',
647
- box_v=1.01,
648
- mc_level=0.0,
649
- num_chunks=20000,
650
- octree_resolution=256,
651
- mc_algo='mc',
652
- enable_pbar=True
653
- ):
654
- if not output_type == "latent":
655
- latents = 1. / self.vae.scale_factor * latents
656
- latents = self.vae(latents)
657
- outputs = self.vae.latents2mesh(
658
- latents,
659
- bounds=box_v,
660
- mc_level=mc_level,
661
- num_chunks=num_chunks,
662
- octree_resolution=octree_resolution,
663
- mc_algo=mc_algo,
664
- enable_pbar=enable_pbar,
665
- )
666
- else:
667
- outputs = latents
668
-
669
- if output_type == 'trimesh':
670
- outputs = export_to_trimesh(outputs)
671
-
672
- return outputs
673
-
674
-
675
- class Hunyuan3DDiTFlowMatchingPipeline(Hunyuan3DDiTPipeline):
676
-
677
- @torch.inference_mode()
678
- def __call__(
679
- self,
680
- image: Union[str, List[str], Image.Image, dict, List[dict]] = None,
681
- num_inference_steps: int = 50,
682
- timesteps: List[int] = None,
683
- sigmas: List[float] = None,
684
- eta: float = 0.0,
685
- guidance_scale: float = 5.0,
686
- generator=None,
687
- box_v=1.01,
688
- octree_resolution=384,
689
- mc_level=0.0,
690
- mc_algo=None,
691
- num_chunks=8000,
692
- output_type: Optional[str] = "trimesh",
693
- enable_pbar=True,
694
- **kwargs,
695
- ) -> List[List[trimesh.Trimesh]]:
696
- callback = kwargs.pop("callback", None)
697
- callback_steps = kwargs.pop("callback_steps", None)
698
-
699
- self.set_surface_extractor(mc_algo)
700
-
701
- device = self.device
702
- dtype = self.dtype
703
- do_classifier_free_guidance = guidance_scale >= 0 and not (
704
- hasattr(self.model, 'guidance_embed') and
705
- self.model.guidance_embed is True
706
- )
707
-
708
- cond_inputs = self.prepare_image(image)
709
- image = cond_inputs.pop('image')
710
- cond = self.encode_cond(
711
- image=image,
712
- additional_cond_inputs=cond_inputs,
713
- do_classifier_free_guidance=do_classifier_free_guidance,
714
- dual_guidance=False,
715
- )
716
- batch_size = image.shape[0]
717
-
718
- # 5. Prepare timesteps
719
- # NOTE: this is slightly different from common usage; we start the sigmas from 0.
720
- sigmas = np.linspace(0, 1, num_inference_steps) if sigmas is None else sigmas
721
- timesteps, num_inference_steps = retrieve_timesteps(
722
- self.scheduler,
723
- num_inference_steps,
724
- device,
725
- sigmas=sigmas,
726
- )
727
- latents = self.prepare_latents(batch_size, dtype, device, generator)
728
-
729
- guidance = None
730
- if hasattr(self.model, 'guidance_embed') and \
731
- self.model.guidance_embed is True:
732
- guidance = torch.tensor([guidance_scale] * batch_size, device=device, dtype=dtype)
733
- # logger.info(f'Using guidance embed with scale {guidance_scale}')
734
-
735
- with synchronize_timer('Diffusion Sampling'):
736
- for i, t in enumerate(tqdm(timesteps, disable=not enable_pbar, desc="Diffusion Sampling:")):
737
- # expand the latents if we are doing classifier free guidance
738
- if do_classifier_free_guidance:
739
- latent_model_input = torch.cat([latents] * 2)
740
- else:
741
- latent_model_input = latents
742
-
743
- # NOTE: we assume the model takes timesteps in the range [0, 1]
744
- timestep = t.expand(latent_model_input.shape[0]).to(
745
- latents.dtype) / self.scheduler.config.num_train_timesteps
746
- noise_pred = self.model(latent_model_input, timestep, cond, guidance=guidance)
747
-
748
- if do_classifier_free_guidance:
749
- noise_pred_cond, noise_pred_uncond = noise_pred.chunk(2)
750
- noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)
751
-
752
- # compute the previous noisy sample x_t -> x_t-1
753
- outputs = self.scheduler.step(noise_pred, t, latents)
754
- latents = outputs.prev_sample
755
-
756
- if callback is not None and i % callback_steps == 0:
757
- step_idx = i // getattr(self.scheduler, "order", 1)
758
- callback(step_idx, t, outputs)
759
-
760
- return self._export(
761
- latents,
762
- output_type,
763
- box_v, mc_level, num_chunks, octree_resolution, mc_algo,
764
- enable_pbar=enable_pbar,
765
- )
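
For reference, the sampling loop deleted above follows the usual classifier-free guidance pattern: the latents are duplicated, the model predicts a conditional and an unconditional branch, and the two are recombined before the scheduler step. The sketch below is a minimal, self-contained illustration of that recombination plus a plain Euler-style flow-matching update; it is not the repository's scheduler (the original code delegates the update to self.scheduler.step), and the tensor shapes are made up for the example.

    import torch

    def cfg_combine(noise_pred: torch.Tensor, guidance_scale: float) -> torch.Tensor:
        # noise_pred stacks [conditional, unconditional] along the batch dimension,
        # mirroring torch.cat([latents] * 2) in the deleted loop
        noise_cond, noise_uncond = noise_pred.chunk(2)
        return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

    def euler_flow_step(latents, velocity, sigma, sigma_next):
        # hypothetical Euler update for a flow-matching schedule:
        # x_next = x + (sigma_next - sigma) * v_pred
        return latents + (sigma_next - sigma) * velocity

    # illustrative shapes only: batch of 1, 64 latent tokens, width 8
    latents = torch.randn(1, 64, 8)
    model_out = torch.randn(2, 64, 8)  # stacked cond / uncond predictions
    guided = cfg_combine(model_out, guidance_scale=5.0)
    latents = euler_flow_step(latents, guided, sigma=0.0, sigma_next=1.0 / 50)
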
pipelines.py DELETED
@@ -1,227 +0,0 @@
1
- # Hunyuan 3D is licensed under the TENCENT HUNYUAN NON-COMMERCIAL LICENSE AGREEMENT
2
- # except for the third-party components listed below.
3
- # Hunyuan 3D does not impose any additional limitations beyond what is outlined
4
- # in the respective licenses of these third-party components.
5
- # Users must comply with all terms and conditions of original licenses of these third-party
6
- # components and must ensure that the usage of the third party components adheres to
7
- # all relevant laws and regulations.
8
-
9
- # For avoidance of doubts, Hunyuan 3D means the 3D generation models and
10
- # their software and algorithms, including trained model weights, parameters (including
11
- # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
12
- # fine-tuning enabling code and other elements of the foregoing made publicly available
13
- # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
14
-
15
-
16
- import logging
17
- import numpy as np
18
- import os
19
- import torch
20
- from PIL import Image
21
- from typing import Union, Optional
22
-
23
- from .differentiable_renderer.mesh_render import MeshRender
24
- from .utils.dehighlight_utils import Light_Shadow_Remover
25
- from .utils.multiview_utils import Multiview_Diffusion_Net
26
- from .utils.imagesuper_utils import Image_Super_Net
27
- from .utils.uv_warp_utils import mesh_uv_wrap
28
-
29
- logger = logging.getLogger(__name__)
30
-
31
-
32
- class Hunyuan3DTexGenConfig:
33
-
34
- def __init__(self, light_remover_ckpt_path, multiview_ckpt_path):
35
- self.device = 'cuda'
36
- self.light_remover_ckpt_path = light_remover_ckpt_path
37
- self.multiview_ckpt_path = multiview_ckpt_path
38
-
39
- self.candidate_camera_azims = [0, 90, 180, 270, 0, 180]
40
- self.candidate_camera_elevs = [0, 0, 0, 0, 90, -90]
41
- self.candidate_view_weights = [1, 0.1, 0.5, 0.1, 0.05, 0.05]
42
-
43
- self.render_size = 2048
44
- self.texture_size = 2048
45
- self.bake_exp = 4
46
- self.merge_method = 'fast'
47
-
48
-
49
- class Hunyuan3DPaintPipeline:
50
- @classmethod
51
- def from_pretrained(cls, model_path):
52
- original_model_path = model_path
53
- if not os.path.exists(model_path):
54
- # try local path
55
- base_dir = os.environ.get('HY3DGEN_MODELS', '~/.cache/hy3dgen')
56
- model_path = os.path.expanduser(os.path.join(base_dir, model_path))
57
-
58
- delight_model_path = os.path.join(model_path, 'hunyuan3d-delight-v2-0')
59
- multiview_model_path = os.path.join(model_path, 'hunyuan3d-paint-v2-0')
60
-
61
- if not os.path.exists(delight_model_path) or not os.path.exists(multiview_model_path):
62
- try:
63
- import huggingface_hub
64
- # download from huggingface
65
- model_path = huggingface_hub.snapshot_download(repo_id=original_model_path,
66
- allow_patterns=["hunyuan3d-delight-v2-0/*"])
67
- model_path = huggingface_hub.snapshot_download(repo_id=original_model_path,
68
- allow_patterns=["hunyuan3d-paint-v2-0/*"])
69
- delight_model_path = os.path.join(model_path, 'hunyuan3d-delight-v2-0')
70
- multiview_model_path = os.path.join(model_path, 'hunyuan3d-paint-v2-0')
71
- return cls(Hunyuan3DTexGenConfig(delight_model_path, multiview_model_path))
72
- except ImportError:
73
- logger.warning(
74
- "You need to install HuggingFace Hub to load models from the hub."
75
- )
76
- raise RuntimeError(f"Model path {model_path} not found")
77
- else:
78
- return cls(Hunyuan3DTexGenConfig(delight_model_path, multiview_model_path))
79
-
80
- raise FileNotFoundError(f"Model path {original_model_path} not found locally or on the Hugging Face Hub")
81
-
82
- def __init__(self, config):
83
- self.config = config
84
- self.models = {}
85
- self.render = MeshRender(
86
- default_resolution=self.config.render_size,
87
- texture_size=self.config.texture_size)
88
-
89
- self.load_models()
90
-
91
- def load_models(self):
92
- # empty cude cache
93
- torch.cuda.empty_cache()
94
- # Load model
95
- self.models['delight_model'] = Light_Shadow_Remover(self.config)
96
- self.models['multiview_model'] = Multiview_Diffusion_Net(self.config)
97
- # self.models['super_model'] = Image_Super_Net(self.config)
98
-
99
- def enable_model_cpu_offload(self, gpu_id: Optional[int] = None, device: Union[torch.device, str] = "cuda"):
100
- self.models['delight_model'].pipeline.enable_model_cpu_offload(gpu_id=gpu_id, device=device)
101
- self.models['multiview_model'].pipeline.enable_model_cpu_offload(gpu_id=gpu_id, device=device)
102
-
103
- def render_normal_multiview(self, camera_elevs, camera_azims, use_abs_coor=True):
104
- normal_maps = []
105
- for elev, azim in zip(camera_elevs, camera_azims):
106
- normal_map = self.render.render_normal(
107
- elev, azim, use_abs_coor=use_abs_coor, return_type='pl')
108
- normal_maps.append(normal_map)
109
-
110
- return normal_maps
111
-
112
- def render_position_multiview(self, camera_elevs, camera_azims):
113
- position_maps = []
114
- for elev, azim in zip(camera_elevs, camera_azims):
115
- position_map = self.render.render_position(
116
- elev, azim, return_type='pl')
117
- position_maps.append(position_map)
118
-
119
- return position_maps
120
-
121
- def bake_from_multiview(self, views, camera_elevs,
122
- camera_azims, view_weights, method='graphcut'):
123
- project_textures, project_weighted_cos_maps = [], []
124
- project_boundary_maps = []
125
- for view, camera_elev, camera_azim, weight in zip(
126
- views, camera_elevs, camera_azims, view_weights):
127
- project_texture, project_cos_map, project_boundary_map = self.render.back_project(
128
- view, camera_elev, camera_azim)
129
- project_cos_map = weight * (project_cos_map ** self.config.bake_exp)
130
- project_textures.append(project_texture)
131
- project_weighted_cos_maps.append(project_cos_map)
132
- project_boundary_maps.append(project_boundary_map)
133
-
134
- if method == 'fast':
135
- texture, ori_trust_map = self.render.fast_bake_texture(
136
- project_textures, project_weighted_cos_maps)
137
- else:
138
- raise ValueError(f'no method {method}')
139
- return texture, ori_trust_map > 1E-8
140
-
141
- def texture_inpaint(self, texture, mask):
142
-
143
- texture_np = self.render.uv_inpaint(texture, mask)
144
- texture = torch.tensor(texture_np / 255).float().to(texture.device)
145
-
146
- return texture
147
-
148
- def recenter_image(self, image, border_ratio=0.2):
149
- if image.mode == 'RGB':
150
- return image
151
- elif image.mode == 'L':
152
- image = image.convert('RGB')
153
- return image
154
-
155
- alpha_channel = np.array(image)[:, :, 3]
156
- non_zero_indices = np.argwhere(alpha_channel > 0)
157
- if non_zero_indices.size == 0:
158
- raise ValueError("Image is fully transparent")
159
-
160
- min_row, min_col = non_zero_indices.min(axis=0)
161
- max_row, max_col = non_zero_indices.max(axis=0)
162
-
163
- cropped_image = image.crop((min_col, min_row, max_col + 1, max_row + 1))
164
-
165
- width, height = cropped_image.size
166
- border_width = int(width * border_ratio)
167
- border_height = int(height * border_ratio)
168
-
169
- new_width = width + 2 * border_width
170
- new_height = height + 2 * border_height
171
-
172
- square_size = max(new_width, new_height)
173
-
174
- new_image = Image.new('RGBA', (square_size, square_size), (255, 255, 255, 0))
175
-
176
- paste_x = (square_size - new_width) // 2 + border_width
177
- paste_y = (square_size - new_height) // 2 + border_height
178
-
179
- new_image.paste(cropped_image, (paste_x, paste_y))
180
- return new_image
181
-
182
- @torch.no_grad()
183
- def __call__(self, mesh, image):
184
-
185
- if isinstance(image, str):
186
- image_prompt = Image.open(image)
187
- else:
188
- image_prompt = image
189
-
190
- image_prompt = self.recenter_image(image_prompt)
191
-
192
- image_prompt = self.models['delight_model'](image_prompt)
193
-
194
- mesh = mesh_uv_wrap(mesh)
195
-
196
- self.render.load_mesh(mesh)
197
-
198
- selected_camera_elevs, selected_camera_azims, selected_view_weights = \
199
- self.config.candidate_camera_elevs, self.config.candidate_camera_azims, self.config.candidate_view_weights
200
-
201
- normal_maps = self.render_normal_multiview(
202
- selected_camera_elevs, selected_camera_azims, use_abs_coor=True)
203
- position_maps = self.render_position_multiview(
204
- selected_camera_elevs, selected_camera_azims)
205
-
206
- camera_info = [(((azim // 30) + 9) % 12) // {-20: 1, 0: 1, 20: 1, -90: 3, 90: 3}[
207
- elev] + {-20: 0, 0: 12, 20: 24, -90: 36, 90: 40}[elev] for azim, elev in
208
- zip(selected_camera_azims, selected_camera_elevs)]
209
- multiviews = self.models['multiview_model'](image_prompt, normal_maps + position_maps, camera_info)
210
-
211
- for i in range(len(multiviews)):
212
- # multiviews[i] = self.models['super_model'](multiviews[i])
213
- multiviews[i] = multiviews[i].resize(
214
- (self.config.render_size, self.config.render_size))
215
-
216
- texture, mask = self.bake_from_multiview(multiviews,
217
- selected_camera_elevs, selected_camera_azims, selected_view_weights,
218
- method=self.config.merge_method)
219
-
220
- mask_np = (mask.squeeze(-1).cpu().numpy() * 255).astype(np.uint8)
221
-
222
- texture = self.texture_inpaint(texture, mask_np)
223
-
224
- self.render.set_texture(texture)
225
- textured_mesh = self.render.save_mesh()
226
-
227
- return textured_mesh
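
For context, the texture-generation pipeline deleted above is driven entirely through Hunyuan3DPaintPipeline.from_pretrained and its __call__(mesh, image) entry point. The following is a minimal usage sketch, assuming the hy3dgen.texgen import path used elsewhere in this project and placeholder asset filenames; both are assumptions, not verified against this repository.

    # Usage sketch only: the import path, repo id, and filenames are assumptions.
    import trimesh
    from hy3dgen.texgen import Hunyuan3DPaintPipeline

    pipeline = Hunyuan3DPaintPipeline.from_pretrained('tencent/Hunyuan3D-2')

    mesh = trimesh.load('shape.glb', force='mesh')        # untextured input mesh
    textured_mesh = pipeline(mesh, image='prompt.png')    # image prompt: path or PIL.Image
    textured_mesh.export('textured_shape.glb')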