Design and screening of HepG2 cancer cell line inhibitors from Triterpenoid derivatives of Paramignya Trimera

Uyen Tu Tran; Tat Van Pham; Quang Minh Nguyen

doi:10.32508/stdj.v26iSI.4192

Special Issue

Uyen Tu Tran ¹,*

Tat Van Pham ²

Quang Minh Nguyen ³

Faculty of Pharmacy, Ho Chi Minh City University of Technology, 475A Dien Bien Phu Street, Binh Thanh District, Ho Chi Minh City, Viet Nam
Institute of Pharmaceutical Education and Research, Binh Duong University, 504 Binh Duong Avenue, Thu Dau Mot City, Binh Duong, Viet Nam
Faculty of Chemical Engineering, Industrial University of Ho Chi Minh City, 12 Nguyen Van Bao Street, Go Vap District, Ho Chi Minh City, Viet Nam

Correspondence to: Uyen Tu Tran, Faculty of Pharmacy, Ho Chi Minh City University of Technology, 475A Dien Bien Phu Street, Binh Thanh District, Ho Chi Minh City, Viet Nam. Email: tuuyen101000@gmail.com.

Volume & Issue: Vol. 26 No. SI (2023): Special issue: Vietnam International Conference On Genome Biology 2023 Proceedings | Page No.: 1-16 | DOI: 10.32508/stdj.v26iSI.4192

Published: 2024-06-30

Abstract

Artificial intelligence (AI) now supports work in nearly every field. In the pharmaceutical industry, and in drug discovery and development in particular, in silico models have emerged as powerful platforms for designing new drugs. The aim of this project was to develop new anticancer agents by designing novel triterpenoid derivatives from Paramignya trimera and predicting their efficacy against the Bcl-2 target receptor. Three in silico models were used: QSAR_MLR, QSAR_PCR and QSAR_ANN. The models were used to estimate IC₅₀ values for the novel derivatives and for Escin extracted from Paramignya trimera. The derivatives with the best predicted values were then docked to the Bcl-2 receptor to assess binding. In total, 196 new compounds were designed by combining the triterpenoid structural framework with potential substituents. Screening by Veber's rule identified 138 compounds that met the drug-likeness requirements. The QSAR_MLR model gave R² = 0.849, R²_adj = 0.826 and Q²_LOO = 0.789; the QSAR_PCR model gave R² = 0.860, R²_adj = 0.831 and Q²_LOO = 0.805; and the QSAR_ANN model gave the best results, with R²_train = 0.941, R²_test = 0.915 and R²_cv = 0.912. These models help predict the activity of newly designed compounds. In this study, 20 compounds were predicted to be more potent than Escin. Molecular docking on the Bcl-2 receptor identified T.new7 as the most promising compound, with a binding energy E_binding = -7.933 kcal/mol and RMSD = 1.915 Å. The study achieved its goal by identifying T.new7, a newly designed compound predicted to have better anticancer activity than natural Escin.

INTRODUCTION

According to GLOBOCAN 2020, liver cancer is one of the five deadliest cancers, with a high number of new cases and deaths each year. Figure 1 shows that liver cancer causes the third highest number of cancer deaths in the world and the highest number in Vietnam [1]. Worldwide, liver cancer is the third most common cause of cancer death, accounting for 8.3% of cancer deaths, after lung and colorectal cancer. In Vietnam, liver cancer is the leading cause of cancer death, accounting for 20.6% of all cancer deaths. Liver cancer not only causes hundreds of thousands of deaths annually but also imposes a significant socioeconomic burden. The search for new compounds active against liver cancer is therefore urgent.

**Figure 1**
Estimated number of deaths by cancer in 2020; World & Vietnam; both sexes; all ages. Source: GLOBOCAN

Liver cancer is a type of cancer that starts in liver cells. The liver is a football-sized organ located in the upper right quadrant of the abdomen, beneath the diaphragm and above the stomach. Several types of cancer can develop in the liver. Hepatocellular carcinoma (HCC), the most common type, begins in the main type of liver cell (the hepatocyte). Cancer that spreads to the liver is more common than cancer that begins in liver cells. Cancer that develops in another part of the body, such as the colon, lung, or breast, and spreads to the liver is referred to as metastatic cancer rather than liver cancer. It is named after the organ in which it began; for example, metastatic colon cancer describes cancer that begins in the colon and travels to the liver.

Triterpenes are a class of terpenes made up of six isoprene units, with the molecular formula C₃₀H₄₈. They can alternatively be thought of as three terpene units. Triterpenes are produced by animals, plants, and fungi and include squalene, the precursor to all steroids. Triterpenes have a wide range of structures: almost 200 distinct skeletons have been identified, which can be roughly classified by the number of rings present. Pentacyclic structures (five rings) predominate. Triterpenoids have been studied for the prevention and treatment of cancer and for their effect on cancer metastasis. According to a 2011 study by Wachtel-Galor and colleagues, triterpenoids from Ganoderma (lingzhi) mushrooms showed anticancer effects in vivo in mouse studies. The same study indicated that these mushrooms also contain active substances that inhibit the growth of cancer cells in vitro. Triterpenoids have thus been reported to inhibit many types of cancer cells, such as lung, breast, and skin cancer cells. Metastasis is a complex process in which cancer cells separate from the primary tumor, move to other parts of the body, and form small secondary tumors [2].

Triterpenoids are of interest because of their anti-inflammatory and analgesic properties and, especially, their activity against cancer cell lines, including HepG2 cells. Artificial intelligence facilitates the creation of virtual screening models for derivative compounds. This prospective study was designed to find a synthetic compound with greater anticancer activity than the natural substance found in Paramignya trimera. Escin (Figure 3), a triterpenoid of the oleanolic acid (OA) subgroup, is extracted from Paramignya trimera (Figure 2), of the family Rutaceae. OA has various benefits, including anti-inflammatory, antiviral, and hypoglycemic effects, and has potential for use against cancer cells. A large number of triterpenoids are active against various human cancer cell lines, such as HepG2, SMMC-772 (hepatocellular carcinoma), HL-60 (leukemia), A549 (lung carcinoma), MCF-7 (breast cancer), and SW-480 (colon carcinoma) [3].

**Figure 2**
(a) Image of Paramignya trimera [4]; (b) Escin structure from Paramignya trimera [5]

Furthermore, oleanolic acid (OA) affects cancer cells via many routes. Inhibiting the Bcl-2 receptor is one strategy for promoting apoptosis in OA-treated HepG2 cancer cells; the Bcl-2 receptor was therefore chosen as the target of interest. This research used computational models to predict the activity of novel designed compounds and to identify compounds that inhibit HepG2 cells more effectively. The work focused on developing three models: QSAR_MLR, QSAR_PCR, and QSAR_ANN. Virtual screening saves time, money, and human resources, and is expected to speed up compound screening in research and drug development compared with experimental trials.

METHODOLOGY

Data mining from experiments

The data collected from published experiments were divided into two datasets: a training subset and an external evaluation subset. The two subsets are completely independent. To be included, a compound had to have a triterpenoid framework and an IC₅₀ value measured on HepG2 carcinoma cells.

Design of new compounds

Two sites, R1 and R2 (Figure 3), in the structural frame were selected for the attachment of substituents via the maximum (multilevel) design method. The substituent set comprises 14 groups, labeled T1 to T14, that have been shown to have anticancer activity (Figure 4). Combining 14 groups at each of the two positions gives 14 × 14 = 196 novel designed compounds. The multilevel design method is used in drug design to generate a list of candidate compounds from a set of factors and the levels of each factor [6]. The maximum design limits the possibility of missing significant compounds by providing the complete set of possible designs obtained by combining the factors (the two positions selected on the Escin structural frame) with the corresponding levels (the 14 functional groups T1 to T14).
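The enumeration described above can be sketched in a few lines of Python. This is only an illustration of the combinatorics: the labels T1 to T14 and the positions R1 and R2 come from the text, while the variable names are assumptions, and no chemical structures are built here.

```python
from itertools import product

# Substituent labels T1..T14 as named in the text (structures are in Figure 4).
substituents = [f"T{i}" for i in range(1, 15)]
# The two attachment positions on the Escin structural frame.
positions = ("R1", "R2")

# Every ordered assignment of one substituent to each position.
designs = [
    dict(zip(positions, combo))
    for combo in product(substituents, repeat=len(positions))
]

print(len(designs))   # 196
print(designs[0])     # {'R1': 'T1', 'R2': 'T1'}
```

Because the assignment is ordered, (R1 = T1, R2 = T2) and (R1 = T2, R2 = T1) count as different designs, which is what gives 196 rather than 105 combinations.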

**Figure 3**
Structural framework for designing new compounds

**Figure 4**
The substitution groups used in the design of new molecules [7-21]

Optimization of the structures

New derivatives were drawn using the ChemDraw program. All novel and experimental compounds were then structurally optimized using molecular mechanics followed by quantum mechanics. Two programs were used: HyperChem, with the MM+ force field and a gradient of 0.05, and MOPAC, with the semiempirical PM7 method. This procedure identifies the most stable structure of each molecule and provides descriptors including partial charges, HOMO, LUMO, MW, DH, and so on.

Calculation of molecular descriptors

After structural optimization, MOE software was used to calculate the molecular descriptors for all the datasets. Descriptors from 0D to 3D were calculated. Once the results were obtained, a variable screening procedure was used to exclude unimportant variables. Combining the remaining variables with the descriptors produced by the structural optimization process gives a dataset of descriptors for each molecule, from which the QSAR models are built.

Estimation of QSAR models

This study focused on the development of three QSAR models: multivariate linear regression (MLR), principal component regression (PCR), and artificial neural network (ANN) models.

QSAR_MLR model

The QSAR_MLR model predicts the dependent variable Y from the values of two or more independent variables X. The model is represented as follows:

Y = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ + e (1)

Here, Y is the dependent variable; β₀, β₁, β₂, …, βₖ are the regression parameters of the model; X₁, …, Xₖ are the independent variables (k is the number of variables); and e is the random error. In this study, the dependent variable was the IC₅₀ value and the independent variables were the molecular descriptors [22]. Regression 2008 [23] software was used to construct the QSAR_MLR model.
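As a rough illustration of equation (1), the sketch below fits a multiple linear regression by ordinary least squares with NumPy and computes R² and a leave-one-out Q². It is not the Regression 2008 workflow used in the study: the data are synthetic stand-ins, the function names are hypothetical, and Q²_LOO = 1 - PRESS/SS_tot is one common definition that is assumed here.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares estimate of [b0, b1, ..., bk] for y = b0 + X @ b + e."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict_mlr(beta, X):
    return beta[0] + X @ beta[1:]

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def q2_loo(X, y):
    """Leave-one-out Q2: refit without compound i, then predict compound i."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(beta, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - np.mean(y)) ** 2)

# Synthetic stand-in data: 60 "compounds" x 8 "descriptors" (not the study's data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 8))
y_demo = 3.0 + X_demo @ np.linspace(0.5, 2.0, 8) + rng.normal(scale=0.5, size=60)

beta_demo = fit_mlr(X_demo, y_demo)
r2_demo = r_squared(y_demo, predict_mlr(beta_demo, X_demo))
q2_demo = q2_loo(X_demo, y_demo)
print(f"R2 = {r2_demo:.3f}, Q2_LOO = {q2_demo:.3f}")
```

For an ordinary least-squares fit, the leave-one-out Q² can never exceed the training R², which is why the Q²_LOO values reported for such models are lower than the corresponding R² values.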

QSAR_PCR model

Consider the set {X, Y}, where X is a data matrix with m observations and n variables and Y is the dependent variable. The data are used as collected, without prior processing. In principal component regression, the outcome Y is not regressed directly on X but on the principal components of X [22]. The XLSTAT 2016 [24] program was used to create the QSAR_PCR model.
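A minimal sketch of principal component regression follows, assuming NumPy and synthetic data. It is not the XLSTAT procedure: standardizing the descriptors before extracting components is an assumption made here, and all function and variable names are hypothetical.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Regress y on the first n_components principal components of X."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                       # standardized descriptors (assumption)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[:n_components].T             # shape (n_variables, n_components)
    scores = Z @ loadings                      # principal component scores
    design = np.column_stack([np.ones(len(y)), scores])
    gamma, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "std": std, "loadings": loadings, "gamma": gamma}

def predict_pcr(model, X):
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["loadings"]
    return model["gamma"][0] + scores @ model["gamma"][1:]

def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

# Synthetic stand-in data with two strongly correlated descriptors.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 8))
X_demo[:, 1] = X_demo[:, 0] + 0.1 * rng.normal(size=60)
y_demo = 5.0 + X_demo @ np.linspace(1.0, 0.2, 8) + rng.normal(scale=0.5, size=60)

pcr_3 = fit_pcr(X_demo, y_demo, n_components=3)
pcr_8 = fit_pcr(X_demo, y_demo, n_components=8)
r2_3 = r_squared(y_demo, predict_pcr(pcr_3, X_demo))
r2_8 = r_squared(y_demo, predict_pcr(pcr_8, X_demo))
print(f"R2 with 3 components = {r2_3:.3f}, with all 8 = {r2_8:.3f}")
```

When all components are retained, the fitted values coincide with those of ordinary least squares on the original descriptors; the benefit of PCR comes from dropping low-variance components when descriptors are correlated.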

QSAR_ANN model

Artificial neural networks (ANNs) mimic the learning process of the human brain [22]. The network has the structure I(k)-HL(m)-O(n), where k, m, and n are the numbers of neurons in each layer: the input layer I(k) holds the descriptors of the QSAR model, the output layer O(n) gives the IC₅₀ value, and the size of the hidden layer HL(m) is varied to find the best QSAR model [25]. The QSAR_ANN model was trained using MATLAB 2016 [26].
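To make the layer notation concrete, here is a small pure-Python sketch of the forward pass of a network with 8 inputs, 6 hidden neurons, and 1 output, the architecture selected later in the Results. The three transfer functions follow the standard MATLAB definitions of logsig, tansig, and purelin. The weights are random placeholders, training is not shown, and applying the chosen transfer function to the hidden layer with a linear output is an assumption rather than a detail stated in the text.

```python
import math
import random

# Standard definitions of the three transfer functions named in the Results.
def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

def tansig(x):
    return math.tanh(x)

def purelin(x):
    return x

def init_network(n_in, n_hidden, n_out, seed=0):
    """Random placeholder weights; a real model would obtain these by training."""
    rng = random.Random(seed)

    def layer(n_rows, n_cols):
        return {
            "W": [[rng.uniform(-0.5, 0.5) for _ in range(n_cols)] for _ in range(n_rows)],
            "b": [rng.uniform(-0.5, 0.5) for _ in range(n_rows)],
        }

    return {"hidden": layer(n_hidden, n_in), "output": layer(n_out, n_hidden)}

def apply_layer(layer, inputs, transfer):
    return [
        transfer(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(layer["W"], layer["b"])
    ]

def forward(net, descriptors, hidden_transfer=tansig, output_transfer=purelin):
    hidden = apply_layer(net["hidden"], descriptors, hidden_transfer)
    return apply_layer(net["output"], hidden, output_transfer)

def count_parameters(net):
    return sum(
        len(layer["b"]) + sum(len(row) for row in layer["W"])
        for layer in net.values()
    )

net = init_network(n_in=8, n_hidden=6, n_out=1)
n_params = count_parameters(net)      # 8*6 + 6 + 6*1 + 1 = 61
prediction = forward(net, [0.0] * 8)  # one output value
print(n_params, prediction)
```

Counting weights and biases this way gives a quick sense of model size relative to the number of training compounds when comparing hidden-layer sizes.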

Drug-likeness

Lipinski's rule of five, the earliest and best-known rule for identifying substances with good oral absorption, was proposed in 1997 [27]. Since then, several analogous rules based on molecular properties, such as those of Ghose [28] and Veber [29], have been established. In this study, compounds were screened with Veber's rule to find those most likely to become effective oral drugs. A compound must meet two criteria: number of rotatable bonds (nRB) ≤ 10 and polar surface area (tPSA) ≤ 140 Å². Reduced molecular flexibility, as measured by the number of rotatable bonds, and a low polar surface area or total hydrogen bond count (sum of donors and acceptors) are important predictors of good oral bioavailability. A reduced polar surface area correlates better with an increased permeation rate than does lipophilicity (C log P), and an increased rotatable bond count has a negative effect on the permeation rate [29].
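The two Veber criteria translate directly into a filter. The sketch below assumes that nRB and tPSA have already been computed by a descriptor package; the compound names and values are hypothetical and are not taken from the study.

```python
MAX_ROTATABLE_BONDS = 10
MAX_TPSA = 140.0  # square angstroms

def passes_veber(n_rotatable_bonds, tpsa):
    """True if a compound satisfies both Veber criteria used in this study."""
    return n_rotatable_bonds <= MAX_ROTATABLE_BONDS and tpsa <= MAX_TPSA

# Hypothetical descriptor values, for illustration only.
candidates = [
    {"name": "cmpd_A", "nRB": 7, "tPSA": 121.5},
    {"name": "cmpd_B", "nRB": 12, "tPSA": 98.0},   # too flexible
    {"name": "cmpd_C", "nRB": 9, "tPSA": 155.2},   # too polar
    {"name": "cmpd_D", "nRB": 10, "tPSA": 140.0},  # exactly on both limits
]

kept = [c["name"] for c in candidates if passes_veber(c["nRB"], c["tPSA"])]
print(kept)  # ['cmpd_A', 'cmpd_D']
```

Both limits are inclusive, so a compound sitting exactly on a threshold is retained.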

Bioactivity prediction

The three QSAR models were used to predict the bioactivity (IC₅₀) of the newly designed compounds and of Escin, a natural triterpenoid derived from Paramignya trimera. Using Escin as a reference, we then looked for derivatives with greater predicted biological activity than Escin. In this way, the study identifies derivatives that are predicted to be more active than the natural compound and that have the potential to become medications.

Molecular Docking

The main goal of molecular docking is to understand and predict molecular recognition, both structurally (finding likely binding modes) and energetically (predicting binding affinity). Applications of molecular docking are diverse and include structure-activity studies, lead optimization, and the search for potential molecules via virtual screening [30]. In this study, the MOE 2019 package was used to perform molecular docking.

Escin, a triterpenoid derived from Paramignya trimera, belongs to the oleanolic acid group. It affects cancer cells in a variety of ways, including inducing programmed cell death, controlling the cell cycle, and killing cancer cells. Bcl-2 normally prevents programmed cell death (apoptosis). Bcl-2 was therefore chosen as the target in this investigation: inhibiting Bcl-2 promotes apoptosis of cancer cells [31]. The Bcl-2 receptor structure, PDB code 4D2M, was obtained from the Protein Data Bank (PDB) [32].

RESULTS

The training and test datasets

Seventy-four compounds were gathered from articles published in reputable journals and indexed in PubMed. The data were used to develop the models and to externally validate their predictions of the inhibitory concentration (IC₅₀). The training set of 60 compounds was used to develop the QSAR models, and the external validation set of 14 compounds was used to assess the predictive power of the models.

**Figure 5**
Summary of the 60 experimental IC₅₀ values in the training datasets [33-47]

Design of the new compound

Using the multilevel design method and the ChemDraw tool, a total of 196 novel molecules were obtained. All of these derivatives were optimized first by molecular mechanics and then by PM7 quantum mechanics. Their descriptors were calculated in the following phase.

Optimization of the structure and calculation of descriptors

All the compounds, both experimental and newly designed, were subjected to structural optimization and molecular descriptor calculation. This yielded 310 descriptors for each molecule, which were used to construct the QSAR models.

Construction of models

Table 1

Results of building the QSAR_MLR model

k	Variables	R²	R²_adj	Q²_LOO	SE	F_stat	PRESS
1	x₁	0.250	0.233	0.192	4.139	18.890	1064.026
2	x₁, x₂	0.626	0.613	0.569	2.938	47.792	568.351
3	x₁, x₂, x₃	0.710	0.694	0.651	2.613	45.626	460.183
4	x₁, x₂, x₃, x₄	0.756	0.738	0.693	2.418	42.577	403.963
5	x₁, x₂, x₃, x₄, x₅	0.785	0.765	0.720	2.291	39.363	368.313
6	x₁, x₂, x₃, x₄, x₅, x₆	0.814	0.793	0.752	2.150	38.628	326.307
7	x₁, x₂, x₃, x₄, x₅, x₆, x₇	0.829	0.805	0.764	2.084	35.894	310.581
8	x₁, x₂, x₃, x₄, x₅, x₆, x₇, x₈	0.849	0.826	0.789	1.972	35.944	277.383

Notation of molecular descriptors

Symbol	Descriptor	Meaning
x₁	LUMO	Lowest unoccupied molecular orbital
x₂	PEOE_RPC-	Relative negative partial charge
x₃	²¹C	Partial charge of C position number 21
x₄	vsurf_DD13	vsurf_EDmin1, vsurf_EDmin3 distance
x₅	vsurf_CW4	Capacity factor at -2.0
x₆	SlogP_VSA3	Bin 3 SlogP (0.00, 0.10]
x₇	vsurf_EWmin1	Lowest hydrophilic energy
x₈	SlogP_VSA4	Bin 4 SlogP (0.10, 0.15]

The eight-variable model, with R² = 0.849, R²_adj = 0.826, and Q²_LOO = 0.789, yields the following equation:

IC₅₀ (µM) = 3.739 + 2.247×x₁ + 85.94×x₂ + 24.36×x₃ + 0.156×x₄ + 5.24×x₅ - 0.03695×x₆ + 0.933×x₇ - 0.131×x₈ (2)
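Equation (2) can be evaluated for any compound once its eight descriptor values are known. The sketch below hard-codes the published coefficients, assuming that they apply to x₁ through x₈ in the order listed in Table 1; the function name is hypothetical, and the descriptor vectors used in the example are placeholders rather than real compounds.

```python
# Coefficients of equation (2), assumed to be in the order x1..x8 of Table 1.
INTERCEPT = 3.739
COEFFICIENTS = (2.247, 85.94, 24.36, 0.156, 5.24, -0.03695, 0.933, -0.131)

def predict_ic50_mlr(descriptors):
    """Predicted IC50 (µM) from the eight descriptor values x1..x8."""
    if len(descriptors) != len(COEFFICIENTS):
        raise ValueError("expected exactly 8 descriptor values")
    return INTERCEPT + sum(c * x for c, x in zip(COEFFICIENTS, descriptors))

# Placeholder inputs: all-zero descriptors return the intercept.
print(predict_ic50_mlr([0.0] * 8))                  # 3.739
# Raising x1 (LUMO) by one unit adds its coefficient, 2.247.
print(predict_ic50_mlr([1.0] + [0.0] * 7))
```

The signs show the direction of each effect in the fitted model: descriptors with negative coefficients (x₆ and x₈) lower the predicted IC₅₀ as they increase.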

Construction of the QSAR_PCR model

The QSAR_PCR model was built on the variables of the QSAR_MLR model and yielded the following results: R² = 0.860, R²_adj = 0.831, and Q²_LOO = 0.805. The equation is as follows:

IC₅₀ (µM) = 8.406 + 2.48×x₁ + 73.319×x₂ + 25.836×x₃ + 0.135×x₄ + 3.965×x₅ - 0.037×x₆ + 1.250×x₇ - 0.130×x₈ (3)

Construction of the QSAR_ANN models

The QSAR_ANN model was built using the QSAR_MLR model descriptors from equation (2). The models were trained with a back-propagation algorithm using the Logsig, Tansig, and Purelin transfer functions. The architecture of the ANN models is therefore I(8)-HL(m)-O(1), where m is the number of hidden neurons. The QSAR_ANN models were developed in two stages. First, using the training dataset, several MLP network designs with different numbers of hidden neurons and transfer functions were screened; the results are shown in Table 2.

Table 2

The results of initial screening for the ANN architecture

Ord.	QSAR_ANN model	Transfer function	R²_train	R²_test	R²_cv	Training error	Test error	Validation error	Training algorithm
1	I(8)-HL(6)-O(1)	Logsig	0.986	0.988	0.988	1.201	3.367	0.983	BFGS 42
2	I(8)-HL(6)-O(1)	Tansig	0.957	0.984	0.998	0.961	2.739	1.186	BFGS 63
3	I(8)-HL(10)-O(1)	Logsig	0.965	0.987	0.97	0.948	2.084	1.752	BFGS 39
4	I(8)-HL(10)-O(1)	Tansig	0.975	0.992	0.99	2.249	2.589	0.721	BFGS 13
5	I(8)-HL(5)-O(1)	Purelin	0.907	0.967	0.973	2.790	2.246	1.003	BFGS 17
6	I(8)-HL(6)-O(1)	Purelin	0.941	0.916	0.912	0.822	1.389	0.795	BFGS 36
7	I(8)-HL(10)-O(1)	Purelin	0.911	0.921	0.975	1.658	2.326	1.896	BFGS 43
8	I(8)-HL(4)-O(1)	Purelin	0.917	0.977	0.936	2.567	2.238	1.326	BFGS 69

Second, using the same external evaluation dataset as for the QSAR_MLR and QSAR_PCR models (Table 3), the best network was chosen based on the MARE (%) and Q²_EX values. The best model was I(8)-HL(6)-O(1) with the Purelin transfer function (Figure 6), with Q²_EX = 0.866 and MARE = 62.1% (Figure 12).

**Figure 6**
The architecture of the QSAR_ANN I(8)-HL(6)-O(1) model

The external validation

External evaluation is considered a test to validate the predictive ability of the built models. From there, the model that gives the closest prediction results to the experimental results is selected. The external evaluation dataset includes 14 compounds obtained from experiments and is an independent set from the set used to construct the QSAR model. Detailed information on these derivatives is presented in Figure 7.

**Figure 7**
The experimental IC₅₀,exp values in the external evaluation dataset

The predicted values of the QSAR models for 14 substances in the external evaluation dataset are presented in Table 3.

Table 3

The predicted IC₅₀,pred values of the three models in the external evaluation set

Symbol	IC₅₀,exp (µM)	IC₅₀,pred (µM)
		QSAR_MLR	QSAR_PCR	QSAR_ANN6
TPN1	8.900	5.777	6.196	5.311
TPN2	29.700	10.388	10.573	11.069
TPN3	3.300	3.754	3.613	2.562
TPN4	17.660	6.408	6.462	7.848
TPN5	11.600	5.995	6.276	4.827
TPN6	15.500	8.868	9.065	7.713
TPN7	17.700	7.547	7.519	7.264
TPN8	22.400	8.867	9.065	7.714
TPN9	25.000	9.353	10.069	8.772
TPN10	23.700	7.547	7.519	7.264
TPN11	22.300	8.485	9.184	7.403
TPN12	21.000	10.116	10.320	10.313
TPN13	26.700	9.860	10.266	10.054
TPN14	0.660	3.750	4.015	1.5943
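The MARE (%) used to compare the models can be recomputed from the values in Table 3. The sketch below assumes the usual definition, the mean of |IC₅₀,exp - IC₅₀,pred| / IC₅₀,exp expressed as a percentage; the function and variable names are hypothetical. Under that assumption, the QSAR_ANN6 column evaluates to about 62.1%, which agrees with the MARE reported for the selected network.

```python
# Experimental and predicted IC50 values (µM) for TPN1..TPN14, copied from Table 3.
ic50_exp = [8.900, 29.700, 3.300, 17.660, 11.600, 15.500, 17.700,
            22.400, 25.000, 23.700, 22.300, 21.000, 26.700, 0.660]

ic50_pred = {
    "QSAR_MLR": [5.777, 10.388, 3.754, 6.408, 5.995, 8.868, 7.547,
                 8.867, 9.353, 7.547, 8.485, 10.116, 9.860, 3.750],
    "QSAR_PCR": [6.196, 10.573, 3.613, 6.462, 6.276, 9.065, 7.519,
                 9.065, 10.069, 7.519, 9.184, 10.320, 10.266, 4.015],
    "QSAR_ANN": [5.311, 11.069, 2.562, 7.848, 4.827, 7.713, 7.264,
                 7.714, 8.772, 7.264, 7.403, 10.313, 10.054, 1.5943],
}

def mare_percent(observed, predicted):
    """Mean absolute relative error, in percent (assumed definition)."""
    errors = [abs(o - p) / o for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)

mare = {name: mare_percent(ic50_exp, values) for name, values in ic50_pred.items()}
for name, value in mare.items():
    print(f"{name}: MARE = {value:.1f}%")
```

Because each error is divided by the experimental value, compounds with very small IC₅₀,exp, such as TPN14, carry a large weight in this metric.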

Drug-likeness

After screening all 196 compounds for drug-likeness using Veber's criteria, we found 138 compounds that met these criteria.

Bioactivity prediction

The IC₅₀ values of the 138 newly designed compounds and of Escin were predicted. Twenty compounds with predicted IC₅₀ values lower than that of Escin were obtained and ranked in ascending order using the QSAR_ANN model, which has the best predictive ability. The structures of the 20 potential compounds are presented in Figure 8.

**Figure 8**
The structures of the 20 new compounds

The detailed prediction results of 20 compounds from each model are presented in Table 4.

Table 4

The predicted IC₅₀,pred values of new derivatives from three QSAR models

Symbol	IC₅₀,pred (µM)			Symbol	IC₅₀,pred (µM)
	QSAR_MLR	QSAR_PCR	QSAR_ANN		QSAR_MLR	QSAR_PCR	QSAR_ANN
T.new1	1.754	2.15	2.675	T.new11	4.025	4.277	1.817
T.new2	2.848	2.7	1.532	T.new12	5.571	5.589	2.502
T.new3	2.905	4.159	1.477	T.new13	4.11	4.786	1.672
T.new4	0.807	0.973	2.094	T.new14	3.906	4.175	1.955
T.new5	5.013	5.616	2.254	T.new15	5.042	5.325	2.201
T.new6	4.48	4.687	2.056	T.new16	1.229	2.004	2.279
T.new7	0.938	1.764	1.187	T.new17	5.299	5.871	2.686
T.new8	6.115	6.803	2.719	T.new18	2.863	3.642	1.573
T.new9	2.369	3.11	1.496	T.new19	1.818	2.746	1.306
T.new10	6.277	7.156	1.356	T.new20	3.685	4.609	1.957
				Escin	3.391	3.534	2.752
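The comparison against Escin can be sketched as a simple filter and sort over the QSAR_ANN column of Table 4. The values below are copied from the table; the variable names are hypothetical, and ranking by the QSAR_ANN prediction is the assumption made here.

```python
# QSAR_ANN predicted IC50 values (µM), copied from Table 4.
ann_pred = {
    "T.new1": 2.675, "T.new2": 1.532, "T.new3": 1.477, "T.new4": 2.094,
    "T.new5": 2.254, "T.new6": 2.056, "T.new7": 1.187, "T.new8": 2.719,
    "T.new9": 1.496, "T.new10": 1.356, "T.new11": 1.817, "T.new12": 2.502,
    "T.new13": 1.672, "T.new14": 1.955, "T.new15": 2.201, "T.new16": 2.279,
    "T.new17": 2.686, "T.new18": 1.573, "T.new19": 1.306, "T.new20": 1.957,
}
escin_ann = 2.752  # QSAR_ANN prediction for Escin, from the same table

# Keep compounds predicted to be more potent than Escin (lower IC50),
# ranked from lowest to highest predicted IC50.
better_than_escin = sorted(
    (name for name, value in ann_pred.items() if value < escin_ann),
    key=ann_pred.get,
)

print(len(better_than_escin))   # 20
print(better_than_escin[:3])    # ['T.new7', 'T.new19', 'T.new10']
```

With the QSAR_ANN predictions, all 20 listed compounds fall below the Escin value, and T.new7 has the lowest predicted IC₅₀.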

Molecular Docking

To assess their ability to inhibit HepG2 cancer cells, the 20 compounds with favorable predicted IC₅₀ values were docked to the Bcl-2 receptor. Docking evaluates the ability of a compound to bind the Bcl-2 target receptor by simulating the 3D structures of both the receptor and the compound. Compounds are considered well bound when RMSD < 2.0 Å and E_binding < -7.0 kcal/mol. The results for the six compounds with good binding energies and RMSD values are presented in Figure 9.

**Figure 9**
Docking results of six compounds with the 3U6J–Bcl-2 system

DISCUSSION

QSAR models

Table 1 and Figure 10 show that the R², R²_adj, and Q²_LOO values increase with the number of variables, while SE decreases; that is, the model improves as variables are added. The step from 7 to 8 variables still gave a clear improvement (R² rose from 0.829 to 0.849 and Q²_LOO from 0.764 to 0.789). Therefore, the eight-variable model was selected as the most promising QSAR_MLR model. The model consists of the following 8 variables: LUMO, PEOE_RPC-, 21C, vsurf_DD13, vsurf_CW4, SlogP_VSA3, vsurf_EWmin1, and SlogP_VSA4. LUMO is the lowest unoccupied molecular orbital; PEOE_RPC- is the relative negative partial charge; 21C is the partial charge at position 21; vsurf_DD13 is the vsurf_EDmin1 to vsurf_EDmin3 distance; vsurf_CW4 is the capacity factor at -2.0; SlogP_VSA3 is the bin 3 SlogP (0.00, 0.10]; vsurf_EWmin1 is the lowest hydrophilic energy; and SlogP_VSA4 is the bin 4 SlogP (0.10, 0.15].

**Figure 10**
The variation in the SE, R²_train, and Q²_LOO values in response to the k value

For the QSAR_MLR model, R² = 0.849 > 0.6 [47] shows that the model explains 84.9% of the variance in biological activity in the dataset. R²_adj = 0.826 > 0.5 corresponds to 82.6% after adjusting for the number of variables, and Q²_LOO = 0.789 > 0.5 [47]. These findings indicate that the model gives relatively good predictions.

Similarly, for the QSAR_PCR model, R² = 0.860 > 0.6 [47] shows that the model explains 86% of the variance in biological activity in the dataset, and R²_adj = 0.831 > 0.5 corresponds to 83.1% after adjusting for the number of variables. On the external evaluation set, Q²_EX = 0.846 > 0.5 [47]. Based on these findings, the model produces reasonably accurate predictions.

The ANN architecture I(8)-HL(6)-O(1) with the Purelin transfer function gave R²_train = 0.941, R²_test = 0.916, and R²_cv = 0.912, showing that the model has good predictive ability with high correlation values. Its result on the external evaluation set, Q²_EX = 0.866, is the highest of the three models, indicating that its predictions are closest to the experimental values.

For these reasons, the three QSAR models were chosen to predict the activity of the newly designed compounds and of Escin.

The contributions of the variables in the model were also investigated, and the results are presented in Figure 11. All the descriptors contributed to varying degrees. The most significant contributor was PEOE_RPC- (42.3%), and the least significant was vsurf_EWmin1 (1.6%). The variables contributed to the QSAR_MLR model in the following order: PEOE_RPC- > vsurf_CW4 > SlogP_VSA3 > LUMO > 21C > vsurf_DD13 > SlogP_VSA4 > vsurf_EWmin1.

**Figure 11**
The contributions of the variables to the QSAR_MLR model

The external validation

As mentioned above, external evaluation was used to validate the MLR and PCR models and to identify the best ANN model among those screened in Table 2. The two criteria were Q²_EX (> 0.5) and MARE (%). The results are presented in Figure 12. The linear regression models meet the requirements, and the sixth neural network model (ANN6) was selected because its Q²_EX value of 0.866 is the highest, while its MARE (%) value is comparable to that of the other models.

**Figure 12**
The MARE (%) and Q²_EX values of the QSAR models

Figure 13 shows the correlation between the predicted and experimental IC₅₀ values on the external evaluation set. Q²_EX is 0.840 for the QSAR_MLR model, 0.846 for the QSAR_PCR model, and 0.866 for the QSAR_ANN model. All three models therefore correlate well with experiment on the external evaluation set, which suggests that they are reliable enough to be used to predict the activity of a wide range of designed compounds.

Furthermore, one-way ANOVA showed that the differences between the experimental and predicted values from the three models, QSAR_MLR, QSAR_PCR, and QSAR_ANN, were not significant (F = 0.0269 < F_critical = 3.2381). Therefore, the predictive ability of the three models is appropriate.

**Figure 13**
Correlations of experimental and predicted values on the external dataset of QSAR models

Bioactivity prediction

Under the same calculation conditions, the models built above predicted that 20 compounds have lower (better) IC₅₀ values than Escin. Escin was used as the baseline for selecting compounds with better predicted biological activity, in order to show that the new compounds may inhibit cancer cells more strongly than the natural active substance.

The predicted values for the new molecules and for Escin from the three QSAR models, QSAR_MLR, QSAR_PCR, and QSAR_ANN, did not differ significantly in the analysis of variance (F = 0.71595 < F_critical = 3.07606). Therefore, the predictions of the three models are consistent and reliable.

Molecular Docking

The six compounds that gave good docking results bind to the Bcl-2 receptor as follows:

T.new1 binds to the Bcl-2 receptor via a hydrogen acceptor bond to ARG74 (distance = 2.97 Å, energy = -1.7 kcal/mol), with E_binding = -7.158 kcal/mol and RMSD = 1.539 Å.

T.new4 binds to the Bcl-2 receptor via a pi-cation bond to ARG154 (distance = 3.58 Å, energy = -1.6 kcal/mol), with E_binding = -7.817 kcal/mol and RMSD = 1.696 Å.

T.new7 binds to the Bcl-2 receptor via a hydrogen donor bond to CYS174 (distance = 3.37 Å, energy = -1.2 kcal/mol), with E_binding = -7.933 kcal/mol and RMSD = 1.915 Å.

T.new11 binds to the Bcl-2 receptor via a pi-H bond to TYR79 (distance = 3.78 Å, energy = -0.9 kcal/mol), with E_binding = -7.166 kcal/mol and RMSD = 1.388 Å.

T.new12 binds to the Bcl-2 receptor via a hydrogen donor bond to CYS174 (distance = 3.23 Å, energy = -0.8 kcal/mol), with E_binding = -7.869 kcal/mol and RMSD = 1.279 Å.

T.new19 binds to the Bcl-2 receptor via a pi-H bond to ILE146 (distance = 3.65 Å, energy = -0.9 kcal/mol), with E_binding = -7.367 kcal/mol and RMSD = 1.846 Å.

The full interaction results of the six new compounds on the Bcl-2 receptor are presented in Table 5. Among the six compounds, T.new7 had the best result, with the lowest binding energy: it binds to the Bcl-2 receptor via a hydrogen donor bond to CYS174 (distance = 3.37 Å, energy = -1.2 kcal/mol), with E_binding = -7.933 kcal/mol and RMSD = 1.915 Å.

Table 5

Detailed interaction results of new compounds on the Bcl-2 receptor

Compounds	Ligand	Receptor				Interaction	Distance (Å)	E (kcal/mol)
T.new1	O 58	NE	ARG	74	(A)	H-acceptor	2.94	-1.7
T.new4	5-ring	NH2	ARG	154	(A)	pi-cation	3.58	-1.6
T.new7	O 20	SG	CYS	174	(A)	H-donor	3.37	-1.2
T.new11	6-ring	CD1	TYR	79	(A)	pi-H	3.78	-0.9
T.new12	O 48	SG	CYS	174	(A)	H-donor	3.23	-0.8
T.new19	5-ring	CG2	ILE	146	(A)	pi-H	3.65	-0.9
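The docking acceptance criteria stated in the Results (RMSD < 2.0 Å and E_binding < -7.0 kcal/mol) and the choice of the best compound can be sketched as follows. The six E_binding and RMSD pairs are the values reported in this section; the entry named "hypothetical_reject" is invented purely to show a compound failing the filter, and the function and variable names are assumptions.

```python
RMSD_MAX = 2.0        # angstroms
E_BINDING_MAX = -7.0  # kcal/mol

# (E_binding in kcal/mol, RMSD in angstroms) as reported in the Discussion.
docking = {
    "T.new1": (-7.158, 1.539),
    "T.new4": (-7.817, 1.696),
    "T.new7": (-7.933, 1.915),
    "T.new11": (-7.166, 1.388),
    "T.new12": (-7.869, 1.279),
    "T.new19": (-7.367, 1.846),
    "hypothetical_reject": (-6.500, 2.400),  # invented example, not from the study
}

def is_well_bound(e_binding, rmsd):
    """Apply both acceptance thresholds used for the docking poses."""
    return rmsd < RMSD_MAX and e_binding < E_BINDING_MAX

accepted = {name: vals for name, vals in docking.items() if is_well_bound(*vals)}

# The most negative binding energy is taken as the strongest predicted binder.
best = min(accepted, key=lambda name: accepted[name][0])

print(sorted(accepted))  # the six T.new compounds
print(best)              # T.new7
```

Ranking by binding energy alone selects T.new7, although T.new12 is close in energy (-7.869 kcal/mol) and has a lower RMSD.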

On the Bcl-2 receptor, the amino acids involved in binding the six most promising compounds are ARG74, ARG154, CYS174, TYR79, and ILE146.

In summary, the QSAR_MLR, QSAR_PCR, and QSAR_ANN models were successfully constructed to predict the activity of the newly designed compounds, and the T.new7 compound was selected as a candidate inhibitor of HepG2 cancer cells.

CONCLUSION

This study applied QSAR models to design and screen new triterpenoid compounds for activity against HepG2 cancer cells. The final selected compound, T.new7, showed better predicted bioactivity than the naturally occurring substance and met the drug-likeness conditions of Veber's rule. The biological activity predictions, based on the screening process and statistical evaluation, are reasonably reliable. T.new7 was designed on the Escin structural framework, with R1 as morpholine and R2 as coumarin. T.new7 has predicted IC₅₀ values of 0.938, 1.764, and 1.187 µM according to the QSAR_MLR, QSAR_PCR, and QSAR_ANN models, respectively. The QSAR_ANN model gives the best prediction results, and all three values are lower than those of the natural parent compound Escin. Molecular docking gave E_binding = -7.933 kcal/mol and RMSD = 1.915 Å. T.new7 binds to the Bcl-2 receptor as a hydrogen-bond donor to CYS174 (distance d = 3.37 Å, energy = -1.2 kcal/mol). Therefore, T.new7 was selected as the best candidate inhibitor of HepG2 cancer cells. This study is limited by the fact that it involved only virtual screening. Further experimental studies are therefore necessary to confirm the effectiveness of T.new7 against HepG2 cancer cells, and this study provides the foundation for them.

ABBREVIATIONS

ANOVA: Analysis of variance

ANN: Artificial neural network

Bcl-2: B-cell lymphoma 2

GLOBOCAN: Global Cancer Statistics

HepG2: Human liver cancer cell line

HOMO: Highest occupied molecular orbital

IC₅₀: Half-maximal inhibitory concentration

LUMO: Lowest unoccupied molecular orbital

MLR: Multiple linear regression

OA: Oleanolic Acid

PCR: Principal component regression

PDB: Protein Data Bank

PM7: Parameterized Model 7

QSAR: Quantitative structure-activity relationship

COMPETING INTERESTS

AUTHOR CONTRIBUTIONS

All authors participated in study design, coordination, and manuscript drafting
