Machine learning has become an integral part of industries such as finance, healthcare, software, and data science. However, before you can develop a working ML model, you have to set up the necessary environments and tools, and that setup is often a source of friction in itself. Now, imagine training models like XGBoost directly in your browser, without any complex setup or installation. This not only simplifies the process but also makes machine learning more accessible. In this article, we’ll go over what browser-based XGBoost training is and how to use it to train models right in the browser.
What is XGBoost?
Extreme Gradient Boosting, or XGBoost for short, is a scalable and efficient implementation of the gradient boosting technique, designed for speed and performance. It is an ensemble method that combines multiple weak learners to make predictions, with each learner building on the previous one to correct its errors.
How does it work?
XGBoost is an ensemble technique that uses decision trees as its base (weak) learners and applies regularization techniques to improve generalization and reduce the risk of overfitting. The trees are built sequentially: each new tree is trained on the residuals left by the ensemble so far, so it focuses on correcting the errors of the trees before it. By optimizing the loss function at each step, the model’s performance improves progressively with every iteration; the sketch after the list below shows this residual-fitting loop in miniature. The key features of XGBoost include:
- Regularization
- Tree Pruning
- Parallel Processing
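
To make the sequential residual-fitting idea concrete, here is a minimal from-scratch sketch in Python. It uses scikit-learn’s DecisionTreeRegressor as the weak learner with squared loss, so it illustrates the boosting principle rather than XGBoost’s actual regularized, second-order implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: learn y = x^2 from noisy samples
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, size=200)

eta = 0.1        # learning rate
n_trees = 100    # number of boosting rounds
trees = []

# Start from a constant prediction (the mean), then fit each
# new tree to the residuals of the ensemble so far.
pred = np.full_like(y, y.mean())
for _ in range(n_trees):
    residuals = y - pred                   # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)                 # weak learner fits the errors
    pred += eta * tree.predict(X)          # shrunken additive update
    trees.append(tree)

print("Final training RMSE:", np.sqrt(np.mean((y - pred) ** 2)))
```

Each new tree only has to model what the ensemble still gets wrong, which is why performance keeps improving as trees are added.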
How to Train in the Browser?
We will be using TrainXGB to train our XGBoost model entirely in the browser, with the house price prediction dataset from Kaggle. In this section, I’ll walk you through each step: training the model in the browser, selecting appropriate hyperparameters, and evaluating the trained model, all using this price prediction dataset.

Understanding the Data
Now let’s begin by uploading the dataset. Click on Choose file and select the dataset you want to train your model on. The application lets you pick the CSV separator to avoid parsing errors: open your CSV file, check which character separates the columns, and select the matching one. If you pick a different separator, the app will show an error.
Once the data is loaded, click on “Show Dataset Description”. This gives a quick summary of the important statistics for the numeric columns of the dataset: the mean, the standard deviation (which shows the spread of the data), the minimum and maximum values, and the 25th, 50th, and 75th percentiles. Under the hood, this is the familiar describe method.
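
If you want to double-check these numbers locally, the same summary comes from pandas (a minimal sketch; “train.csv” is a stand-in for whatever file you uploaded):

```python
import pandas as pd

# Hypothetical path; use your own dataset file
df = pd.read_csv("train.csv")

# count, mean, std, min, the 25%/50%/75% percentiles, and max
# for every numeric column
print(df.describe())
```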

Selecting the Features for Train-Test Split
Once you have uploaded the data successfully, click on the Configuration button. It takes you to the next step, where we select the features to train on and the target feature (the column we want the model to predict). For this dataset, that is “Price,” so we’ll select it.
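
For reference, here is roughly what the same feature/target selection and split look like in code (a sketch assuming the target column is named “Price”):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")        # hypothetical dataset path
target = "Price"                     # the column the model should predict

X = df.drop(columns=[target])        # training features
y = df[target]                       # target feature

# Hold out 20% of the rows to evaluate the trained model later
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```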

Setting up the Hyperparameters
After that, the next step is to select the model type: classifier or regressor. This depends entirely on your dataset. Check whether your target column holds discrete or continuous values: discrete values (categories) make it a classification problem, while continuous values make it a regression problem.
Based on the selected model type, we also pick the evaluation metric that training will try to minimize. In my case, I have to predict house prices, a continuous target, so I selected the regressor with RMSE as the metric.
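
A quick local sanity check for this choice might look like the following sketch (the 20-unique-values threshold is an arbitrary heuristic, not a rule):

```python
import pandas as pd

df = pd.read_csv("train.csv")            # hypothetical dataset path
target = df["Price"]

# Rough heuristic: a numeric column with many distinct values is
# usually a regression target; few distinct values suggest classes.
if target.dtype.kind in "if" and target.nunique() > 20:
    print("Continuous target -> regression (metrics like RMSE)")
else:
    print("Discrete target -> classification (metrics like accuracy)")
```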
We can also control how the XGBoost trees grow by setting hyperparameters (the code sketch after this list shows how they map onto the Python API). These hyperparameters include:
- Tree Method: Options are hist, auto, exact, approx, and gpu_hist. I used hist, as it is faster and more efficient on large datasets.
- Max Depth: Sets the maximum depth of each decision tree. A higher number lets a tree learn more complex patterns, but setting it too high can lead to overfitting.
- Number of Trees: By default, this is 100. It is the number of trees (boosting rounds) used to train the model. More trees usually improve performance but make training slower.
- Subsample: The fraction of the training rows fed to each tree. A value of 1 means all rows; keeping it a bit lower adds randomness and reduces the chance of overfitting.
- Eta: The learning rate. It controls how much the model learns at each step; a lower value means slower training but usually a more accurate model.
- Colsample_bytree/bylevel/bynode: These parameters randomly sample columns while growing the tree. Lower values introduce randomness and help prevent overfitting.
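
For comparison, here is how these same knobs map onto the Python xgboost package (a sketch with illustrative, untuned values; it assumes numeric features and reuses X_train and y_train from the split sketch above):

```python
import xgboost as xgb

model = xgb.XGBRegressor(
    tree_method="hist",      # histogram-based splits: fast on large data
    max_depth=6,             # cap on the depth of each tree
    n_estimators=100,        # number of trees (boosting rounds)
    subsample=0.8,           # fraction of rows sampled per tree
    learning_rate=0.1,       # eta: shrinkage applied to each tree's output
    colsample_bytree=0.8,    # fraction of columns sampled per tree
)

model.fit(X_train, y_train)  # X_train/y_train from the earlier split
```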

Train the Model
After setting up the hyperparameters, the next step is to train the model. Go to Training & Results and click on Train XGBoost, and training will start.

It also shows a live graph so that you can monitor the progress of the model training in real time.

Once the training is complete, you can download the trained weights and reuse them locally later. The app also shows a bar chart of feature importances, highlighting which features contributed most during training.
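
Assuming the downloaded file is a standard XGBoost model (for example in JSON format; check what the site actually exports), you could reload it locally like this sketch:

```python
import xgboost as xgb

# Hypothetical filename; use whatever file TrainXGB gives you
booster = xgb.Booster()
booster.load_model("trainxgb_model.json")
```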

Checking the Model’s Performance on the Test Data
Now we have our model trained and tuned on the data, so let’s see how it performs on the test data. Upload the test file and select the target column.

Now, click on Run inference to see the model’s performance on the test data.
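
The equivalent local step would look roughly like this (a sketch continuing from the reloaded booster above; the file and column names are assumptions based on this dataset, and the features are assumed numeric):

```python
import numpy as np
import pandas as pd
import xgboost as xgb

test = pd.read_csv("test.csv")               # hypothetical test file
X_test = test.drop(columns=["Price"])        # features
y_test = test["Price"]                       # true prices

booster = xgb.Booster()
booster.load_model("trainxgb_model.json")    # weights downloaded earlier

preds = booster.predict(xgb.DMatrix(X_test))
rmse = np.sqrt(np.mean((y_test - preds) ** 2))
print("Test RMSE:", rmse)
```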

Conclusion
In the past, building machine learning models meant setting up environments and writing code by hand. Tools like TrainXGB are changing that: everything runs inside the browser, without a single line of code. Platforms like TrainXGB make it as simple as uploading a real dataset, setting the hyperparameters, and evaluating the model’s performance. This shift toward browser-based machine learning lets more people learn and experiment without worrying about setup. For now, such tools support only a limited set of models, but future platforms may well bring more powerful algorithms and features.