“I wanted to give back, but I felt there was a dearth of epidemiology and genetic data studies in India.”

Bloomberg Distinguished Professor Nilanjan Chatterjee points to his childhood in India, where he attended public schools, as a driving force behind his work applying mathematical statistics and probability theory to public health, cancer, epidemiology, and genetics. He was determined to give back what he could.

“It gives me a certain satisfaction to be able to give back what I got in my early years from India, in terms of getting a fantastic education free of cost,” Chatterjee said. “I don’t take any money from partners in India. I’ve been doing this in my own time, using my endowment fund. That gives me flexibility in how I spend my time and the kind of work I do. That’s one of my big passions.”

Born and raised in India, Chatterjee received his bachelor’s and master’s degrees in statistics from the Indian Statistical Institute and his PhD in the U.S. While working at the National Cancer Institute (NCI), he became interested in how he could use his specific skill set to contribute to his home country.

“With India being such a big country, with a population of more than 1.2 billion, it has all kinds of public health issues. But there are so few public health experts, especially quantitative researchers in areas like data science, statistics, and biostatistics. There was a hole and I wanted to contribute in that area,” he said. “But I didn’t see any opportunities. I wanted my contribution to be meaningful, with high-quality studies.”

An opportunity like that came around in 2012.

“I was still at the NCI and my collaborator, Dr. Preetha Rajaraman, became the NCI Global Health Representative in India. She put me in touch with the Tata Memorial Cancer Hospital. That’s how it all got started,” he said. “They see cancer patients from all over India, with very well-known doctors and very advanced treatment facilities. The group I connected with, led by Dr. Rajesh Dikshit, was interested in doing cancer epidemiologic studies. That excited me, and when I spoke to them, they were planning to do something called a genome-wide association study for gallbladder cancer, which is kind of unique.”

“India is especially suited for this project because gallbladder cancer is very rare in the world, but India is one of the few places where it is more common. They had the capacity to get the numbers of cases and controls required for large-scale genomic studies, and they could get the samples from their hospitals,” he said. “They needed someone with expertise in biostatistics and statistical genetics to perform quality control and analyze large-scale genetic data, and that’s where I came in.”

“If we can do something in India, even small, the actual impact is huge because of the population size.”

Their first study together on gallbladder cancer combed through hundreds of thousands of genetic variants across the genome and identified a region associated with risk of gallbladder cancer. Chatterjee credits this initial discovery, published in the highly prestigious journal The Lancet Oncology, with giving the team at Tata Memorial the confidence to pursue similar cancer studies. They have since expanded the gallbladder cancer study to double its sample size and have launched similar studies of oropharyngeal and breast cancers.

“We also wanted to develop risk-stratified approaches for cancer prevention. It’s a widely known concept in the U.S. and in Europe, but in India, such models are still lacking,” Chatterjee said. “I was motivated because of my own research in this area, so our collaboration with Tata Memorial is now developing such a model for female breast cancer and then exploring potential clinical applications.”

Chatterjee is now seeking external funding to develop a public-facing cancer risk prediction tool into which he hopes members of the public and physicians will be able to enter risk-factor information and contribute genetic testing data for various cancers and other diseases. The tool would offer clinical guidance on who should be screened, at what age screening should start, and how often it should be repeated. He likens it to a one-stop shop for individual risk assessment for cancers and chronic diseases.

“Until we have good quality data that can be openly accessible by researchers, we will not be able to solve certain public health problems.”

Chatterjee expects the future of biostatistics to rely heavily on datasets far larger than ever before, involving hundreds of thousands of people.

“A lot of the time, diseases are caused by a combination of many factors,” he said. “In genetic studies, we found that the risk of most cancers and other chronic diseases is associated with thousands of different genes, each of which has a small effect on its own, but in combination they become potent.”

“For this kind of analysis, we need big datasets, and the statistical and machine learning tools need to be scalable. There’s also a lot of thinking needed in terms of study design and quality of data, and this is where biostatistical training becomes important,” he said. “A lot of my research in the past has focused on developing novel methodology, and some of it has really caught on, with other people already using the software. I hope it will continue to have an impact on modern large-scale epidemiologic studies.”

One of the most challenging parts of his work, however, is data quality. He explains that researchers on the ground often lack the necessary training and need to understand the importance of study design.

“That culture is not always there, and the reason is that there are not a lot of people who have the required training in epidemiology and biostatistics. It’s not enough to just collect data. It needs to be accurate data. Applicable data,” he said. “Until there is that capacity, we will continue to have this problem. For example, India has published poor-quality data on how many people died from COVID-19. The official number is really an underestimate compared with what people have estimated from other sources of data.”

“It’s so important to invest in public health and training. It’s as important as investing in defense or in business. Health is the most important treasure for increasing productivity,” he said. “More money needs to go into public health implementation, as well as research. One of the positive things that might come out of the pandemic is that people understand the importance of public health and public health training.”