We had over 1,000 attendees at our very first webinar series. Data Science for Finance was a success, partly because we received so many interesting questions before, during, and after the webinar sessions. Please find a selection of participant questions and our answers below. You can watch a recording of our Data Science for Finance webinar here.
During the webinar, you mentioned that traditional investment managers and quant firms are using data to craft investment management strategies. How exactly are they doing this if this is new and they don’t have data to work with?
This is an interesting question. The investment management industry does have data, in fact a great deal of it. The point we were making is that analytics on this past data has been limited, so the data remains an underutilized investing asset. Investment managers are now starting to see the predictive power of data and are working hard to integrate it into current portfolio strategies. Also, keep in mind that in 2008 the UK set up Auto-Enrollment (AE) as part of the UK Pensions Act. The purpose of this legislation is to get workers to start investing in preparation for their retirement. As part of this scheme, AE started collecting heaps of data that investment managers can use when constructing future portfolios, ranging from custodian data to specific sentiment data on the investors themselves.
In your experience, what are the biggest hurdles for financial companies when it comes to big data?
Many (especially large) financial companies deal with scattered data lying in silos across various teams. One of the main obstacles is to remove such data silos and dependency on legacy systems to integrate information from different data sources more easily. In addition, analytics teams do not seem to be structured very efficiently within organizations. It has been shown that centralized data analytics teams and/or business units are significantly more effective than scattered pockets of teams or decentralized units spread across the organization. Lastly, organizations should further optimize internal workflows to support data-driven decision making, managers need to understand/study key concepts around data analytics, and the right analytical talent needs to be recruited and retained.
Do you view Data-Driven Decision Making as being adversarial or complementary to experiences and intuition? What is the role of hunches and/or creativity (envisioning something totally new) in the decision making process?
Data-Driven Decision Making (DDD) can supplement experience and intuition with objective information. Used together with experience and intuition, DDD helps decision makers become “data-informed”.
Hunches will always play a role in decision-making because models do not include every exogenous factor and some decisions need to account for “qualitative elements”. So, while a model might help illuminate a “better” decision based purely on quantitative features, is it the best decision for all stakeholders? It really depends. Humans possess the ability to reason and “see the bigger picture”. Even as artificial intelligence and machine learning continue to become more powerful and play more important roles in future technologies, it is unlikely that all decisions can be made purely by computer programs and algorithms in the near future. Experience and intuition will always play some role in the decision-making process, accounting for the factors that these programs and algorithms do not take into consideration or that cannot be quantified.
Creativity is also a core component of decision-making because it is the catalyst for solution generation. Many times complex business problems need an “out of the box” solution that is more easily generated by creative approaches. This is why many corporations have retreats. Getting away from the office and brainstorming in new and relaxing environments can get the creative juices going. Thus, creativity is an essential part of the decision making process now and will continue to be in the future.
How do you think companies will handle issues around data privacy?
Data privacy is both an internal and external problem for corporations. Internally, many different teams want to use the same datasets because they can provide different insights that support various business functions. In the past, privacy was less protected because only a few technologies were in place to limit access to these datasets. Now, data management systems are sophisticated enough to restrict access to certain teams and individuals. This can be further supported by data mapping and by a centralized system managed by a Chief Data/Information Officer. With a “gatekeeper” in place, only those who need certain data types can obtain access, which helps prevent internal data privacy breaches.
Externally, corporations collect data through various channels (e.g., websites, online applications, credit card purchases). This data is often used to assess what additional products and services to offer customers. It will be hard to prevent corporations from continuing this practice as the data is often necessary to complete purchases. However, governments are starting to impose stricter data privacy laws to ensure that consumer data is better protected.
Will this have the outcome that consumers want? Time will tell. But it is important to realize that more powerful technology is a double-edged sword. With more powerful technologies come increased data collection capabilities and responsibilities. It’s clearly a predicament.
What are the pros and cons of Python and R with regard to data analytics? From the perspective of somebody with no/minimal programming background, which would be better to learn first?
Both Python and R are among the most popular languages for data analysis, and both have their supporters and opponents. Python is a general-purpose programming language, which means that, in addition to performing data analyses, you can use it to automate tasks with scripts or build powerful web applications. R was designed for statisticians, so its core functionality focuses on user-friendly statistics. Both programming languages feature powerful libraries that extend their core capabilities and enable deep analytics with only a small time investment.
Python code is known for readability and simplicity which makes it enjoyable to learn. Our course participants have told us that they find the learning curve relatively low and gradual, even without prior programming experience. R’s initial learning curve is known to be a bit steeper.
What are salary ranges for an experienced finance professional that adds solid data skills?
This is a good question which depends on several factors, including but not limited to relevant qualifications, geographic location, and the company you want to work for. This discussion may provide some ideas.
Are there any books you recommend about the field of data science?
Books that we have enjoyed reading include:
Python for Data Analysis by Wes McKinney
Data Science from Scratch by Joel Grus
The Signal and the Noise by Nate Silver
Data Smart: Using Data Science to Transform Information into Insight by John W. Foreman
Big data: The next frontier for innovation, competition, and productivity by McKinsey (report)
In addition to these, we recommend our Cognitorials, short videos that have been designed for busy professionals and students who want efficient ways to learn complex data science concepts.
On the last slide, you said that there will be a shortage of talent that is prepared to work with data. What can be done to ensure you are prepared to take advantage of job opportunities in the future?
Keep learning! We believe that individuals who want to seek such opportunities have to understand the possibilities of data analytics in a business/organizational context. This includes obtaining basic knowledge of computer programming principles and an understanding of how computer programming can be used to perform effective data analyses.
We are living in a knowledge world. Those who advance quickly do so because they are lifelong learners. It is important to never remain idle or “too comfortable.” Because technology is fundamentally changing many traditional roles, it is critical to recognize that technology will replace certain tasks and that roles will require new skill sets. Currently, data analytics skills are prized in the corporate world because of the dearth of talent. However, salaries are often a function of supply and demand which means that this dearth will go away and salaries will even out.
But, new technologies will emerge and professionals will need to learn skills to capitalize on them. So, the ability and desire to quickly learn is critical to remaining competitive now and in the future.
How can I launch a data-driven career in FinTech starting from scratch?
The key thing here is to develop knowledge in two areas: data science and finance. We believe the best way to build your data science knowledge base is to first take an intensive data science bootcamp (1-3 day options are best in our opinion) and then practice the skills on relevant datasets. The reason is that most learning will not happen in the bootcamp itself. It will happen when you practice applying the taught concepts to various datasets and problems. The age-old adage that practice makes perfect holds for data science as it does for most other technical skills.
Taking time and monetary investment into consideration, we believe that the CFA or CAIA programs are the best ways to build a finance knowledge base quickly and economically. The knowledge obtained should be ample for most data-driven FinTech careers.
What are the best places to start learning more about data science?
How can organisations such as banks access data from social media?
All major social media platforms offer something called Application Programming Interfaces (APIs). Such APIs can be used to access data from social media using different programming languages (e.g., Python, Java, R). Think of them as bidirectional communication channels. For example, you can use the Twitter API to request tweets that contain a specific hashtag. Twitter will then return this data to you in a structured format which you can use for further processing.
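To make the request-and-response idea concrete, here is a minimal Python sketch. It assumes the endpoint and parameter names of Twitter's publicly documented v2 "recent search" API, which may have changed, so please check the current documentation before relying on them. The token and the sample response are made up for illustration, and the sketch does not send any network request.

```python
import json
from urllib.parse import urlencode

# Assumption: this URL and its "query"/"max_results" parameters follow
# Twitter's v2 recent search API as publicly documented.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(hashtag, bearer_token, max_results=10):
    """Return the URL and headers for a hashtag search (no network call)."""
    params = {"query": "#" + hashtag.lstrip("#"), "max_results": max_results}
    headers = {"Authorization": "Bearer " + bearer_token}
    return SEARCH_URL + "?" + urlencode(params), headers


def extract_texts(response_body):
    """Pull the tweet texts out of a JSON response body."""
    payload = json.loads(response_body)
    return [tweet["text"] for tweet in payload.get("data", [])]


# Made-up response body, standing in for the structured data an API returns.
SAMPLE_RESPONSE = json.dumps({
    "data": [
        {"id": "1", "text": "Rates held steady today #fintech"},
        {"id": "2", "text": "New lending model released #fintech"},
    ]
})

# "YOUR_TOKEN" is a placeholder, not a real credential.
url, headers = build_search_request("fintech", bearer_token="YOUR_TOKEN")
texts = extract_texts(SAMPLE_RESPONSE)
print(url)
print(texts)
```

In practice, you would send the URL and headers with an HTTP client and pass the returned body to extract_texts. The structured (JSON) format is what makes this last step short.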
Do you know successful companies that make scoring models for SME lending in Europe or US (or maybe Asia) based on big data? What kind of data/how many inputs they use?
Sure! Below are just two companies:
There are quite a few players in the P2P and SME lending spaces. They all have different credit scoring algorithms that use varying amounts and types of data points. Ultimately, these data points are the foundation of each business’s IP, so detailed breakdowns of these algorithms will generally not be made public. But you can count on social media, employment history, and academic history data points being part of these algorithms.
How useful do you think the skills that you taught us on the webinar will be to a student getting a full-time MBA?
Most of our MBA course participants find that obtaining these skills is extremely valuable for their careers.
First, MBA students can improve their business problem-solving and creativity skills. Data science in business is applied problem solving in the context of business problems. Given the ocean of available data, it allows course graduates to see problems from different angles, which enhances their creative problem-solving strategies.
Furthermore, MBAs with fundamental data science skills can understand what types of data are actually useful for companies, what good data analytics looks like, where data analytics can specifically add value to businesses, and which business questions data science can and cannot answer.
MBAs can benefit from learning how to “speak tech” to be able to better communicate with technical teams and work collaboratively with them on achieving company goals. Having practical data science and programming experiences will allow business managers to have more productive and meaningful conversations with their company’s technical teams.
In conclusion, learning data science skills is a high ROI investment for individuals given the plethora of benefits received versus the low monetary and time costs of obtaining such skills.