  • Michael Parker

Exploring the Data Lake



In the last couple of years there has been a lot of interest in the Data Lake.  In schools, it sometimes comes up in discussions about SIS projects and data analytics.

What is a Data Lake?

A Data Lake is the idea that all the systems within a school will dump data into a central storage area and that these systems can also consume data from this central storage area.

 

What are the advantages?

Because all your data is stored in one spot, data from multiple systems can be analysed separately or combined, possibly improved, and then fed into another system.  Think about the potential of AI here, for instance in revealing trends about your students.  It is also easier to create your own integrations between systems, as usually a third to a half of that work involves pulling out the data.
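As a rough illustration of combining data from two systems once it sits in one place, here is a minimal Python sketch. The two extracts, their column names and the student IDs are made up for the example; real exports from an attendance system or a markbook will look different.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extracts that two different systems have dumped into the lake.
ATTENDANCE_CSV = """student_id,date,present
S001,2024-03-04,1
S001,2024-03-05,0
S002,2024-03-04,1
S002,2024-03-05,1
"""

MARKS_CSV = """student_id,subject,mark
S001,Maths,58
S002,Maths,81
"""


def attendance_rate(attendance_text):
    """Return {student_id: fraction of recorded days the student was present}."""
    present = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(io.StringIO(attendance_text)):
        total[row["student_id"]] += 1
        present[row["student_id"]] += int(row["present"])
    return {sid: present[sid] / total[sid] for sid in total}


def combine(attendance_text, marks_text):
    """Attach each student's attendance rate to their marks, matched on student_id."""
    rates = attendance_rate(attendance_text)
    combined = []
    for row in csv.DictReader(io.StringIO(marks_text)):
        combined.append({
            "student_id": row["student_id"],
            "subject": row["subject"],
            "mark": int(row["mark"]),
            # None when the attendance extract has no rows for this student.
            "attendance_rate": rates.get(row["student_id"]),
        })
    return combined


for record in combine(ATTENDANCE_CSV, MARKS_CSV):
    print(record)
```

The join only works because both extracts share a student identifier. In practice, agreeing on that common key across systems is often the first job.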

The nearly unachievable goal with a Data Lake model is to link all of your systems together in real time and keep everything up to date no matter where the changes are made.

There is a perceived freedom because each application connects to the Data Lake, so in theory the applications become interchangeable.  Some people look at a Data Lake as a way to simplify the migration of their SIS.  Once again, in theory this means that the Data Lake has all the information and that all this information can be fed into another SIS (and other systems) in real time.

 

Challenges with Data Lake

If the promise of a Data Lake sounds good, we also have to acknowledge that there are some big challenges, at least with some of the goals that people have.

Challenge 1:  Getting the data into the Data Lake

The first issue is how you will pull data out of your current systems automatically and get it into the Data Lake.  Given that each system works differently, you will need a variety of techniques.

·       Sometimes you might use direct access to a database, but more systems are moving to the cloud and cloud systems rarely allow direct access to the database.

·       Sometimes you might pull data from an API, but API access normally limits which data you can pull out.  (If you don’t know what an API is, I explained integrations and APIs in the last article https://www.parkerits.com.au/post/understanding-sis-connections.)

·       Depending on the system, you may be able to schedule an automatic export.

·       Finally, you might need a bit of a hack: automating a data export that the system supports for a user through its interface, but does not offer in any other way.

In some cases, you will not be able to get all the data.  Some systems are more restrictive about access to their data than others, but in general, with a bit of work you should be able to get a lot of data into the Data Lake.
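Whichever technique produces the data, the last step is usually the same: drop the extract into the lake in a predictable place. Below is a minimal Python sketch of that step. It assumes the lake is just a folder tree, and the `raw/<source>/<year>/<month>/<day>/` layout and the function names are my own illustration, not a standard.

```python
from datetime import date
from pathlib import Path


def landing_path(lake_root, source, extract_date, filename):
    """Build the (assumed) landing location: raw/<source>/<YYYY>/<MM>/<DD>/<filename>."""
    return (
        Path(lake_root)
        / "raw"
        / source
        / f"{extract_date:%Y}"
        / f"{extract_date:%m}"
        / f"{extract_date:%d}"
        / filename
    )


def land_extract(lake_root, source, extract_date, filename, data):
    """Write one extract (as bytes) into the lake and return where it was stored."""
    target = landing_path(lake_root, source, extract_date, filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


if __name__ == "__main__":
    # Hypothetical usage: a scheduled export from the SIS, saved under today's date.
    stored = land_extract(
        "lake", "sis", date.today(), "students.csv", b"student_id,name\nS001,Example\n"
    )
    print(f"Stored extract at {stored}")
```

Keeping the source system and the extract date in the path means a database query, an API pull, a scheduled export and a manual export can all land side by side without overwriting each other.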

 

Challenge 2:  Getting the data into the systems

If pulling data out of the different systems and into the Data Lake sounds challenging, it is relatively easy compared to writing data to the different systems in an automated fashion.  SISs heavily restrict what data you can write to them.  Some general categories of information that we can often write to a SIS are:  Enrolments, Timetable, Markbook.  Other areas where we can sometimes write data are: Attendance, Finance, Accounts Payable, HR.  But those areas depend on each individual SIS.

If that list seems comprehensive, it’s not.  It may account for 33% of the total needs of the school.  This is significant, but if your goal is 100% you are going to be disappointed.

With some trickery and hacks you can bring that number higher, to maybe 66%.  At this point you have picked both the low-hanging and the mid-hanging fruit.  The rest is clearly in the too-hard basket for many.  It will require money, time and negotiation skills with multiple stakeholders.  SISs in general don’t like being written to.  They are the “source of truth”, at least in their own heads.  The reality is that there are multiple sources of truth for data, of which the SIS is often the foundation that everything else builds on.

You may be wondering why SISs don’t like people writing to their database.  In a nutshell, SIS companies are concerned that customers writing data to their database will mess things up and that they will be blamed for the resulting chaos.  There is reason and justification for this belief, even if it doesn’t entirely make sense in the realm of APIs.

If we look past the SIS, other applications tend to be more open to you writing data into them.
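One way to keep write-back expectations realistic is to record, per system, which categories can be written automatically and to route everything else to a person. The Python sketch below does that. The allow-list for the "sis" entry simply reuses the categories mentioned above as an assumed example; what a given SIS actually accepts has to be confirmed with the vendor.

```python
# Assumed example only: categories that this hypothetical SIS accepts writes for.
WRITABLE = {
    "sis": {"enrolments", "timetable", "markbook"},
}


def plan_writeback(system, changes, writable=WRITABLE):
    """Split changes into those that can be written automatically and those that cannot."""
    allowed = writable.get(system, set())
    automated, manual = [], []
    for change in changes:
        if change["category"] in allowed:
            automated.append(change)
        else:
            manual.append(change)
    return automated, manual


if __name__ == "__main__":
    pending = [
        {"category": "timetable", "id": 1},
        {"category": "attendance", "id": 2},  # not writable in this assumed SIS
    ]
    automated, manual = plan_writeback("sis", pending)
    print(f"{len(automated)} change(s) can be automated, {len(manual)} need manual entry")
```

A system that is missing from the allow-list gets nothing written to it automatically, which is the safer default.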

Challenge 3: IT Security

The biggest concern with a Data Lake is that you have just placed all your data in one place.  If that Data Lake is compromised, you likely have the most serious data breach possible.  A Data Lake is an extra “attack vector”, and even if it’s locked down there is always a chance that your precautions fail.

Challenge 4: Complexity

With a Data Lake, and with attempting to accomplish bigger goals like deeper integrations, you are adding complexity to your environment.  Instead of asking other companies to perform integrations, your school is doing that work.

The other issue with complexity arises when an integration already exists between two products.  A unified data model (this is the model that Data Lakes use) suggests that we should ignore the direct integration in favour of putting the data into the Data Lake and then pulling it back out of the Data Lake.  This inherently adds complexity and difficulty, and offers more things to break.

 

Is a Data Lake approach the right one? It depends on what your goal is.

If you want to combine and analyse all (or lots of) your data within a school, I think you could argue that a data lake might be the best solution.

If you are looking at reducing the dependency on a particular system, like your SIS, I think it will help, but I’m not sure that the gains will justify the effort.

If you are looking at combining data in interesting ways and trying to achieve more in terms of integrations, and you can throw extra resources at the issue, a Data Lake might be a good solution.  Just keep the expectations reasonable.  What you can accomplish will be limited by the systems you are using.  This is potentially an expensive option, but it can also lead to achieving something new.

One additional area where I'm considering using a Data Lake is SIS migrations.  I intend for this data to come in through standard templates that the vendor will provide, and for the load to be done a few times.  This isn't entirely my idea, but it is one we will be experimenting with in the near future.

Good luck with your use of a Data Lake.  Make sure that your goals are achievable and that they justify the additional cost and risks of the technology.


© 2024 PARKER IT
