Question
I got into a (somewhat heated) discussion with my colleague today about what characters our application should accept. This was prompted by the discovery that you can enter anything in the search box and the application will dutifully perform a search by that string. However, this applies equally to all the textboxes in the application, not just the search box.
My colleague is of the opinion that the best practice (from a security viewpoint) is to limit the allowed characters to some letters, digits, and a subset of symbols. This prevents the user from entering all kinds of unprintable Unicode control characters and whatnot.
I, on the other hand, am of the opinion that this will only annoy the users and not offer any additional security. I think that the best practice is to make your application accept anything, and then use the proper encoding functions (and parameterized queries if they are available) to make sure that the entered string passes through unmodified and is displayed/used as entered. If the user enters garbage, they will see garbage, but the system will work correctly.
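To make that concrete, here is a minimal sketch of the kind of thing I mean, assuming Python with the standard-library `sqlite3` and `html` modules. The `products` table and the function names `search_products` and `render_heading` are made up for illustration; they are not from our actual application.

```python
import html
import sqlite3

def search_products(conn, term):
    # The term is bound as a parameter, so it is never parsed as SQL.
    cur = conn.execute(
        "SELECT name FROM products WHERE instr(name, ?) > 0", (term,)
    )
    return [row[0] for row in cur.fetchall()]

def render_heading(term):
    # Encode for the HTML context at output time; the stored value is untouched.
    return "<h1>Results for " + html.escape(term) + "</h1>"

# Hypothetical in-memory data, just to exercise the two functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.execute("INSERT INTO products VALUES (?)", ("O'Reilly <b>book</b>",))

print(search_products(conn, "' OR '1'='1"))  # treated as a literal string
print(render_heading("<script>alert(1)</script>"))
```

The point is that the quote characters and angle brackets are neither rejected nor stripped; each layer encodes them for its own context.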
What is the industry best practice here?
Explanation / Answer
There is a lot of good advice in the answers above, but I'm not sure they have addressed the main part of your question, i.e. limiting the amount and size of input data.
To recap what has been stated already:
+ Use client-side input validation and feedback to improve the user experience, but DO NOT rely on anything client-side for security purposes. All client-side measures are easily defeated. The client side is for the client and should be client-focused.
+ Do all security checks, data validation, sanitising etc. on the server side. Assume the data supplied is hostile and cannot be trusted. Use well-tested, known solutions where possible rather than re-inventing the wheel, and try to give feedback to the user if you do not allow something, so that they know what is happening and can possibly restructure their input.
On the question of restricting how much data, and which data, is allowed, I think the principle of being liberal in what you accept and conservative in what you do is being interpreted inaccurately. I would suggest:
+ There is no point in accepting data you cannot use. What you can use will depend on the limitations of the components which make up your application (database, supported character encodings, maximum buffer limits etc.).
+ There is no point in accepting data too long to fit into whatever use you have for it, i.e. accepting input longer than the field length of your database is pointless.
+ Consider the performance hit associated with extremely long input data. For example, if it is a search string, is there a length beyond which the time or resources consumed by extremely long queries will adversely impact your system or return unusable results?
+ Is there a risk that unlimited input lengths could trigger buffer overflow vulnerabilities? Are ALL the components (libraries, external systems, databases etc.) able to handle arbitrary lengths of input data, or will they crash, truncate unexpectedly etc.?
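As a rough sketch of checking input against component limits like those listed above, something along these lines could work in Python. The two limits and the name `exceeded_limits` are hypothetical; real values would come from your own database schema and downstream components. Note that a limit in characters and a limit in bytes are different things once non-ASCII text is involved.

```python
MAX_CHARS = 200  # hypothetical: e.g. a 200-character database column
MAX_BYTES = 255  # hypothetical: a downstream component with a byte-based buffer

def exceeded_limits(text):
    """Return a description of every limit the input exceeds (empty list if none)."""
    problems = []
    if len(text) > MAX_CHARS:
        problems.append(f"{len(text)} characters (limit {MAX_CHARS})")
    # Multi-byte characters mean the encoded size can exceed the character count.
    encoded_size = len(text.encode("utf-8"))
    if encoded_size > MAX_BYTES:
        problems.append(f"{encoded_size} bytes as UTF-8 (limit {MAX_BYTES})")
    return problems

print(exceeded_limits("hello"))
print(exceeded_limits("é" * 150))  # under the character limit, over the byte limit
```

Returning the list of problems, rather than a plain yes/no, leaves the caller free to tell the user exactly which limit was hit.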
Being liberal in what you accept does not mean you have to use everything you accept. It really means don't just fail or crash. It means providing feedback to the user why you cannot handle the input and catching failures so that they are managed gracefully. There is no point in arguing that you should accept everything for a good user experience if you cannot use what is provided or handle it in a reliable manner. However, you should not silently drop characters or truncate input without informing the user of the reasons or limits. Users only get frustrated when it isn't clear what is and what is not acceptable - provide clear information so that their expectations match your capability and there will be far fewer frustrated users.
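To illustrate the difference between rejecting with feedback and silently dropping characters, here is a minimal sketch in Python. It assumes, purely for the example, that the application cannot use control characters other than tab and newline; the function name `check_input` and the message wording are made up.

```python
import unicodedata

def check_input(text):
    """Return (ok, message). The text itself is never modified or truncated."""
    # Unicode general category "Cc" covers the control characters.
    bad = [
        i for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r"
    ]
    if bad:
        positions = ", ".join(str(i + 1) for i in bad)
        return False, (
            f"Unsupported control character at position(s) {positions}; "
            "please remove it and try again."
        )
    return True, ""

print(check_input("hello\x00world"))
print(check_input("tab\tseparated"))
```

The input is either used exactly as entered or handed back with an explanation, so the user is never left guessing why what they typed is not what they see.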